CNN
Finally on to CNNs. This section introduces the CONV, POOL, and FC layers: their concrete structure, parameter counts, and computational cost.
First, a quick review of Mini-batch SGD:
Loop:
1. Sample a batch of data
2. Forward prop it through the graph, get loss
3. Backprop to calculate the gradients
4. Update the parameters using the gradient
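As a sketch, this loop might look like the following in numpy (`forward_backward` is a hypothetical stand-in for the real network's forward and backward pass; it is not from the source):

```python
import numpy as np

# Minimal sketch of the mini-batch SGD loop above.
# `forward_backward(params, X_batch, y_batch)` is assumed to return
# (loss, grads), with `grads` keyed like `params`.
def sgd_train(params, forward_backward, X, y, lr=1e-2, batch_size=256, steps=1000):
    n = X.shape[0]
    for step in range(steps):
        idx = np.random.choice(n, batch_size)                   # 1. sample a batch
        loss, grads = forward_backward(params, X[idx], y[idx])  # 2-3. forward + backward
        for k in params:                                        # 4. parameter update
            params[k] -= lr * grads[k]
    return params
```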
The output size of a convolution follows (W - F + 2P)/S + 1, for input width W, filter size F, zero padding P, and stride S; the trend is toward stacking larger numbers of small filters in deeper networks.
The depth of a CONV kernel always matches the depth of the input feature-map volume, while the number of kernels becomes the depth of the output feature map.
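This bookkeeping fits in a small helper (a sketch; the function name and signature are mine, not from the source):

```python
def conv_layer_stats(W_in, D_in, F, K, S=1, P=0):
    """Output shape and parameter count for a square CONV layer.

    W_in: input width/height, D_in: input depth, F: filter size,
    K: number of kernels, S: stride, P: zero padding.
    Each kernel spans the full input depth, so the output depth is K.
    """
    W_out = (W_in - F + 2 * P) // S + 1
    params = (F * F * D_in + 1) * K   # +1 per kernel for the bias
    return (W_out, W_out, K), params
```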
Typically: Max Pool with 2×2 filters and stride 2.
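A 2×2, stride-2 max pool halves the spatial size and keeps the depth unchanged; a minimal numpy sketch (assuming even input height and width):

```python
import numpy as np

def max_pool_2x2(x):
    """x: (H, W, D) with even H and W. Returns (H/2, W/2, D)."""
    H, W, D = x.shape
    # Split each spatial axis into (size/2, 2), then max over the 2x2 window axes.
    return x.reshape(H // 2, 2, W // 2, 2, D).max(axis=(1, 3))
```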
FC: contains neurons connected to the entire input volume.
The CONV parameter count depends on the filters. For example, take a 227×227×3 feature map and 96 kernels with 11×11 filters at stride 4: parameters = (11×11×3)×96 ≈ 35K, and the output volume is [(227-11)/4+1 = 55, i.e. 55×55×96].
POOL layers have 0 parameters; FC layers have the most.
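As a quick check on this arithmetic (a toy snippet, not AlexNet code):

```python
W, F, S, P, K, D = 227, 11, 4, 0, 96, 3
out = (W - F + 2 * P) // S + 1   # (227 - 11) / 4 + 1 = 55
params = F * F * D * K           # 11 * 11 * 3 * 96 = 34,848 ~= 35K (biases excluded)
print(out, params)               # 55 34848
```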
The AlexNet (2012) network architecture.
The VGGNet (2014) architecture and parameter counts: the feature maps of the first few layers account for most of the memory, while the FC layers account for most of the parameters. VGG is one of the best networks to initialize from (as a pretrained starting point).
TOTAL memory: 24M × 4 bytes ≈ 93MB/image (only forward! ~×2 for bwd)
TOTAL params: 138M parameters
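The claim that the FC layers dominate the parameter count is easy to verify: VGG-16's first FC layer alone maps the final 7×7×512 volume to 4096 units (a back-of-the-envelope check, not a full per-layer tally):

```python
fc6 = 7 * 7 * 512 * 4096   # weights of VGG-16's first FC layer
print(fc6)                 # 102,760,448 -- roughly 3/4 of all 138M parameters
```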
Next came GoogLeNet (2014): 6.7% top-5 error, with 12× fewer parameters than AlexNet.
Then MSRA's ResNet (2015): 3.6% top-5 error, and at runtime faster than VGGNet.
Training details: BN after every CONV, Xavier/2 initialization, SGD+Momentum (0.9), learning rate 0.1 divided by 10 when validation error plateaus, mini-batch size 256, weight decay of 1e-5, no dropout.
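This recipe maps almost one-to-one onto a modern framework; for example, in PyTorch (my choice of framework, not the source's) it might look like:

```python
import torch

# Sketch of the ResNet training recipe above; `model` is a placeholder,
# and the validation-error metric is assumed to be computed elsewhere.
model = torch.nn.Linear(10, 10)  # stand-in for the real ResNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-5)
# Divide the learning rate by 10 when the validation error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                       factor=0.1)
# Per epoch: train on mini-batches of 256, then call
#   scheduler.step(val_error)
```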