CNN
Finally on to CNNs. This section introduces the CONV, POOL, and FC layers: their concrete structure, parameter counts, and computational cost.
First, a quick review of Mini-batch SGD:
Loop:
1. Sample a batch of data
2. Forward prop it through the graph, get loss
3. Backprop to calculate the gradients
4. Update the parameters using the gradient
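As a sketch, this loop might look like the following in numpy (`forward_backward` is a hypothetical stand-in for the real network's forward and backward pass; it is not from the source):

```python
import numpy as np

# Minimal sketch of the mini-batch SGD loop above.
# `forward_backward(params, X_batch, y_batch)` is assumed to return
# (loss, grads), with `grads` keyed like `params`.
def sgd_train(params, forward_backward, X, y, lr=1e-2, batch_size=256, steps=1000):
    n = X.shape[0]
    for step in range(steps):
        idx = np.random.choice(n, batch_size)                   # 1. sample a batch
        loss, grads = forward_backward(params, X[idx], y[idx])  # 2-3. forward + backward
        for k in params:                                        # 4. parameter update
            params[k] -= lr * grads[k]
    return params
```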
The output size of a convolution follows (W - F + 2P)/S + 1, for input width W, filter size F, zero padding P, and stride S; the trend is toward stacking larger numbers of small filters in deeper networks.
The depth of a CONV kernel always matches the depth of the input feature-map volume, while the number of kernels becomes the depth of the output feature map.
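This bookkeeping fits in a small helper (a sketch; the function name and signature are mine, not from the source):

```python
def conv_layer_stats(W_in, D_in, F, K, S=1, P=0):
    """Output shape and parameter count for a square CONV layer.

    W_in: input width/height, D_in: input depth, F: filter size,
    K: number of kernels, S: stride, P: zero padding.
    Each kernel spans the full input depth, so the output depth is K.
    """
    W_out = (W_in - F + 2 * P) // S + 1
    params = (F * F * D_in + 1) * K   # +1 per kernel for the bias
    return (W_out, W_out, K), params
```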
Typically: Max Pool with 2×2 filters and stride 2.
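A 2×2, stride-2 max pool halves the spatial size and keeps the depth unchanged; a minimal numpy sketch (assuming even input height and width):

```python
import numpy as np

def max_pool_2x2(x):
    """x: (H, W, D) with even H and W. Returns (H/2, W/2, D)."""
    H, W, D = x.shape
    # Split each spatial axis into (size/2, 2), then max over the 2x2 window axes.
    return x.reshape(H // 2, 2, W // 2, 2, D).max(axis=(1, 3))
```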
FC: contains neurons connected to the entire input volume.
The CONV parameter count depends on the filters. For example, take a 227×227×3 feature map and 96 kernels with 11×11 filters at stride 4: parameters = (11×11×3)×96 ≈ 35K, and the output volume is [(227-11)/4+1 = 55, i.e. 55×55×96].
POOL layers have 0 parameters; FC layers have the most.
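As a quick check on this arithmetic (a toy snippet, not AlexNet code):

```python
W, F, S, P, K, D = 227, 11, 4, 0, 96, 3
out = (W - F + 2 * P) // S + 1   # (227 - 11) / 4 + 1 = 55
params = F * F * D * K           # 11 * 11 * 3 * 96 = 34,848 ~= 35K (biases excluded)
print(out, params)               # 55 34848
```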
The AlexNet (2012) network architecture.
The VGGNet (2014) architecture and parameter counts: the feature maps of the first few layers account for most of the memory, while the FC layers account for most of the parameters. VGG is one of the best networks to initialize from (as a pretrained starting point).
TOTAL memory: 24M × 4 bytes ≈ 93MB/image (only forward! ~×2 for bwd)
TOTAL params: 138M parameters
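The claim that the FC layers dominate the parameter count is easy to verify: VGG-16's first FC layer alone maps the final 7×7×512 volume to 4096 units (a back-of-the-envelope check, not a full per-layer tally):

```python
fc6 = 7 * 7 * 512 * 4096   # weights of VGG-16's first FC layer
print(fc6)                 # 102,760,448 -- roughly 3/4 of all 138M parameters
```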
Next came GoogLeNet (2014): 6.7% top-5 error, with 12× fewer parameters than AlexNet.
Then MSRA's ResNet (2015): 3.6% top-5 error, and at runtime faster than VGGNet.
Training details: BN after every CONV, Xavier/2 initialization, SGD+Momentum (0.9), learning rate 0.1 divided by 10 when validation error plateaus, mini-batch size 256, weight decay of 1e-5, no dropout.
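This recipe maps almost one-to-one onto a modern framework; for example, in PyTorch (my choice of framework, not the source's) it might look like:

```python
import torch

# Sketch of the ResNet training recipe above; `model` is a placeholder,
# and the validation-error metric is assumed to be computed elsewhere.
model = torch.nn.Linear(10, 10)  # stand-in for the real ResNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-5)
# Divide the learning rate by 10 when the validation error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                       factor=0.1)
# Per epoch: train on mini-batches of 256, then call
#   scheduler.step(val_error)
```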