CS231n-7

CNN

Finally we get to CNNs: an introduction to the CONV, POOL, and FC layers, covering their structure, parameter counts, and computational cost.

A quick review of mini-batch SGD (a minimal code sketch follows the list).
Loop:

  1. Sample a batch of data
  2. Forward prop it through the graph, get loss
  3. Backprop to calculate the gradients
  4. Update the parameters using the gradient
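
This loop maps directly onto code. Below is a minimal sketch using a made-up linear model with a squared-error loss (all names, shapes, and hyperparameters here are hypothetical), just to show where each of the four steps lives:

```python
import numpy as np

# Hypothetical stand-in model: one weight matrix W and a squared-error loss.
def forward_loss(W, x_batch, y_batch):
    scores = x_batch @ W                                    # forward prop through the graph
    return 0.5 * np.sum((scores - y_batch) ** 2) / len(x_batch)

def backward_grad(W, x_batch, y_batch):
    scores = x_batch @ W
    return x_batch.T @ (scores - y_batch) / len(x_batch)    # dL/dW

X, y = np.random.randn(1000, 64), np.random.randn(1000, 10)
W = 0.01 * np.random.randn(64, 10)
lr, batch_size = 1e-2, 32

for step in range(100):
    idx = np.random.choice(len(X), batch_size, replace=False)  # 1. sample a batch
    x_batch, y_batch = X[idx], y[idx]
    loss = forward_loss(W, x_batch, y_batch)     # 2. forward prop, get loss
    grad = backward_grad(W, x_batch, y_batch)    # 3. backprop the gradients
    W -= lr * grad                               # 4. update the parameters
```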


The figure with the formulas for a convolution's output size and parameter count; the trend is toward a larger number of small filters and deeper networks.
The depth of a CONV kernel always matches the depth of the input feature-map volume, and the number of kernels becomes the depth of the output feature map.
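
The formulas themselves are standard; here is a small Python sketch of them (the function names are mine):

```python
def conv_output_size(w_in, f, stride, pad):
    """Spatial output size of a CONV layer: (W - F + 2P) / S + 1."""
    assert (w_in - f + 2 * pad) % stride == 0, "filter does not tile the input evenly"
    return (w_in - f + 2 * pad) // stride + 1

def conv_params(f, depth_in, num_kernels, bias=True):
    """Each kernel spans the full input depth (F*F*D_in weights, +1 bias);
    num_kernels becomes the output depth."""
    return (f * f * depth_in + (1 if bias else 0)) * num_kernels
```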

In general: max pooling with 2×2 filters and stride 2.
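
With 2×2 filters and stride 2 the pooling windows tile the input exactly, so the operation reduces to a reshape and a max; a minimal NumPy sketch:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over an (H, W, D) volume; H, W even."""
    h, w, d = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, d).max(axis=(1, 3))

x = np.random.randn(4, 4, 3)
assert max_pool_2x2(x).shape == (2, 2, 3)   # spatial size halves, depth unchanged
```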

FC: contains neurons that connect to the entire input volume.

A CONV layer's parameter count depends on its filters. For example, for a 227×227×3 feature map with 96 kernels of 11×11 filters at stride 4, there are (11×11×3)×96 ≈ 35K parameters, and the output volume is [(227−11)/4+1] = 55, i.e. 55×55×96.
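
Checking those numbers (this is the first conv layer of AlexNet):

```python
f, stride, depth_in, num_kernels = 11, 4, 3, 96   # AlexNet CONV1, as quoted above

params = (f * f * depth_in) * num_kernels   # weights only, biases ignored as in the text
out = (227 - f) // stride + 1

print(params)   # -> 34848, i.e. ~35K
print(out)      # -> 55, so the output volume is 55 x 55 x 96
```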

Pooling layers have zero parameters; the FC layers hold the most parameters.


The AlexNet (2012) network architecture.


The VGGNet (2014) architecture and its parameters: the feature maps of the first few layers consume most of the memory, while the FC layers hold most of the parameters. VGG is one of the best-performing networks to use for initialization.
TOTAL memory: 24M × 4 bytes ≈ 93 MB/image (forward only! roughly ×2 for backward)
TOTAL params: 138M parameters
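
A quick tally, assuming the standard VGG-16 configuration (13 conv layers of 3×3 filters plus three FC layers), reproduces the 138M total and shows that the FC layers dominate:

```python
# Tallying VGG-16's parameters layer by layer (3x3 convs, biases included).
convs = [            # (depth_in, depth_out) for each of the 13 conv layers
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
fcs = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]   # fc6, fc7, fc8

conv_total = sum((3 * 3 * d_in + 1) * d_out for d_in, d_out in convs)
fc_total = sum((d_in + 1) * d_out for d_in, d_out in fcs)

print(conv_total)             # -> 14714688   (~15M in all the conv layers)
print(fc_total)               # -> 123642856  (~124M, the FC layers dominate)
print(conv_total + fc_total)  # -> 138357544  (~138M total, matching the slide)
```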

Next came GoogLeNet (2014): 6.7% top-5 error, with 12× fewer parameters than AlexNet.
Then MSRA's ResNet (2015): 3.6% top-5 error, and at runtime it is faster than VGGNet.


ResNet training details: BN after every CONV layer, Xavier/2 initialization, SGD + momentum (0.9), learning rate 0.1 divided by 10 when the validation error plateaus, mini-batch size 256, weight decay of 1e-5, no dropout.
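
A hypothetical sketch of that recipe's update rule and learning-rate schedule; the model, gradients, and validation errors below are placeholders, and the patience threshold is my own assumption:

```python
import numpy as np

# Sketch of: SGD + momentum 0.9, weight decay, LR 0.1 divided by 10 on plateau.
lr, momentum, weight_decay = 0.1, 0.9, 1e-5
w = np.zeros(10)                 # stand-in parameters
v = np.zeros_like(w)             # momentum buffer

best_err, bad_epochs, patience = float("inf"), 0, 3
for epoch in range(90):
    grad = np.random.randn(10)             # placeholder for the backprop gradient
    grad += weight_decay * w               # L2 weight decay folded into the gradient
    v = momentum * v - lr * grad           # momentum update
    w += v                                 # parameter step

    val_err = float(np.random.rand())      # placeholder validation error
    if val_err < best_err:
        best_err, bad_epochs = val_err, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:         # validation error has plateaued
            lr, bad_epochs = lr / 10.0, 0
```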

Keep sharing, and support original work.