CS231n_6

Posted on 2016-06-09 | Category: Read

Before starting the next lecture, let me first fill in what I learned from the reading material for the previous one.

Lecture 5 Notes2&3

Regularization

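L1 and L2 penalties in the loss function, plus max-norm constraints: for the weight vector w of each neuron, enforce $\|w\|_2 < c$, where c is typically 3 or 4.
Dropout: usually p = 0.5, i.e. each neuron is kept active with probability 0.5, so every training sample is trained on a new masked sub-network. At test time all neurons are switched on, but the outputs must be multiplied by p = 0.5. Intuitively this helps because: 1. it forces the network to learn redundant representations; 2. it effectively trains a large ensemble of models that share parameters.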
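The dropout scheme described above (mask at training time, scale by p at test time) can be sketched in a few lines. This is only an illustration in plain Python; the function names `dropout_train` and `dropout_test` and the toy activations are my own assumptions, not code from the course.

```python
import random

def dropout_train(activations, p=0.5, rng=random):
    """Training pass: keep each unit with probability p, zero it otherwise."""
    mask = [1.0 if rng.random() < p else 0.0 for _ in activations]
    return [a * m for a, m in zip(activations, mask)], mask

def dropout_test(activations, p=0.5):
    """Test pass: every unit is on, outputs are scaled by p so that they
    match the expected value of the training-time outputs."""
    return [a * p for a in activations]

random.seed(0)
h = [1.0, 2.0, 3.0, 4.0]           # hypothetical hidden-layer activations
dropped, mask = dropout_train(h)   # a different random sub-network per call
scaled = dropout_test(h)           # [0.5, 1.0, 1.5, 2.0]
print(mask, dropped, scaled)
```

Each call to `dropout_train` samples a fresh mask, which is what makes every sample see a different sub-network while the weights stay shared.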

Practice: use a single, global L2 regularization strength (cross-validated) + dropout (p = 0.5).

Loss Functions

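For classification tasks, use the Softmax or SVM loss. When there are very many classes, Hierarchical Softmax can be used.

Attribute classification can use a logistic regression classifier with two classes (0, 1) per attribute.

For regression tasks, the L2 or L1 loss is usually used.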

When faced with a regression task, first consider if it is absolutely necessary. Instead, have a strong preference to discretizing your outputs to bins and perform classification over them whenever possible.

Gradient Checks

Use the centered formula, i.e. average the slopes to the left and right of x: $$\frac{df(x)}{dx} \approx \frac{f(x+h)-f(x-h)}{2h}$$

Use the relative error for the comparison:
$$ \frac{|f_a - f_n|}{\max(|f_a|, |f_n|)} $$ where $f_a$ is the analytic gradient and $f_n$ is the numerical gradient.

In practice:

relative error > 1e-2 usually means the gradient is probably wrong
1e-2 > relative error > 1e-4 should make you feel uncomfortable
1e-4 > relative error is usually okay for objectives with kinks, but if there are no kinks (e.g. tanh nonlinearities and softmax), then 1e-4 is too high
1e-7 and less: you should be happy
if the gradient values themselves are very small (roughly 1e-10 or smaller in absolute value), that is worrying
The deeper the network, the larger the relative errors: for a 10-layer network, 1e-2 can be acceptable.

Summary: be careful with the step size h; gradient checking is important; don't let regularization overwhelm the data; turn off dropout/augmentations when gradient checking; check only a few dimensions.
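A gradient check along these lines might look like the sketch below. The toy objective `f`, its hand-derived `analytic_gradient`, and the step size `h = 1e-5` are assumptions chosen for illustration; only the centered formula and the relative error come from the notes.

```python
import math

def numerical_gradient(f, x, h=1e-5):
    """Centered difference (f(x+h) - f(x-h)) / (2h), one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        old = x[i]
        x[i] = old + h
        f_plus = f(x)
        x[i] = old - h
        f_minus = f(x)
        x[i] = old                      # restore the coordinate
        grad.append((f_plus - f_minus) / (2 * h))
    return grad

def relative_error(fa, fn):
    """|fa - fn| / max(|fa|, |fn|), defined as 0 when both are exactly 0."""
    denom = max(abs(fa), abs(fn))
    return 0.0 if denom == 0 else abs(fa - fn) / denom

def f(x):
    # Toy smooth objective (an assumption): sum of tanh(x_i)^2, no kinks.
    return sum(math.tanh(v) ** 2 for v in x)

def analytic_gradient(x):
    # d/dv tanh(v)^2 = 2 * tanh(v) * (1 - tanh(v)^2)
    return [2 * math.tanh(v) * (1 - math.tanh(v) ** 2) for v in x]

x = [0.3, -1.2, 2.0]
ga = analytic_gradient(x)
gn = numerical_gradient(f, list(x))
errors = [relative_error(a, n) for a, n in zip(ga, gn)]
print(errors)
```

Because this objective has no kinks, the relative errors should fall well below the 1e-4 level that the list above calls too high for smooth functions.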

Lecture 6

How to do parameter Updates

1.SGD: x += -learning_rate * dx
2.Momentum: lets velocity build up along shallow directions; velocity is damped in steep directions because the gradient keeps changing sign

v = mu * v - learning_rate * dx
x += v

3.Nesterov Momentum:
$v_t = \mu v_{t-1} - \epsilon \nabla f(\theta_{t-1} + \mu v_{t-1})$
$\theta_t = \theta_{t-1} + v_t$

4.Adagrad: equalizes the step size between steep and shallow directions
cache += dx ** 2
x += -learning_rate * dx / (sqrt(cache) + 1e-7)

5.Adam: a good default; the full version adds bias correction
m = beta1 * m + (1 - beta1) * dx
v = beta2 * v + (1 - beta2) * (dx ** 2)
x += -learning_rate * m / (sqrt(v) + 1e-7)
beta1 and beta2 can typically be set to 0.9 and 0.995.
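To see the update rules side by side, here is a small runnable sketch that applies each of them to a one-dimensional toy problem. The objective $f(x) = (x-3)^2$, the learning rates, and the step counts are assumptions picked for the demo; the Adam variant includes the bias-correction terms that the short form above leaves out.

```python
import math

def grad(x):
    # Gradient of the toy objective f(x) = (x - 3)^2 (an assumption).
    return 2.0 * (x - 3.0)

def sgd(x, lr=0.1, steps=200):
    for _ in range(steps):
        x += -lr * grad(x)
    return x

def momentum(x, lr=0.1, mu=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(x)           # velocity accumulates gradients
        x += v
    return x

def nesterov(x, lr=0.1, mu=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(x + mu * v)  # gradient at the look-ahead point
        x += v
    return x

def adagrad(x, lr=1.0, steps=200):
    cache = 0.0
    for _ in range(steps):
        dx = grad(x)
        cache += dx ** 2                    # running sum of squared gradients
        x += -lr * dx / (math.sqrt(cache) + 1e-7)
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.995, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        dx = grad(x)
        m = beta1 * m + (1 - beta1) * dx
        v = beta2 * v + (1 - beta2) * dx ** 2
        m_hat = m / (1 - beta1 ** t)        # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x += -lr * m_hat / (math.sqrt(v_hat) + 1e-7)
    return x

for name, rule in [("sgd", sgd), ("momentum", momentum), ("nesterov", nesterov),
                   ("adagrad", adagrad), ("adam", adam)]:
    print(name, rule(0.0))
```

All five should end up close to the minimum at x = 3; on a problem this simple the differences between the rules are small, and they only become interesting on ill-conditioned, high-dimensional losses.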

Second Order Optimization

1.Taylor expansion
2.Newton's method: the Hessian H is too large, and inverting it costs O(n^3), where n is in the millions
3.Quasi-Newton methods: O(n^2)

L-BFGS (Limited-memory BFGS): works well in full batch, but does not transfer well to the mini-batch setting.

Practice: 1.Train multiple independent models
2.At test time, average their results

Fun tricks: get a small boost by averaging models from multiple initializations; keep track of a running average of the parameter vector.

Annealing learning rate

Step decay: e.g. reduce the learning rate by half every t epochs
Exponential decay $\alpha = \alpha_0 e^{-kt}$
1/t decay: $\alpha = \alpha_0 /(1+kt)$
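The three annealing schedules can be written as small functions. The parameter names (`drop`, `every`, `k`) and their default values are assumptions for illustration only.

```python
import math

def step_decay(alpha0, epoch, drop=0.5, every=10):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return alpha0 * drop ** (epoch // every)

def exponential_decay(alpha0, t, k=0.1):
    """alpha = alpha0 * exp(-k * t)"""
    return alpha0 * math.exp(-k * t)

def inverse_time_decay(alpha0, t, k=0.1):
    """alpha = alpha0 / (1 + k * t)"""
    return alpha0 / (1 + k * t)

for epoch in (0, 10, 20):
    print(epoch,
          step_decay(0.1, epoch),
          exponential_decay(0.1, epoch),
          inverse_time_decay(0.1, epoch))
```

All three start at `alpha0` and decrease monotonically; step decay is piecewise constant, which is one reason it tends to be the easiest to reason about when tuning.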

Hyperparameter optimization

Staged search from coarse to fine
Bayesian hyperparameter optimization
Model ensembles improve the performance of a NN by a few percent:

Same Model, Different Initializations
Top models discovered during cross-validation
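Different checkpoints of a single model: when training is very expensive, take different checkpoints of a single network and use those to form an ensemble. (Cheap and practical: pick the models from a few good epochs.)
Running average of parameters during training (use the mean of the parameters seen during training; intuitively the iterates bounce around inside a bowl, and their mean is more likely to sit near the bottom)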

Summary

1.Gradient checking on a small batch of samples is important; also make sure the initialization is correct
2.The magnitude of the updates relative to the weights should be ~1e-3 in the first layer
3.Recommended updates: SGD + Nesterov Momentum, or Adam
4.Decay the learning rate over the period of training
5.Search for good hyperparameters with random search (not grid search), in stages from coarse to fine
6.Form model ensembles for extra performance
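Random search is usually done in log space, since learning rate and regularization strength act multiplicatively. The sketch below shows only the sampling step; the exponent ranges and the function name `sample_hyperparameters` are assumptions, and the training/validation run that would score each trial is left out.

```python
import random

def sample_hyperparameters(rng, lr_exp=(-6, -1), reg_exp=(-5, 0)):
    """Draw one (learning_rate, regularization) pair, uniform in log10 space."""
    lr = 10 ** rng.uniform(*lr_exp)
    reg = 10 ** rng.uniform(*reg_exp)
    return lr, reg

rng = random.Random(0)

# Coarse stage: wide exponent ranges, few epochs per trial.
coarse = [sample_hyperparameters(rng) for _ in range(20)]

# Fine stage: narrow the exponent ranges around whatever region looked best
# in the coarse stage (the ranges here are placeholders).
fine = [sample_hyperparameters(rng, lr_exp=(-4, -3), reg_exp=(-4, -2))
        for _ in range(20)]

print(coarse[:3])
print(fine[:3])
```

Compared with a grid, each random trial tries a new value of every hyperparameter, so the important ones get explored more densely for the same budget.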

LDI-NAT《Color Demosaicking by Local Directional Interpolation and Nonlocal Adaptive Thresholding》

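Posted on 2016-06-07 | Category: Tech

IDEA

This is Prof. Zhang's 2011 paper. Its results are very good, and it is still regularly used as a baseline in comparison experiments. The demosaicking recipe is again: initialize, then build a CDM noise model, then fit values that agree with the sample statistics. (This modeling approach holds up well; in the end it is still about better exploiting the correlation between spatial-domain and frequency-domain information.)

Here Local Directional Interpolation (LDI) is used for an elaborate gradient-based interpolation as the initialization, and then Nonlocal Means (NLM) is compared against Nonlocal Adaptive Thresholding (NAT). NLM scans the neighborhood for patches that satisfy a similarity criterion and combines them with distance-based weights. NAT, after finding the nonlocal similar patches, treats them as observation samples and decomposes them within the noise model using SVD and PCA (assuming sparsity). Since the noise v is only weakly correlated with x, an adaptive threshold is set to extract the principal components of v, which makes the fitted values more accurate.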

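Paper reading: 《Residual Interpolation for color image demosaicking》

Posted on 2016-06-06 | Category: Tech

First, a quick pass through the recent papers from Tokyo Institute of Technology based on residual difference interpolation. This is the first one, which proposes using the residual difference rather than the color difference.
Website

idea

The conventional demosaicking pipeline (shown on the left) is: first interpolate the G channel in a fine-grained way; then, when interpolating the R channel, rely on the principle that the inter-channel difference R-G varies smoothly: compute delta, interpolate it, and add it back to obtain the R image.
RI points out that interpolating the difference directly against the G image causes artifacts wherever the signal changes too sharply. Instead, some processing turns G into $\hat{G}$, an image that is both smooth and accurate. Exploiting the inter-channel correlation this way gives smaller errors and better results.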

CS231n-4 & 5

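Posted on 2016-06-06 | Category: Read

This lecture mainly covers how backpropagation (BP) is done.

Backpropagation

By the chain rule: $\frac{df}{dx} = \frac{df}{dq} \frac{dq}{dx}$
So $\frac{df}{df} = 1$, $\frac{df}{dz} = \frac{df}{df} \frac{df}{dz} = 1 \cdot q = 3$, and the remaining gradients follow in the same way (the x in each local derivative is the current value of that neuron).
Add gate: gradient distributor
Max gate: gradient router (only the input that was the max receives the gradient)
Mul gate: gradient “switcher” (the inputs swap roles: each input's gradient is the other input's value times the upstream gradient)
BP gradient = [upstream gradient] × [local gradient]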
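The chain-rule steps above can be traced by hand on a tiny graph. The sketch below assumes the function is $f = (x + y) z$ with $x = -2$, $y = 5$, $z = -4$, which are values I picked so that $q = x + y = 3$ matches the number in the text; they are not stated in this post.

```python
# Forward pass on the assumed graph f = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0
q = x + y            # add gate, q = 3
f = q * z            # mul gate, f = -12

# Backward pass: gradient = upstream gradient * local gradient.
df_df = 1.0
df_dz = df_df * q    # mul gate: local gradient w.r.t. z is the other input, q
df_dq = df_df * z    # mul gate: local gradient w.r.t. q is the other input, z
df_dx = df_dq * 1.0  # add gate distributes the gradient unchanged
df_dy = df_dq * 1.0

print(df_dx, df_dy, df_dz)
```

The mul gate hands each input the other input's value, and the add gate passes the upstream gradient to both inputs unchanged, which is exactly the "switcher" and "distributor" behavior listed above.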

CS231n_3

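Posted on 2016-06-04 | Category: Read

1. Loss Function

Define the multiclass SVM loss: $L_i = \sum_{j\neq y_i} \max(0,s_j-s_{y_i}+1)$
Here $L_i$ is the loss for example i, $s_j$ is the score of each class other than the target class, $y_i$ is the target class of the example, and $s_{y_i}$ is the score of that target class.

The full training loss is the average, $L=\frac{1}{N}\sum_{i=1}^N L_i$, so L = (2.9+0+10.9)/3 = 4.6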
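As a concrete reading of the two formulas, here is a minimal sketch of the per-example hinge loss and its average. The score rows below are made-up illustrative numbers, not the lecture's, so the resulting loss is not meant to reproduce the 4.6 above.

```python
def svm_loss_i(scores, y, margin=1.0):
    """L_i = sum over j != y of max(0, s_j - s_y + margin)."""
    s_y = scores[y]
    return sum(max(0.0, s - s_y + margin)
               for j, s in enumerate(scores) if j != y)

def full_loss(score_rows, labels):
    """L = (1/N) * sum_i L_i"""
    losses = [svm_loss_i(s, y) for s, y in zip(score_rows, labels)]
    return sum(losses) / len(losses)

# Hypothetical class scores for two examples over three classes.
score_rows = [[3.0, 5.0, -2.0],   # target class 0: another class beats it
              [1.0, 5.0, 2.0]]    # target class 1: wins by more than the margin
labels = [0, 1]

print(svm_loss_i(score_rows[0], labels[0]))  # 3.0
print(svm_loss_i(score_rows[1], labels[1]))  # 0.0
print(full_loss(score_rows, labels))         # 1.5
```

The second example contributes zero loss because the target score exceeds every other score by at least the margin of 1.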

CS231n_1 & 2

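Posted on 2016-06-03 | Category: Read

I am back and throwing myself into learning and researching CNNs. On the recommendation of Xianbiao, a postdoc colleague, I resolved to take Stanford's open course CS231n, CNN for Visual Recognition, taught mainly by Fei-Fei Li, Karpathy and Johnson.
YouTube videos
Course homepage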

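Reading List

Posted on 2016-06-01 | Category: Tech

This is a record of my reading list, including books I am reading, books I want to read, and books recommended by others. Later I may add the projects and work currently in progress, and finally the books I have finished.

Currently reading: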

《百年孤独》、《

Want to read:

Ten books to read before interviewing at Microsoft

Code: The Hidden Language of Computer Hardware and Software （《编码的奥秘》）
Computer Systems: A Programmer’s Perspective （《深入理解计算机系统》） / Windows via C/C++ （《Windows核心编程》） / 《程序员的自我修养》
Code Complete 2 （《代码大全》） / The Pragmatic Programmer （《程序员修炼之道》, which I also call 《代码小全》, a pocket-sized Code Complete）
Programming Pearls （《编程珠玑》） / Algorithms / Algorithm Design / 《编程之美》
The C Programming Language
The C++ Programming Language / Programming: Principles and Practice Using C++ / Accelerated C++
The Structure and Interpretation of Computer Programs （《计算机程序的构造和解释》）
Clean Code / Implementation Patterns
Design Patterns （《设计模式》） / Agile Software Development, Principles, Patterns, and Practices
Refactoring （《重构》）

Recommended by Yun Feng (云风)

C++编程思想
Effective C++
深度探索C++对象模型
C++语言的设计和演化
C专家编程
C陷阱与缺陷
C语言接口与实现
Lua程序设计
Linkers and Loaders
COM本质论
Windows核心编程
深入解析Windows操作系统
程序员修炼之道
代码大全
UNIX编程艺术
设计模式
代码优化：有效使用内存
深入理解计算机系统
深入理解LINUX内核
TCP/IP 详解

Feng Dahui (冯大辉)

软件随想录
黑客与画家
重来
UNIX编程艺术
编程人生

Douban CTO

Code Complete 2
The Mythical Man-Month （《人月神话》）
Code: The Hidden Language of Computer Hardware and Software （《编码的奥秘》）
TAOCP (no explanation needed)
The Pragmatic Programmer （《程序员修炼之道》）
Design Patterns （《设计模式》）
The Structure and Interpretation of Computer Programs （《计算机程序的构造和解释》）
Refactoring （《重构》）
The C Programming Language
Introduction to Algorithms （《算法导论》）

Already read:

Technical

《统计学习方法》、《这就是搜索引擎》、《浪潮之巅》、《数学之美》、《当下的力量》、《机器学习》、《数字图像处理》、《高等数学（上）》、

Non-technical

《苏菲的世界》、《如何阅读一本书》、《黑客与画家》、《解忧杂货铺》、《当我在跑步，我在谈论什么》、《白夜行》、《一个人的朝圣》、《人性的弱点》、《从0到1》、《35岁前要做的33件事》、《站在两个世界的边缘》、《极简欧洲史》、《未来在现实的第几层》、《再穷也要去旅游》、《活着》、《小王子》、《自控力》、《围城》、《此生未完成》、《黄金时代》、《拆掉思维里的墙》、《facebook效应》、《别为小事抓狂》、《月亮与六便士》、《追风筝的人》、

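Chaoshan Trip & Photo Collage

Posted on 2016-05-30 | Category: Life

Seminar Report

A 1500-word summary of the ten Seminar Reports from last semester at PolyU
Seminar Report
Equipment purchase receipts
Computer receipts

Chaoshan

I had been working in Shenzhen for a while, and over the weekend, taking advantage of the Halloween holiday, I went on a trip with Shan. Because so many visitors from Hong Kong were on holiday, Zhuhai Chimelong and the nearby hot springs were all fully booked, so we decided to go to Chaoshan and enjoy the food instead. It turned out to be exhausting: not enough time, strained transport, and we did not really get to see what makes the place special for tourists, although Shan and I certainly did walk a lot more.

We went for two days and stayed in a hotel near Zhuangyuan Street (状元街). Overall, prices were not low and people were honestly a bit fierce, so it is not that well suited to tourism. We had decided to come here on Xiaoming's suggestion, to experience Chaoshan culture and food. After all, I have quite a few friends from Chaoshan at university, and this helps in understanding their way of life.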

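Back at PolyU

Posted on 2016-05-29 | Category: Life

Last week in Shenzhen

From mid-March this year I interned in the camera department and met many very capable people. It is so far my most real and complete experience inside a company, lasting almost a full three months. (I won't say how hard it was; it was hard, and I still showed up and worked diligently.) To sum up, I did not improve a great deal, but I did learn quite a lot: the company's management model, internal communication mechanisms, security mechanisms and so on. I lived in the dormitory in Xin'an, 40 minutes from work each way, with a routine of starting work at 10 am, ordering takeout at 12:30, napping until 2 pm, finishing dinner at 6 pm, resting, chatting and reading at 7 pm, and going back to work at a relaxed pace at 8 pm. Although it was tense, the benefits of a regular, rhythmic life are plain to see. I am very glad to have met the new colleagues who joined at the same time, who accompanied me through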

Demosaic Comparison

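Posted on 2016-05-16 | Category: Tech

Demosaic Comparison

I have been at the company for over two months and am only just getting started with ISP and demosaicking. Time to get a solid grasp of CNNs and produce some results!

Overall, nnr and sht interpolate directly and produce strong jagged artifacts; the Lu and NAT methods are too slow; and the PSNR of AP and SA seems inflated, with poor actual visual quality. It may be worth measuring CPSNR, CIE Lab, jaggedness, or other metrics.
Below I compare the results of AP[1], SA, LCC1, DLMMSE[2] and RI[3].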

[1] SA, http://www.csee.wvu.edu/~xinl/demo/demosaic.html
[2] DLMMSE, Zhang L, Wu X. Color demosaicking via directional linear minimum mean square-error estimation[J]. Image Processing, IEEE Transactions on, 2005, 14(12): 2167-2178.
[3] RI, http://www.ok.ctrl.titech.ac.jp/res/DM/RI.html


CsrjTan
