VisualComputing_3

This section explains how to use dictionary learning for classification tasks.

Sparse Representation Classification (SRC)

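Problem Modeling:
$label(y) = \arg\min_k r_k$
where $r_k = \|y - X_k \hat{\alpha}_k\|_2$. Here $X_k$ holds the training samples (atoms) of class $k$, and $\hat{\alpha}_k$ is the part of the coding vector of $y$ associated with class $k$.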
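The decision rule above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sparse code is computed with a plain ISTA loop on the Lagrangian form $\min_{\alpha} \frac{1}{2}\|y-X\alpha\|_2^2 + \lambda\|\alpha\|_1$, and the names `ista_l1`, `src_classify`, `lam`, and the toy data are hypothetical and do not come from the notes.

```python
import numpy as np

def ista_l1(X, y, lam=0.01, n_iter=500):
    """Approximately solve min_a 0.5*||y - X a||_2^2 + lam*||a||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = a - step * (X.T @ (X @ a - y))  # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return a

def src_classify(X, labels, y, lam=0.01):
    """Return the class k with the smallest residual r_k = ||y - X_k a_k||_2."""
    a = ista_l1(X, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - X[:, labels == k] @ a[labels == k])
                 for k in classes]
    return classes[int(np.argmin(residuals))]

# Toy data (assumed): two classes clustered around two different directions.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
X = np.hstack([c[:, None] + 0.05 * rng.standard_normal((4, 5)) for c in centers])
X /= np.linalg.norm(X, axis=0)          # unit-norm atoms, one per column
labels = np.repeat([0, 1], 5)           # class label of each column of X
y = centers[1] + 0.05 * rng.standard_normal(4)  # query drawn near class 1
predicted = src_classify(X, labels, y)
```

The query is coded over the whole dictionary once, and only the reconstruction step is done class by class, which is what makes the residuals $r_k$ comparable.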

pros:

novel use of sparse coding for classification
widely studied, improved and extended
good performance

cons:

crediting the success of SRC to the use of sparse coding is not accurate
it is better viewed as a new type of classifier, although the sparsity is helpful
it does not use effective local structural features
the dictionary becomes too large when handling occlusion

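Local features (Gabor, SIFT) address the lack of local structure, and robust coding addresses the oversized dictionary in the occlusion case.
The main difference between LASSO and L1-LASSO lies in the data fidelity term: the residual $e=y-X\alpha$ is assumed to follow an i.i.d. Gaussian distribution in LASSO and an i.i.d. Laplacian distribution in L1-LASSO.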

LASSO: $\min_{\alpha} \|y-X\alpha\|_2^2 \quad s.t.\ \|\alpha\|_1 \le \sigma$

L1-LASSO: $\min_{\alpha} \|y-X\alpha\|_1 \quad s.t.\ \|\alpha\|_1 \le \sigma$
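
Why the noise model decides the fidelity norm can be seen from the negative log-likelihood of the residual. As a sketch, assume the entries $e_i$ of $e=y-X\alpha$ are i.i.d.; the scale parameters $\sigma_e$ and $b$ are symbols introduced here for illustration and are unrelated to the constraint bound $\sigma$.

Gaussian: $$p(e)=\prod_i \frac{1}{\sqrt{2\pi}\sigma_e}\exp\left(-\frac{e_i^2}{2\sigma_e^2}\right) \Rightarrow -\ln p(e) = \frac{1}{2\sigma_e^2}\|y-X\alpha\|_2^2 + const$$

Laplacian: $$p(e)=\prod_i \frac{1}{2b}\exp\left(-\frac{|e_i|}{b}\right) \Rightarrow -\ln p(e) = \frac{1}{b}\|y-X\alpha\|_1 + const$$

Maximizing the likelihood is therefore the same as minimizing the $\ell_2$ fidelity term in the Gaussian case and the $\ell_1$ fidelity term in the Laplacian case.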

MLE

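Maximum likelihood estimation (MLE) provides a way to estimate model parameters from observed data: "the model is fixed, the parameters are unknown". It relies on one important assumption: all samples are independent and identically distributed (i.i.d.).
Suppose $x_1,x_2,\dots,x_n$ are i.i.d. samples, $\theta$ is the model parameter and $f$ is the model. The probability of generating these samples can then be written as $$f(x_1,x_2,\dots,x_n|\theta)= f(x_1|\theta)\, f(x_2|\theta)\cdots f(x_n|\theta)$$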

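Maximizing the average log-likelihood: $\hat{\theta}_{mle} = \arg\max_{\theta} \ell(\theta|x_1,\dots,x_n)$, where $\ell=\frac{1}{n} \ln L$ and $L(\theta|x_1,\dots,x_n)=f(x_1,\dots,x_n|\theta)$ is the likelihood function.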

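Steps of maximum likelihood estimation:

write down the likelihood function
take the logarithm of the likelihood function and simplify
take the derivative with respect to the parameters
solve the likelihood equations (set the derivative to zero)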
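
As a worked illustration of these steps, consider a Gaussian model, for which the likelihood equations have a closed-form solution. The sketch below is an assumption-laden example rather than part of the original notes; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(x, mu, var):
    """Average log-likelihood (1/n) ln L of i.i.d. Gaussian samples."""
    return np.mean(-0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var))

def gaussian_mle(x):
    """Closed-form solution of the likelihood equations for a Gaussian."""
    mu_hat = np.mean(x)                    # from d(ln L)/d(mu) = 0
    var_hat = np.mean((x - mu_hat) ** 2)   # from d(ln L)/d(var) = 0 (divides by n)
    return mu_hat, var_hat

# Synthetic samples (assumed): true mean 2.0, true variance 1.5**2 = 2.25.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)
mu_hat, var_hat = gaussian_mle(x)
best = gaussian_log_likelihood(x, mu_hat, var_hat)
```

Evaluating `gaussian_log_likelihood` at any perturbed parameter value gives a result no larger than `best`, which is what solving the likelihood equations guarantees for this model.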

MAP

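Maximum a posteriori (MAP) estimation is a point estimation of a quantity that is hard to observe, based on empirical data. It is similar to MLE; the difference is that MAP incorporates a prior distribution over the quantity being estimated, so MAP can be viewed as a regularized MLE.
Recall that $x$ denotes the samples, $\theta$ the model parameter and $f$ the model. MLE can then be written as: $$\hat{\theta}_{MLE}(x) = \arg\max_{\theta} f(x|\theta)$$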

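For MAP, now assume that $\theta$ has prior distribution $g$. By Bayes' theorem, the posterior distribution of $\theta$ is: $$\theta \mapsto f(\theta|x) = \frac{f(x|\theta)g(\theta)}{\int_{\Theta} f(x|\theta^{*})g(\theta^{*})d\theta^{*}}$$ where $\Theta$ denotes the domain of $g$.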

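The MAP objective is then: $$\hat{\theta}_{MAP}(x)=\arg\max_{\theta} f(\theta|x) = \arg\max_{\theta} f(x|\theta)g(\theta)$$ The second equality holds because the denominator of the posterior does not depend on $\theta$.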

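As can be seen, the main difference between MAP and MLE is that MAP includes a probability distribution over the model parameters themselves; equivalently, MLE treats the prior over the model parameters as uniform (a fixed constant).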
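
A small numerical sketch makes this relation concrete. It assumes a Gaussian likelihood with known noise variance and a Gaussian prior on the mean, for which the MAP estimate has a closed form; the function name `map_gaussian_mean`, its parameters, and the sample values are hypothetical choices for illustration.

```python
import numpy as np

def map_gaussian_mean(x, noise_var, prior_mean, prior_var):
    """MAP estimate of a Gaussian mean under a Gaussian prior N(prior_mean, prior_var)."""
    n = len(x)
    # Maximizer of f(x|theta) * g(theta) when both factors are Gaussian.
    return (prior_var * np.sum(x) + noise_var * prior_mean) / (n * prior_var + noise_var)

x = np.array([2.1, 1.9, 2.4, 2.0])   # assumed observations
theta_mle = np.mean(x)               # MLE of the mean is the sample average
# An informative prior centered at 0 pulls the estimate toward 0.
theta_map = map_gaussian_mean(x, noise_var=1.0, prior_mean=0.0, prior_var=1.0)
# A nearly flat prior (huge variance) makes MAP coincide with MLE.
theta_flat = map_gaussian_mean(x, noise_var=1.0, prior_mean=0.0, prior_var=1e9)
```

With the informative prior the estimate is shrunk toward the prior mean, which is the regularization effect; as the prior variance grows the prior becomes effectively uniform and the MAP estimate approaches the MLE.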

Collaborative nature of SRC

For the regularization term, the L1 norm gives a sparse representation and the L2 norm gives a collaborative representation.
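
With an L2 regularizer the coding step has a closed-form (ridge regression) solution, so no iterative sparse solver is needed. The sketch below assumes the objective $\min_{\alpha}\|y-X\alpha\|_2^2+\lambda\|\alpha\|_2^2$ and reuses the plain class-wise residual $r_k$ from the SRC rule; the name `crc_classify`, the value of `lam`, and the toy data are hypothetical.

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.1):
    """Collaborative representation: ridge-regularized coding plus class-wise residuals."""
    n_atoms = X.shape[1]
    # Closed-form solution of min_a ||y - X a||_2^2 + lam * ||a||_2^2
    a = np.linalg.solve(X.T @ X + lam * np.eye(n_atoms), X.T @ y)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - X[:, labels == k] @ a[labels == k])
                 for k in classes]
    return classes[int(np.argmin(residuals))]

# Toy data (assumed): two classes clustered around two different directions.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
X = np.hstack([c[:, None] + 0.05 * rng.standard_normal((4, 5)) for c in centers])
X /= np.linalg.norm(X, axis=0)          # unit-norm atoms, one per column
labels = np.repeat([0, 1], 5)           # class label of each column of X
y = centers[1] + 0.05 * rng.standard_normal(4)  # query drawn near class 1
predicted = crc_classify(X, labels, y)
```

All classes still take part in representing the query, which is the collaborative aspect; only the penalty on the coefficients changes from L1 to L2.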

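The CVPR16 paper by 佳哥 (Jia), "A Probabilistic Collaborative Representation based Approach for Pattern Classification", mainly explains why SRC/CRC work and what properties they have. Building on the ProCRC modeling, it constructs a classifier that performs better than conventional classifiers. It looks for a common point for joint projection, and treats classification as a mapping in the distribution space.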