VisualComputing_2

The previous section covered an introduction to CV and Sparse Representation: the concept, applications, and challenges of CV; the formulation, methods, and steps of Sparse Representation; and, of course, why Sparse Representation can work. This section covers Dictionary Learning and representative works.

Dictionary Learning

Introduction

Before sparse coding, we need to learn an over-complete dictionary so that the coding vectors are sparse. Dictionaries fall into two families: analytical and learned.
Analytical dictionaries include DCT bases, wavelets, curvelets, etc.; dictionaries learned from natural images use methods such as K-SVD, coordinate descent, and online dictionary learning.

Why learn the dictionary?

  • Over-complete learned dictionaries often work better than analytical ones
  • More adaptive to the specific task/data
  • Less strict constraints on the mathematical properties of the bases
  • More flexible for modeling data
  • Tend to produce sparser solutions

L0: K-SVD

For L0-sparse dictionary learning, we can solve approximately with the K-SVD method, which can be viewed as a generalization of K-means.
The dictionary learning problem can be modeled as:
$$\min_{D,A}\|Y-DA\|_F^2 \quad \text{s.t.} \quad \|a_i\|_0 \le T_0 \ \ \forall i$$ where $i$ ranges over the columns of the coefficient matrix $A$ and $T_0$ is the sparsity level.
(Figure: K-SVD)
As the figure shows, dictionary learning proceeds as follows:

  • First, sparse coding: solve for the coefficients with Matching Pursuit; then update the dictionary with the K-SVD step.
  • Decompose $DA$ into a sum of $K$ rank-1 slices, $DA=\sum_{i=1}^K d_i a_i^T$; this sum is the dictionary in action. Peel off the $k$-th slice and look for a new $d$, $a$ to update that entry.
  • Finally, keep only the signals whose coefficient for atom $k$ is nonzero (collected by the selection matrix $\Omega$), take the SVD of the resulting error matrix, and set $d$ to the first column of $U$ and $a^T$ to the first row of $\Sigma V^T$ (i.e., $a = \sigma_1 v_1$).

To summarize the ideas behind K-SVD: the $K$ rank-1 slices make the learned dictionary over-complete; each update touches only the $k$-th entry, i.e., a single dictionary atom (one column of the fat matrix $D$); an SVD of the error "hole" left after peeling, $E_k = U \Sigma V^T$, gives the new $d$, $a$ from the components with the largest energy, which is the best rank-1 approximation of that error; restricting the update to the signals with nonzero coefficients helps preserve the sparsity of the existing representation. A minimal implementation is sketched below.
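
A minimal numpy sketch of the loop just described, assuming unit-norm atoms; `omp` and `ksvd` are my own simplified routines, not a reference implementation:

```python
import numpy as np

def omp(D, y, T0):
    """Greedy (Orthogonal) Matching Pursuit: pick at most T0 atoms of D for y."""
    residual, support = y.copy(), []
    for _ in range(T0):
        k = np.argmax(np.abs(D.T @ residual))   # atom most correlated with residual
        if k in support:
            break
        support.append(k)
        # least-squares fit on the chosen support
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a

def ksvd(Y, n_atoms, T0, n_iter=10):
    """Minimal K-SVD: alternate sparse coding (OMP) and rank-1 atom updates."""
    rng = np.random.default_rng(0)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)              # unit-norm atoms
    for _ in range(n_iter):
        A = np.column_stack([omp(D, y, T0) for y in Y.T])   # sparse coding
        for k in range(n_atoms):                # update one atom at a time
            omega = np.nonzero(A[k])[0]         # signals that actually use atom k
            if omega.size == 0:
                continue
            # error "hole" without atom k, restricted to those signals
            E_k = Y[:, omega] - D @ A[:, omega] + np.outer(D[:, k], A[k, omega])
            U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
            D[:, k] = U[:, 0]                   # d_k = first column of U
            A[k, omega] = s[0] * Vt[0]          # a_k = sigma_1 * first row of V^T
    return D, A
```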

For the L1 formulation, D and A can be learned alternately: updating D with A fixed is a quadratic program; updating A with D fixed is a LASSO optimization (solvable, e.g., with ADMM). A sketch of this alternation follows.
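
A minimal sketch of the alternation, using ISTA for the LASSO step instead of ADMM (my choice, for brevity):

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_dictionary_learning(Y, n_atoms, lam=0.1, n_outer=20, n_ista=50):
    """Alternate an A-step (LASSO via ISTA) and a D-step (closed-form quadratic)."""
    rng = np.random.default_rng(0)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_outer):
        # --- A-step: LASSO solved by ISTA ---
        L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
        for _ in range(n_ista):
            A = soft(A - (D.T @ (D @ A - Y)) / L, lam / L)
        # --- D-step: quadratic problem, closed form ---
        G = A @ A.T + 1e-6 * np.eye(n_atoms)     # small ridge for stability
        D = Y @ A.T @ np.linalg.inv(G)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)   # keep unit-norm atoms
    return D, A
```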

Representative Works

Online learning: when new samples arrive, update the dictionary incrementally on top of the current one; the key questions are the update strategy and its convergence. A sketch of one online step follows.
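
A sketch of a single online update in the spirit of Mairal et al.'s online dictionary learning, assuming we keep running sufficient statistics `Astat` and `Bstat` (names are mine):

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def online_dl_step(D, Astat, Bstat, y, lam=0.1, n_ista=30):
    """One online step: sparse-code the new sample, accumulate sufficient
    statistics, then refresh atoms by block coordinate descent."""
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_ista):                       # LASSO code of the new sample
        a = soft(a - D.T @ (D @ a - y) / L, lam / L)
    Astat += np.outer(a, a)                       # running sum of a a^T
    Bstat += np.outer(y, a)                       # running sum of y a^T
    for j in range(D.shape[1]):                   # block coordinate descent on D
        if Astat[j, j] < 1e-12:
            continue
        u = D[:, j] + (Bstat[:, j] - D @ Astat[:, j]) / Astat[j, j]
        D[:, j] = u / max(np.linalg.norm(u), 1.0) # project onto the unit ball
    return D, Astat, Bstat, a
```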

Multi-scale dictionary learning: because complexity increases exponentially with signal dimension, small patch sizes are usually used; a multi-scale approach can adaptively fuse dictionary codes from different scales.
(Figure: Multi-Scale)

Double Sparsity: run dictionary learning a second time on higher-level sparse features or large patches, i.e., a sparse representation of the sparse codes (or of a higher-dimensional encoding); a sketch of the model structure follows the figure.
(Figure: Double Sparsity)
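
A small sketch of the double-sparsity dictionary structure, assuming a fixed DCT base dictionary; all names and sizes are illustrative:

```python
import numpy as np
from scipy.fft import dct

# Double-sparsity model: the effective dictionary is D = Phi @ Z, a fixed
# analytic base dictionary Phi (here: orthonormal DCT) times a sparse,
# learned atom-representation matrix Z.
n, n_base, n_atoms, atom_sparsity = 64, 64, 128, 6
Phi = dct(np.eye(n), norm="ortho", axis=0)        # base dictionary (DCT matrix)
rng = np.random.default_rng(0)
Z = np.zeros((n_base, n_atoms))
for k in range(n_atoms):                          # each atom mixes few base atoms
    idx = rng.choice(n_base, atom_sparsity, replace=False)
    Z[idx, k] = rng.standard_normal(atom_sparsity)
D = Phi @ Z                                       # adaptive yet structured dictionary
D /= np.linalg.norm(D, axis=0)
```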

Restoration Methods

Filtering-based methods: isotropic methods, anisotropic methods.
Transformation methods: the motivation is to find a new representation in which signal and noise can be better separated, e.g., the wavelet transform; a thresholding sketch follows.
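
A brief wavelet soft-thresholding sketch using PyWavelets, with the universal threshold as one common choice:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(img, sigma, wavelet="db8", level=3):
    """Transform-domain denoising: move to a wavelet basis where the signal
    is sparse, soft-threshold the detail coefficients, then invert."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    thresh = sigma * np.sqrt(2 * np.log(img.size))     # universal threshold
    denoised = [coeffs[0]]                             # keep the approximation band
    for detail_bands in coeffs[1:]:
        denoised.append(tuple(pywt.threshold(c, thresh, mode="soft")
                              for c in detail_bands))
    return pywt.waverec2(denoised, wavelet)
```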

K-SVD denoising

Basic idea: 1. train an over-complete dictionary; 2. use the trained dictionary to denoise patches of the noisy image; 3. use the denoised patches to reconstruct the image (a pipeline sketch follows the figure).
(Figure: Modeling)
Limitations: 1. solving the sparse coding problem is not efficient enough; 2. the L0 norm is not a good choice.
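
A pipeline sketch of the three steps, substituting scikit-learn's `MiniBatchDictionaryLearning` for K-SVD itself (a stand-in, not the original algorithm):

```python
import numpy as np
from sklearn.feature_extraction.image import (extract_patches_2d,
                                              reconstruct_from_patches_2d)
from sklearn.decomposition import MiniBatchDictionaryLearning

def ksvd_style_denoise(noisy, patch_size=8, n_atoms=128, n_nonzero=5):
    """Patch-based denoising in the K-SVD spirit: train on noisy patches,
    sparse-code each patch, rebuild the image from the approximations."""
    patches = extract_patches_2d(noisy, (patch_size, patch_size))
    X = patches.reshape(len(patches), -1)
    mean = X.mean(axis=1, keepdims=True)
    X = X - mean                                   # remove each patch's DC component
    # 1. train an over-complete dictionary on the noisy patches
    dl = MiniBatchDictionaryLearning(n_components=n_atoms,
                                     transform_algorithm="omp",
                                     transform_n_nonzero_coefs=n_nonzero)
    codes = dl.fit(X).transform(X)
    # 2. the sparse approximation of every patch is its denoised version
    X_hat = codes @ dl.components_ + mean
    # 3. reconstruct the image by averaging overlapping patches
    patches_hat = X_hat.reshape(patches.shape)
    return reconstruct_from_patches_2d(patches_hat, noisy.shape)
```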

BM3D

(Figure: BM3D)
BM3D is arguably the most classic denoising algorithm in the field; it combines the two most important priors, nonlocal self-similarity and sparsity, achieves very good results, and runs at moderate speed.

Steps: first, find a group of similar patches via non-local matching and stack them into a 3D array; apply a 3D transform and threshold the coefficients (collaborative filtering, which acts as sparse denoising); in a second stage, combine the resulting estimate with the original stack and apply Wiener filtering; finally, aggregate the denoised patches to reconstruct the image. A simplified first-stage sketch follows.
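
A heavily simplified sketch of the first (hard-thresholding) stage only, with a 3D DCT as the group transform; parameters are illustrative, and real BM3D adds the second Wiener-filtering stage:

```python
import numpy as np
from scipy.fft import dctn, idctn

def bm3d_hard_threshold_stage(img, p=8, search=16, n_similar=16, thresh=0.1):
    """Block matching + 3D transform + hard thresholding + aggregation.
    Borders that do not fit a full patch grid are ignored for simplicity."""
    H, W = img.shape
    out, weight = np.zeros_like(img), np.zeros_like(img)
    for i in range(0, H - p + 1, p):
        for j in range(0, W - p + 1, p):
            ref = img[i:i+p, j:j+p]
            # block matching: closest patches within a local search window
            cands = []
            for di in range(max(0, i - search), min(H - p, i + search) + 1, 2):
                for dj in range(max(0, j - search), min(W - p, j + search) + 1, 2):
                    patch = img[di:di+p, dj:dj+p]
                    cands.append((np.sum((patch - ref) ** 2), di, dj))
            cands.sort(key=lambda t: t[0])
            group = np.stack([img[di:di+p, dj:dj+p]
                              for _, di, dj in cands[:n_similar]])
            # collaborative filtering: 3D DCT + hard threshold
            spec = dctn(group, norm="ortho")
            spec[np.abs(spec) < thresh] = 0.0
            filt = idctn(spec, norm="ortho")
            # aggregate the filtered patches back into the image
            for (_, di, dj), q in zip(cands[:n_similar], filt):
                out[di:di+p, dj:dj+p] += q
                weight[di:di+p, dj:dj+p] += 1.0
    return out / np.maximum(weight, 1e-12)
```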

Pros and cons: 1. effectively exploits nonlocal similarity and sparsity; 2. collaborative filtering in a fixed DWT (wavelet) domain cannot describe complex image structures.

LSSC

(Figure: Group Sparsity)
Unlike plain L1 sparsity, Mairal proposed constraining the coefficient matrix with a group-sparse $\ell_{1,2}$ norm, so that similar patches share the same sparse code under the dictionary, i.e., the group keeps a common sparsity pattern. Looking closely at the $\ell_{1,2}$ norm, $j$ indexes rows and $i$ indexes columns, which pushes the nonzero coefficients into the same rows; the toy example below illustrates this.
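
A toy computation of the $\ell_{1,2}$ norm showing why shared rows are cheaper:

```python
import numpy as np

def l12_norm(A):
    """Grouped L1,2 norm as in LSSC: L2 within each row (across the group's
    patches), then L1 across rows, so nonzeros concentrate in shared rows."""
    return np.sum(np.linalg.norm(A, axis=1))

# Two coefficient matrices with the same number of nonzeros:
A_shared  = np.array([[1., 1., 1.],   # every patch uses the same atom
                      [0., 0., 0.]])
A_scatter = np.array([[1., 0., 1.],   # atoms scattered across rows
                      [0., 1., 0.]])
print(l12_norm(A_shared), l12_norm(A_scatter))  # shared rows score lower
```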

The overall pipeline is similar to BM3D, except that collaborative filtering and thresholding are replaced by group-sparse dictionary learning and sparse coding for denoising.

Adaptive Sparse Domain Selection: a large dictionary makes sparse coding very time-consuming, yet a large dictionary is necessary for describing local image structures; this method speeds things up by selecting a sub-dictionary from the large one.

Piece-wise Linear Estimation (PLE), motivations (a sketch of the estimator follows the list):

  • Sparse representation assumes a Laplacian prior on the coefficients, which leads to a nonlinear sparse coding estimator
  • Use a Mixture of Gaussians to approximate the Laplacian
  • Select the single most appropriate Gaussian prior for reconstruction
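
A sketch of the PLE model-selection step for plain denoising (degradation operator taken as the identity for brevity; the interface is hypothetical):

```python
import numpy as np

def ple_denoise_patch(y, gaussians, sigma2):
    """Each Gaussian prior yields a *linear* Wiener-style estimate; pick the
    model with the highest evidence, hence a piece-wise linear estimator."""
    best_ll, best_x = -np.inf, None
    n = y.size
    for mu, cov in gaussians:                   # one (mean, covariance) per model
        C = cov + sigma2 * np.eye(n)            # covariance of y under this model
        r = y - mu
        x_hat = mu + cov @ np.linalg.solve(C, r)          # linear MAP estimate
        sign, logdet = np.linalg.slogdet(C)
        ll = -0.5 * (r @ np.linalg.solve(C, r) + logdet)  # log evidence (up to const)
        if ll > best_ll:
            best_ll, best_x = ll, x_hat
    return best_x
```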

Coupled Dictionary Learning

Motivations:

  • Use coupled dictionaries to model the relationship between a degraded image and its corresponding original image
  • Build the correspondence in the sparse domain (same code but different dictionaries)
(Figure: SRSR)

Semi-coupled Dictionary Learning: relaxes the relationship between the two dictionaries; instead of sharing one code, the sparse codes are related by a pre-learned mapping. An inference sketch covering both variants follows.
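
A joint inference sketch for both variants, with hypothetical names: coupled DL reuses the low-resolution code directly, while semi-coupled DL first maps it through the pre-learned `W`:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def coupled_sr_patch(y_low, D_low, D_high, W=None, lam=0.1, n_ista=100):
    """Code the degraded patch under D_low, then decode under D_high.
    W=None gives the coupled case (shared code); passing a mapping W
    gives the semi-coupled case."""
    L = np.linalg.norm(D_low, 2) ** 2
    a = np.zeros(D_low.shape[1])
    for _ in range(n_ista):                     # LASSO code of the LR patch
        a = soft(a - D_low.T @ (D_low @ a - y_low) / L, lam / L)
    a_high = a if W is None else W @ a          # semi-coupled: map the code
    return D_high @ a_high                      # reconstruct the HR patch
```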

Keep sharing, and support original work.