Notes for my advisor's computer vision course, summarized before the final exam.
Introduction
What is vision?
- Perception as the integration of image data and prior knowledge in the brain
- A field that acquires, processes, analyzes, and understands visual data
Computer Vision vs. Human Vision
- ill-posed problems
- mathematical models
- discrete vs. continuous
- local vs. global optimization
What topics does the field cover?
Low level: image denoising, deblurring, super-resolution, photo-sketch synthesis, texture synthesis, optical flow, image matching
Middle level: image segmentation, motion capture, visual tracking, 3D reconstruction
High level: object detection, image understanding, video understanding
Related problems: medical imaging, optical character recognition (OCR), face detection, smile detection, vision-based biometrics, shape capture, autonomous driving
Why is image restoration challenging?
- Real noise is far more complex than additive white Gaussian noise
- Blur is non-uniform and hard to estimate accurately
- The space of local image structures is huge, so the inverse problem is highly ill-posed
Sparse Representation and Dictionary Learning for Restoration
Linear system $Ax=b$: if $A$ is square and full rank, $x = A^{-1}b$; if $A$ is tall (over-determined), there is no exact solution in general, so approximate by minimizing $||Ax-b||_2^2$; if $A$ is fat (under-determined), there are infinitely many solutions in general, so some constraint must be imposed to select one.
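A minimal numpy sketch of these regimes (my own illustration; `lstsq` gives the least-squares fit, `pinv` selects the minimum-L2-norm solution among the infinitely many):

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-determined (tall) system: more equations than unknowns.
A_tall = rng.standard_normal((10, 3))
b_tall = rng.standard_normal(10)
x_ls, *_ = np.linalg.lstsq(A_tall, b_tall, rcond=None)  # minimizes ||Ax - b||_2^2

# Under-determined (fat) system: infinitely many exact solutions;
# pinv imposes an extra constraint by picking the minimum-L2-norm one.
A_fat = rng.standard_normal((3, 10))
b_fat = rng.standard_normal(3)
x_min_norm = np.linalg.pinv(A_fat) @ b_fat

print(np.linalg.norm(A_fat @ x_min_norm - b_fat))  # ~0: an exact fit
```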
Suppose the distance between the estimate and the observation is measured in the L0, L1, L2, or another norm: L0 leads to non-convex optimization; L1 is the tightest convex relaxation of L0 and yields sparse solutions; L2 has a closed-form but dense solution. Geometrically, the solution lies where the contour lines of the objective first touch the norm ball; the L1 ball's corners sit on the coordinate axes, which is why the touching point tends to be sparse (norm-ball figure omitted).
Although L1 can approximate L0, it sometimes yields non-sparse solutions. It has been proved mathematically that if $A$ satisfies the Restricted Isometry Property (RIP), approximating L0 with L1 is guaranteed to recover the sparse solution. Intuitively, RIP means any small subset of columns of $A$ acts as a near-isometry: multiplying a sparse vector $x$ by $A$ keeps the norm of the result within a ring-shaped neighborhood of $||x||_2$ (figure omitted).
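For reference, the standard statement of RIP (the usual compressive-sensing definition, with the conventional constant $\delta_k$): $A$ satisfies RIP of order $k$ with $\delta_k \in (0,1)$ if for every $k$-sparse vector $x$,

$$ (1-\delta_k)||x||_2^2 \le ||Ax||_2^2 \le (1+\delta_k)||x||_2^2 $$

This inequality is exactly the "ring-shaped neighborhood": $||Ax||_2$ stays within a narrow band around $||x||_2$.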
The Image Restoration Problem
Modeling
$y=Hx+v$, where $H$ is the observation (degradation) matrix and $v$ is noise
Keys to solving ill-posed problems:
- Modeling the degradation process
- Good prior knowledge about the clean image
- Good objective function for minimization
The form of $H$ depends on the task: for denoising, $H$ is the identity matrix; for deblurring, a blurring matrix; for super-resolution, the composition of a blurring matrix and a downsampling matrix; for inpainting, an indicator (mask) matrix of the damaged pixels.
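A toy sketch of what $H$ looks like for two of these tasks on a vectorized image (my own illustration; all variable names hypothetical):

```python
import numpy as np

n = 16  # a tiny 4x4 image, vectorized into R^16

H_denoise = np.eye(n)  # denoising: H is the identity

# Inpainting: H keeps only the observed (undamaged) pixels.
rng = np.random.default_rng(1)
observed = np.sort(rng.choice(n, size=10, replace=False))
H_inpaint = np.eye(n)[observed]   # 10x16 indicator matrix of observed rows

x = np.arange(n, dtype=float)     # stand-in for the clean image
y = H_inpaint @ x                 # measurements: damaged pixels are simply missing
```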
Methodology
Filter-based methods: Gaussian low-pass, PDE-based anisotropic diffusion, bilateral filtering, non-local means filtering (moving from local to non-local filters improves performance greatly)
Transform-based methods: Fourier (global, orthogonal), wavelet (local, small support), ridgelet (more redundant), dictionary learning (over-complete)
- Represent $x$ over a dictionary $D$, enforcing the coefficient vector $\alpha$ to be sparse (robust)
- Objective model: $min_\alpha ||HD\alpha-y||_2^2+\lambda ||\alpha||_1$
The basic procedure
1. Partition the degraded image into overlapped patches (e.g., 8×8)
2. For each patch, solve the nonlinear L1-norm sparse coding problem: $\hat{\alpha} = argmin_\alpha ||HD\alpha-y||_2^2+\lambda||\alpha||_1$
3. Reconstruct each patch by $\hat{x}=D\hat{\alpha}$
4. Put the reconstructed patches back and average the overlapped pixels
In practice, steps 1–4 can be iterated for several rounds; a sketch of the full loop follows.
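A minimal sketch of this loop for the denoising case ($H=I$), assuming a pre-learned dictionary `D` with unit-norm atoms (e.g., from K-SVD); the inner solver is the proximal-gradient (ISTA) method discussed later in these notes, and all function names are my own:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, y, lam, n_iter=100):
    # ISTA for min_a (1/2)||D a - y||^2 + lam ||a||_1 (the 1/2 only rescales lam).
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - (D.T @ (D @ a - y)) / L, lam / L)
    return a

def restore(img, D, lam, patch=8):
    # Overlapping-patch sparse-coding restoration (stride 1, denoising case).
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    weight = np.zeros_like(img, dtype=float)
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            y = img[i:i+patch, j:j+patch].reshape(-1)
            a = sparse_code(D, y, lam)
            out[i:i+patch, j:j+patch] += (D @ a).reshape(patch, patch)
            weight[i:i+patch, j:j+patch] += 1.0  # count overlaps for averaging
    return out / weight
```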
Why sparse?
- Neuroscience
- Bayesian
- Compressive Sensing
How to solve?
L0: greedy search (Matching Pursuit, Orthogonal Matching Pursuit)
- MP: greedily pick the atom most correlated with the current residual
- OMP: additionally orthogonalize, re-fitting the coefficients over all atoms selected so far so that the new residual is orthogonal to their span (sketched below)
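A compact numpy sketch of OMP under these definitions (function name and signature are my own; atoms assumed unit-norm):

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedy approximation of the L0 problem.

    D: dictionary with unit-norm columns (atoms); y: signal; k: sparsity level.
    """
    residual = y.copy()
    support = []
    for _ in range(k):
        # MP step: pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # OMP step: re-fit coefficients over ALL chosen atoms (orthogonal
        # projection), so the residual is orthogonal to their span.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```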
L1:
- Linear programming
- Iteratively reweighted least squares: cleverly re-weighted L2 problems that approximate L1
- Proximal gradient descent: the proximal step is soft-thresholding, which has an analytic solution (formalized below)
- Augmented Lagrangian methods (Alternating Direction Method of Multipliers, ADMM)
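For the model above, writing $A=HD$ and using the convention $\frac{1}{2}||A\alpha-y||_2^2+\lambda||\alpha||_1$ (the $\frac{1}{2}$ only rescales $\lambda$; this normalization is mine), the proximal operator and the proximal gradient (ISTA) update are

$$ \mathrm{prox}_{t||\cdot||_1}(v) = \mathrm{sign}(v)\,\max(|v|-t,\ 0), \qquad \alpha^{k+1} = \mathrm{prox}_{(\lambda/L)||\cdot||_1}\big(\alpha^k - \tfrac{1}{L}A^T(A\alpha^k-y)\big), $$

where $L = ||A||_2^2$ is a Lipschitz constant of the gradient of the data term.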
ADMM
Lagrangian formulation
Move the constraints into the objective function via Lagrange multipliers.
e.g., $min\ f(x) \ s.t.\ Ax=b$
Lagrangian form: $L(x,\lambda)=f(x)+\lambda^T(Ax-b)$
For problems with inequality constraints, the KKT conditions apply: $h(x)=0$, $\lambda \ge 0$, $\lambda\, g(x)=0$, $g(x) \le 0$, where $h$ collects the equality constraints, $g$ the inequality constraints, and $\lambda$ is the inequality multiplier.
KKT conditions: for the Lagrangian $L(x,\lambda) = f(x)+\lambda g(x)$,
they require
- $\nabla_x L(x,\lambda)=0$ (stationarity)
- $\lambda \ge 0$
- $\lambda\, g(x)=0$ (complementary slackness)
- $g(x) \le 0$ (primal feasibility)
- $\nabla_{xx}^2 L(x,\lambda)$ is PSD (second-order condition)
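A tiny worked example (my own illustration, not from the slides): minimize $f(x)=x^2$ subject to $g(x)=1-x \le 0$:

$$ L(x,\lambda)=x^2+\lambda(1-x), \quad \nabla_x L = 2x-\lambda=0, \quad \lambda \ge 0, \quad \lambda(1-x)=0, \quad 1-x \le 0 $$

If $\lambda=0$ then $x=0$, which violates $1-x \le 0$; so the constraint must be active, giving $x=1$, $\lambda=2$, and all conditions hold at the optimum $x^*=1$.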
Dual problem
For the Lagrangian $L(x,\lambda)=f(x)+\lambda^T(Ax-b)$,
the dual function is $g(\lambda)= \inf_x L(x,\lambda) = -f^*(-A^T\lambda)-b^T\lambda$, where $\inf$ denotes the infimum and $f^*$ is the convex conjugate of $f$.
Dual problem: $max_\lambda\ g(\lambda)$
Dual ascent: $x^{k+1} = argmin_x L(x,\lambda^k)$
Multiplier update: $\lambda^{k+1}=\lambda^k +\alpha^k(Ax^{k+1}-b)$, with step size $\alpha^k$ (a toy example follows)
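A tiny runnable illustration of dual ascent (my own toy problem, chosen so the $x$-update has a closed form):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))
b = rng.standard_normal(3)
c = rng.standard_normal(8)

# Dual ascent for: min (1/2)||x - c||^2  s.t.  Ax = b.
# Here argmin_x L(x, lam) = c - A^T lam in closed form.
lam = np.zeros(3)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2   # safe step size for the dual gradient
for _ in range(1000):
    x = c - A.T @ lam                     # x-update: minimize the Lagrangian
    lam = lam + alpha * (A @ x - b)       # multiplier update
print(np.linalg.norm(A @ x - b))          # -> ~0 at convergence
```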
Dual decomposition: split the objective into subfunctions, $f(x)=\sum_{i=1}^N f_i(x_i)$, so the $x$-update decouples across the blocks $x_i$
Augmented Lagrangian: to make dual ascent more robust, add a quadratic penalty term:
$$L_\rho(x,\lambda)=f(x)+\lambda^T(Ax-b)+(\rho/2)||Ax-b||_2^2$$
ADMM
ADMM is designed to combine the decomposability of dual ascent with the superior convergence properties of the method of multipliers.
Optimization problem: $$ min\ f(x)+g(z) \quad s.t.\quad Ax+Bz=c $$
Augmented Lagrangian form: $ L_\rho(x,z,\lambda)=f(x)+g(z)+\lambda^T(Ax+Bz-c)+(\rho/2)||Ax+Bz-c||_2^2 $
Iteration:
- $ x^{k+1} = argmin_x L_\rho (x,z^k,\lambda^k) $
- $ z^{k+1} = argmin_z L_\rho (x^{k+1},z,\lambda^k) $
- $ \lambda^{k+1} = \lambda^k + \rho(Ax^{k+1}+Bz^{k+1}-c) $
with $\rho >0$. Stopping criterion: the primal and dual residuals fall below a small tolerance $\epsilon$ (a lasso example follows).
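A minimal sketch of ADMM for the lasso, using the standard splitting $f(x)=\frac{1}{2}||Ax-b||_2^2$, $g(z)=\lambda||z||_1$ with constraint $x-z=0$, written with the scaled dual variable $u=\lambda/\rho$ (all names my own):

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    """ADMM for min (1/2)||Ax-b||_2^2 + lam*||z||_1  s.t.  x - z = 0."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = A.T @ A + rho * np.eye(n)     # x-update is a ridge-regression solve
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(M, Atb + rho * (z - u))                      # x-update
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # soft-threshold
        u = u + x - z                                                    # multiplier update
    return z
```

With $B=-I$ and $c=0$, the update $u \leftarrow u + x - z$ is exactly the multiplier step $\lambda^{k+1}=\lambda^k+\rho(Ax^{k+1}+Bz^{k+1}-c)$ from the general form above, divided by $\rho$.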
Convergence speed: reaching high accuracy may take many iterations, but ADMM can be combined with other algorithms to produce high-accuracy solutions quickly.
For convex problems, the KKT conditions guarantee that the dual problem attains the same optimal value (strong duality); non-convex problems can have a duality gap.