Today I'm reading a CVPR 2017 spotlight paper that uses a CNN for video deblurring. It proposes an encoder-decoder-style network for V-deblur, focuses on how to effectively construct a training set, compares a range of methods, and experimentally evaluates how single-frame, no-alignment, homography, and optical-flow alignment inputs affect the CNN.
Abstract
Unlike single-frame deblurring, video-based deblurring can exploit useful information from neighboring frames to "sharpen" the current one. However, alignment algorithms are usually computationally expensive and only moderately effective, and aggregation-based methods additionally need to identify which regions can be aligned accurately and which cannot. An end-to-end trained CNN can adapt to these problems; the key is how to simulate or generate realistic pairs of blurry and sharp videos, for which the authors propose a generative motion-blur synthesis method.
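The core of the blur synthesis is averaging consecutive frames of a high-frame-rate (e.g. 240 fps) video to mimic a long exposure, with the central sharp frame as ground truth. A minimal sketch of that idea (the function name and window size are illustrative, not from the paper; the paper additionally handles the inter-frame duty cycle, which this sketch omits):

```python
import numpy as np

def synthesize_blur(sharp_frames, window=7):
    """Average `window` consecutive sharp frames to simulate motion blur.

    sharp_frames: (T, H, W, 3) array from a high-fps (e.g. 240 fps) video.
    Returns (blurry, sharp) pairs, where sharp is the central frame
    of each averaging window (the ground truth for training).
    """
    T = sharp_frames.shape[0]
    blurry, sharp = [], []
    for t in range(T - window + 1):
        clip = sharp_frames[t:t + window].astype(np.float64)
        blurry.append(clip.mean(axis=0))             # temporal average ~ long exposure
        sharp.append(sharp_frames[t + window // 2])  # ground truth = central frame
    return np.stack(blurry), np.stack(sharp)
```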
Methodology
V-deblur, Idea: borrow "sharp" pixels from neighboring frames, so that the current frame can be restored to a high-quality video frame.
Alignment methods' limitation: warping-based alignment is not robust around disocclusions and in low-texture areas, and often yields warping artifacts. Beyond the cost of computing the alignment, methods that rely on warping must therefore disregard information from misaligned content or warping artifacts, which can be hard to detect by looking at local image patches alone.
Related Work: 1.Deblur using deconvolution 2.Multi-image aggregation 3.Data-driven approaches
Architecture details: an encoder-decoder network with skip connections. The input stacks neighboring frames channel-wise (15 input channels, i.e. 5 RGB frames), and the output is the deblurred central frame (or, alternatively, all stacked frames).
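The stacked-input/skip-connection structure can be illustrated with a toy sketch. This is not the paper's network: depth is reduced, learned convolutions are replaced by per-pixel 1x1 channel mixing with random weights, and all layer widths are my own choices. It only shows how a 15-channel stack flows through an encoder-decoder with an additive skip connection down to a 3-channel output frame:

```python
import numpy as np

def down(x):
    # 2x average-pool downsampling over H, W for (C, H, W) features
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def up(x):
    # 2x nearest-neighbor upsampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, w):
    # per-pixel channel mixing stands in for learned convolutions
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(0)
# channel widths (illustrative): 15 input channels (5 stacked RGB frames) -> 3 output channels
w_e1 = rng.standard_normal((64, 15)) * 0.1
w_e2 = rng.standard_normal((128, 64)) * 0.1
w_d1 = rng.standard_normal((64, 128)) * 0.1
w_d2 = rng.standard_normal((3, 64)) * 0.1

def forward(x):                      # x: (15, 128, 128) stacked input frames
    e1 = conv1x1(x, w_e1)            # (64, 128, 128)  encoder level 1
    e2 = conv1x1(down(e1), w_e2)     # (128, 64, 64)   encoder level 2
    d1 = conv1x1(up(e2), w_d1) + e1  # additive skip connection from encoder
    return conv1x1(d1, w_d2)         # (3, 128, 128)   deblurred central frame
```

The skip connection (`+ e1`) lets high-resolution encoder features bypass the bottleneck, which is what lets such networks preserve fine detail in the restored frame.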
Experiment
The experiments compare four variants: single-frame input (s-CNN), dbn-noalign (v-CNN, no alignment), dbn-homog (CNN + homography alignment via RANSAC), and dbn-flow (CNN + optical-flow alignment).
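For the homography variant, each neighboring frame is warped onto the central frame before stacking. A minimal numpy sketch of the warping step (here the 3x3 homography H is assumed given; in practice it would be estimated from feature matches with RANSAC, e.g. via OpenCV's findHomography):

```python
import numpy as np

def warp_homography(img, H):
    """Warp an image by a 3x3 homography H via inverse mapping (nearest neighbor).

    Out-of-bounds source pixels are filled with zeros; a real pipeline
    would track them as invalid rather than feeding zeros to the network.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous coords
    src = np.linalg.inv(H) @ pts            # where each output pixel comes from
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out.reshape(h * w, -1)[valid] = img.reshape(h * w, -1)[sy[valid] * w + sx[valid]]
    return out
```

A global homography only models planar/camera motion, which is why the paper also evaluates the more expensive per-pixel optical-flow alignment.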
Training setup: 15×128×128 patches from 240 fps videos, about 2 million training pairs, batch size 64, ADAM optimizer, roughly 45 hours of training.
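A sketch of how such 15×128×128 training patches might be sampled from aligned frame stacks (the function and its defaults are illustrative, not the paper's code; crops are taken at the same location in input and target so the pair stays aligned):

```python
import numpy as np

def sample_patch_batch(stacked, sharp_center, batch_size=64, patch=128, rng=None):
    """Randomly crop co-located training patches from one video position.

    stacked:      (15, H, W) input stack (5 RGB frames, channels concatenated)
    sharp_center: (3, H, W) ground-truth sharp central frame
    Returns a (batch_size, 15, patch, patch) input batch and the
    matching (batch_size, 3, patch, patch) target batch.
    """
    rng = rng or np.random.default_rng()
    _, H, W = stacked.shape
    xs, ys = [], []
    for _ in range(batch_size):
        top = rng.integers(0, H - patch + 1)
        left = rng.integers(0, W - patch + 1)
        xs.append(stacked[:, top:top + patch, left:left + patch])
        ys.append(sharp_center[:, top:top + patch, left:left + patch])
    return np.stack(xs), np.stack(ys)
```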
Limitation
When constructing the synthetic blur dataset, only one type of motion blur is considered, ignoring other blur types. Also, the output of DBN-NOALIGN (the variant without alignment) can still look blurry, so further strategies are needed to make the output sharper, and the training set should be extended to more video types.