Today I'm reading a CVPR 2017 spotlight paper that uses a CNN for video deblurring. It proposes an encoder-decoder-style network for V-deblur, focuses on how to effectively construct a training set, compares a range of methods, and experimentally evaluates how single-frame, no-alignment, homography, and optical-flow alignment inputs affect the CNN.
Abstract
Unlike single-frame deblurring, video-based deblurring can exploit useful information from neighboring frames to "sharpen" the current one. However, alignment algorithms are usually computationally expensive and only moderately effective, and aggregation-based methods additionally need to identify which regions can be aligned accurately and which cannot. An end-to-end trained CNN can adapt to these problems; the key is how to simulate or generate realistic pairs of blurry and sharp videos, for which the authors propose a generative motion-blur synthesis method.
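The core of the blur synthesis is averaging consecutive frames of a high-frame-rate (e.g. 240 fps) video to mimic a long exposure, with the central sharp frame as ground truth. A minimal sketch of that idea (the function name and window size are illustrative, not from the paper; the paper additionally handles the inter-frame duty cycle, which this sketch omits):

```python
import numpy as np

def synthesize_blur(sharp_frames, window=7):
    """Average `window` consecutive sharp frames to simulate motion blur.

    sharp_frames: (T, H, W, 3) array from a high-fps (e.g. 240 fps) video.
    Returns (blurry, sharp) pairs, where sharp is the central frame
    of each averaging window (the ground truth for training).
    """
    T = sharp_frames.shape[0]
    blurry, sharp = [], []
    for t in range(T - window + 1):
        clip = sharp_frames[t:t + window].astype(np.float64)
        blurry.append(clip.mean(axis=0))             # temporal average ~ long exposure
        sharp.append(sharp_frames[t + window // 2])  # ground truth = central frame
    return np.stack(blurry), np.stack(sharp)
```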
Methodology
V-deblur, Idea: borrow "sharp" pixels from neighboring frames, so that the current frame can be restored to a high-quality video frame.
Alignment methods' limitation: warping-based alignment is not robust around disocclusions and in low-texture areas, and often yields warping artifacts. Beyond the cost of computing the alignment, methods that rely on warping must therefore disregard information from misaligned content or warping artifacts, which can be hard to detect by looking at local image patches alone.
Related Work: 1.Deblur using deconvolution 2.Multi-image aggregation 3.Data-driven approaches
Architecture details: an encoder-decoder network with skip connections. The input stacks neighboring frames channel-wise (15 input channels, i.e. 5 RGB frames), and the output is the deblurred central frame (or, alternatively, all stacked frames).
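The stacked-input/skip-connection structure can be illustrated with a toy sketch. This is not the paper's network: depth is reduced, learned convolutions are replaced by per-pixel 1x1 channel mixing with random weights, and all layer widths are my own choices. It only shows how a 15-channel stack flows through an encoder-decoder with an additive skip connection down to a 3-channel output frame:

```python
import numpy as np

def down(x):
    # 2x average-pool downsampling over H, W for (C, H, W) features
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def up(x):
    # 2x nearest-neighbor upsampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, w):
    # per-pixel channel mixing stands in for learned convolutions
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(0)
# channel widths (illustrative): 15 input channels (5 stacked RGB frames) -> 3 output channels
w_e1 = rng.standard_normal((64, 15)) * 0.1
w_e2 = rng.standard_normal((128, 64)) * 0.1
w_d1 = rng.standard_normal((64, 128)) * 0.1
w_d2 = rng.standard_normal((3, 64)) * 0.1

def forward(x):                      # x: (15, 128, 128) stacked input frames
    e1 = conv1x1(x, w_e1)            # (64, 128, 128)  encoder level 1
    e2 = conv1x1(down(e1), w_e2)     # (128, 64, 64)   encoder level 2
    d1 = conv1x1(up(e2), w_d1) + e1  # additive skip connection from encoder
    return conv1x1(d1, w_d2)         # (3, 128, 128)   deblurred central frame
```

The skip connection (`+ e1`) lets high-resolution encoder features bypass the bottleneck, which is what lets such networks preserve fine detail in the restored frame.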
Experiment
The experiments compare four variants: single-frame input (s-CNN), dbn-noalign (v-CNN, no alignment), dbn-homog (CNN + homography alignment via RANSAC), and dbn-flow (CNN + optical-flow alignment).
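For the homography variant, each neighboring frame is warped onto the central frame before stacking. A minimal numpy sketch of the warping step (here the 3x3 homography H is assumed given; in practice it would be estimated from feature matches with RANSAC, e.g. via OpenCV's findHomography):

```python
import numpy as np

def warp_homography(img, H):
    """Warp an image by a 3x3 homography H via inverse mapping (nearest neighbor).

    Out-of-bounds source pixels are filled with zeros; a real pipeline
    would track them as invalid rather than feeding zeros to the network.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous coords
    src = np.linalg.inv(H) @ pts            # where each output pixel comes from
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out.reshape(h * w, -1)[valid] = img.reshape(h * w, -1)[sy[valid] * w + sx[valid]]
    return out
```

A global homography only models planar/camera motion, which is why the paper also evaluates the more expensive per-pixel optical-flow alignment.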
Training setup: 15×128×128 patches from 240 fps videos, about 2 million training pairs, batch size 64, ADAM optimizer, roughly 45 hours of training.
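A sketch of how such 15×128×128 training patches might be sampled from aligned frame stacks (the function and its defaults are illustrative, not the paper's code; crops are taken at the same location in input and target so the pair stays aligned):

```python
import numpy as np

def sample_patch_batch(stacked, sharp_center, batch_size=64, patch=128, rng=None):
    """Randomly crop co-located training patches from one video position.

    stacked:      (15, H, W) input stack (5 RGB frames, channels concatenated)
    sharp_center: (3, H, W) ground-truth sharp central frame
    Returns a (batch_size, 15, patch, patch) input batch and the
    matching (batch_size, 3, patch, patch) target batch.
    """
    rng = rng or np.random.default_rng()
    _, H, W = stacked.shape
    xs, ys = [], []
    for _ in range(batch_size):
        top = rng.integers(0, H - patch + 1)
        left = rng.integers(0, W - patch + 1)
        xs.append(stacked[:, top:top + patch, left:left + patch])
        ys.append(sharp_center[:, top:top + patch, left:left + patch])
    return np.stack(xs), np.stack(ys)
```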
Limitation
When constructing the synthetic blur dataset, only one type of motion blur is considered, ignoring other blur types. Also, the output of DBN-NOALIGN (the variant without alignment) can still look blurry, so further strategies are needed to make the output sharper, and the training set should be extended to more video types.