视频序列中运动物体检测算法研究
Study on the Algorithms of Moving Objects Detection in the Video Sequence
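【Author】 Liu Min (刘敏);
【Advisor】 Zhang Daoli (张道礼);
【Author information】 Huazhong University of Science and Technology, Microelectronics and Solid-State Electronics, 2013, PhD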
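【Abstract (translated from the Chinese)】 In computer vision, moving object detection is a key preprocessing task: it separates the moving objects of interest from the background. It is widely applied in automatic video surveillance (AVS), video compression, video retrieval, and intelligent human-machine interfaces. In real-time scenes, illumination changes, disturbances of background elements, shadows, and camera jitter or motion make the accurate separation of foreground from background a highly challenging research problem. This thesis focuses on foreground detection algorithms for various dynamic scenes. A dynamic scene is either one shot by a fixed camera in which the background itself moves (for example fountains, waves, or drifting snow), one in which the camera moves, or, in the more complex case, one with both background motion and camera motion. The Mixture of Gaussians (MOG) model, the Dynamic Texture (DT) model, and foreground detection based on the center-surround mechanism of biological vision are chosen as the specific objects of study.

For relatively static scenes shot by a fixed camera, the MOG algorithm is optimized and improved to address its high computational complexity. The basic idea is to first run the low-complexity Running Average (RA) algorithm as a coarse detector that roughly locates the foreground region, and then to apply an improved MOG algorithm to each pixel within that coarse region for fine detection. To suppress shadows, the YUV color format is used as the pixel feature. Compared with the conventional MOG algorithm and the Non-Parametric Kernel Density Estimator (KDE), the improved method achieves better detection performance at a markedly lower computational complexity, and it runs fast enough for real-time video processing.

For dynamic scenes shot by a fixed camera, the DT model is studied and analyzed. In holistic modeling the input data are high-dimensional vectors, so the Singular Value Decomposition (SVD) used to learn the DT parameters is very expensive. To address this, the Sustained Observability (SO) algorithm proposed by Gopalakrishnan et al. is optimized and improved. The basic idea is to first optimize the observability measure and then, based on the observability property of linear systems, to compute the observability at subsampled image locations and use upsampling to obtain the observability of every pixel at the original scale. Compared with the SO algorithm, the improved method reaches similar detection performance at a markedly lower computational complexity. Next, local DT modeling lowers the cost of each SVD but increases the number of SVDs; to address this, the Local Dynamic Texture (LDT) approach is improved. The basic idea is to use dynamic redundancy to compute the similarity between blocks and to build DT models only for groups of blocks with low similarity. Compared with other methods, the improved LDT method has a lower Equal Error Rate (EER) and a relatively low computational complexity. For the problem that the state-space dimension of the DT model is fixed in advance, an adaptive, data-driven setting method is proposed. Using the singular value matrix obtained from the SVD during DT parameter estimation, it introduces the notion of singular entropy and selects the state-space dimension adaptively from the singular entropy increments. A DT model whose dimension is set adaptively yields a lower EER for foreground detection in dynamic scenes, and its detection performance is clearly better than that of algorithms that assume the model dimension beforehand. In addition, to avoid SVD as far as possible when estimating the DT parameters, batch Principal Component Analysis (batch-PCA) is combined with Candid Covariance-Free Incremental Principal Component Analysis (CCIPCA): the DT parameters estimated by batch-PCA serve as base parameters, which CCIPCA then updates. Compared with batch-PCA alone, this achieves similar performance with a markedly shorter average processing time per frame.

For dynamic scenes with camera motion, most foreground detection algorithms label part of the background as foreground because of the camera movement, which leads to a high false alarm rate. This thesis studies the center-surround mechanism of biological vision in depth and proposes a global-then-local detection method. Global detection uses the improved SO algorithm to obtain candidate foreground regions; local detection uses a Bayesian center-surround framework to compute the local feature contrast of the pixels within those regions; finally, the foreground contour information from local detection is fed back to the candidate regions to discard false detections and refine the result. Global detection operates at a large scale, covering every pixel of the whole frame, but uses a relatively simple algorithm. Local detection operates at a small scale, restricted to the candidate foreground regions, but uses a more complex algorithm. The average processing time per frame is therefore greatly reduced. Compared with most current algorithms, the method achieves better detection performance with a relatively low computational complexity.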
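The coarse stage of the coarse-to-fine scheme described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `running_average_masks`, the learning rate `alpha`, the fixed `threshold`, and the use of single-channel frames (rather than the YUV features named in the abstract) are all assumptions made for illustration.

```python
import numpy as np


def running_average_masks(frames, alpha=0.05, threshold=25.0):
    """Yield one coarse boolean foreground mask per frame.

    frames    -- iterable of 2-D arrays (single-channel images); hypothetical input
    alpha     -- learning rate of the running average (assumed value)
    threshold -- absolute difference above which a pixel is a foreground candidate
                 (assumed value)
    """
    background = None
    for frame in frames:
        frame = np.asarray(frame, dtype=np.float64)
        if background is None:
            # Bootstrap the background model from the first frame.
            background = frame.copy()
        # Pixels that differ strongly from the background are candidates.
        mask = np.abs(frame - background) > threshold
        # Running-average update: B <- (1 - alpha) * B + alpha * I
        background = (1.0 - alpha) * background + alpha * frame
        yield mask
```

A finer per-pixel model such as MOG would then be evaluated only where the mask is true, which is where the saving in computation would be expected to come from.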
【Abstract】 In computer vision, foreground detection is an important preprocessing task that separates the moving objects of interest from the background. It is widely applied in automatic video surveillance (AVS), video compression, video indexing, and automatic human-machine interfaces. In realistic scenarios, accurately separating the foreground from the background is very challenging because of illumination changes, motion of background elements, shadows, and camera motion. We focus on foreground detection methods for dynamic scenes. Some scenes are shot by a static camera but contain scene motion (such as fountains, waves, or falling snow). Others are static scenes shot by a moving camera. Still more complex scenes contain both scene motion and camera motion. All of these types are referred to as dynamic scenes. We mainly study the Mixture of Gaussians (MOG) model, the Dynamic Texture (DT) model, and methods based on the center-surround framework.

To reduce the computational cost in relatively static scenes shot by a stationary camera, the MOG method is improved. The improved method first detects a rough foreground region with the Running Average (RA) algorithm; each pixel in that region is then processed by a modified MOG algorithm. To suppress shadows, YUV color information is used as the pixel feature. Compared with MOG and the Non-Parametric Kernel Density Estimator (KDE), the improved method has better performance and a lower computational cost. Its processing speed meets real-time requirements.

We study the DT model in depth for dynamic scenes shot by a stationary camera. When the video process is modeled by a DT in a holistic manner, the observed data matrix is high-dimensional, so its Singular Value Decomposition (SVD) has high computational complexity. To resolve this problem, the Sustained Observability (SO) method is modified. The modified method first improves the observability measure and then, following the system property of observability, measures the observability at subsampled locations. The observability value of each pixel at the original scale is then obtained by upsampling. The modified method performs similarly to the SO method, but its computational cost is much lower. To reduce the dimensionality of the model, the DT model can be applied to local video patches, which reduces the computational complexity of each SVD operation but increases the number of DT models. To deal with this problem, we propose the Local Dynamic Texture (LDT) method, which computes the similarity between video patches according to their dynamic redundancy. A DT model is applied only to the video patches with little similarity. Compared with other methods, the LDT method has both the lowest average Equal Error Rate (EER) and the lowest computational cost.

Most approaches based on the DT model assume that the dimensionality of the state space is constant across all tested scenes. To deal with this problem, we propose an adaptive, data-driven method that computes the singular entropy from the singular value matrix. The increment of singular entropy at each order is thresholded to decide the model order. When the dimensionality of the state space is decided adaptively in this way, the lowest EER is obtained when the DT model is applied to foreground detection. In addition, to avoid the SVD operation and reduce the computational cost, batch Principal Component Analysis (batch-PCA) is combined with Candid Covariance-Free Incremental Principal Component Analysis (CCIPCA): the basis parameters are learned by batch-PCA, and the other DT parameters are obtained by updating the basis parameters with newly observed data. Compared with batch-PCA, the method performs similarly, but its computational cost is much lower.

In scenes with camera motion or ego-motion, most approaches detect some of the background as foreground. To resolve this problem, we study the center-surround hypothesis from biological vision and propose a method that combines global and local detection. In the global detection stage, the improved SO method is used to obtain candidate foreground regions. In the local detection stage, a Bayesian center-surround framework is used to compute the local feature contrast in the candidate foreground regions. The contour information obtained from local detection is fed back to confirm the accurate foreground regions and to remove background pixels from the candidate foreground regions. Global detection operates on every pixel of the whole video frame with a low-complexity algorithm, while local detection operates only on pixels in the candidate foreground regions with a high-complexity algorithm, so the average processing time per frame is greatly reduced. Compared with other methods, the proposed method has better performance and a lower computational cost.
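The data-driven choice of the state-space dimension can be sketched as follows. The abstract does not give the formula, so this example assumes one common definition of singular entropy, in which order i contributes an increment -p_i ln(p_i) with p_i = s_i / sum(s), and it keeps the leading orders until the increment first drops below a threshold. The function name `select_order` and the default threshold are hypothetical, not taken from the thesis.

```python
import numpy as np


def select_order(singular_values, threshold=0.05):
    """Pick a model order from singular values (assumed sorted in descending order).

    Assumption: the singular entropy increment of order i is -p_i * ln(p_i),
    where p_i is the i-th singular value divided by the sum of all of them.
    The order is the number of leading components before the first increment
    that falls below `threshold`; at least one component is always kept.
    """
    s = np.asarray(singular_values, dtype=np.float64)
    total = s.sum()
    if total <= 0.0:
        raise ValueError("singular values must have a positive sum")
    p = s / total
    # Treat 0 * ln(0) as 0 so that zero singular values add no entropy.
    increments = np.zeros_like(p)
    nonzero = p > 0.0
    increments[nonzero] = -p[nonzero] * np.log(p[nonzero])
    below = np.nonzero(increments < threshold)[0]
    order = int(below[0]) if below.size else len(s)
    return max(1, order)
```

In a DT setting the singular values would typically come from a call such as `np.linalg.svd(Y, compute_uv=False)` on the observation matrix `Y` (one vectorized frame per column, an assumed layout), and the threshold would need to be tuned for the data at hand.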
【Key words】 Foreground; Mixture of Gaussian; Dynamic Texture; Singular Value Decomposition; Singular entropy; Center-surround;