
Research on Key Technologies of Scalable Video Coding (可伸缩视频编码关键技术研究)

Research of Key Problems on Scalable Video Coding

【Author】 Xiang Youjun (向友君)

【Advisor】 Xie Shengli (谢胜利)

【Author Information】 South China University of Technology, Signal and Information Processing, 2009, PhD

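【Abstract (translated from the Chinese)】 In recent years, the inherent heterogeneity of the Internet and the differences in display and processing capability among user terminals have posed a major challenge to the wide dissemination and application of multimedia information. The goal of video coding has shifted from storage to transmission: rather than producing a fixed-size bitstream suited to storage, the encoder now aims to produce a scalable bitstream suited to a given transmission bit rate. Scalable Video Coding (SVC), often also called extensible or layered video coding, has attracted wide attention because it addresses these problems well, and its core ideas have been adopted by several mainstream international video standards. Based on an analysis of the wavelet-based SVC framework and the SVC framework built on the traditional hybrid coding structure, this dissertation focuses on several key technologies that both kinds of scheme involve: motion-compensated temporal filtering (MCTF), motion estimation, and error concealment. The main work is as follows. First, the development of video coding standards is reviewed, SVC technology is described in detail, and block diagrams of the encoder and decoder architectures for each technique are given. Second, MCTF and the characteristics of the residual images in MCTF-based SVC are studied in detail. The MCTF study covers three aspects: (1) to make better use of the correlation between adjacent frames, the lifting implementation of MCTF is studied for wavelet bases with filters of different lengths, in order to achieve better filtering; (2) the group-of-pictures (GOP) structure of MCTF is studied in light of the motion characteristics of the video sequence, so that the GOP size is determined adaptively from the intensity of motion; (3) to make temporal filtering more flexible and to exploit the properties of the human visual system, the filtering operations in the wavelet lifting are optimized case by case. Building on the improved MCTF, a fully scalable video coding system is implemented. The improved system not only combines temporal, spatial, and quality scalability into full scalability, but also significantly improves coding efficiency and the reconstruction quality of the video sequence. Extensive experiments are carried out on the non-stationarity of energy, the spatio-temporal correlation, and the frequency characteristics of MCTF residual images; the experimental data are analyzed in detail, and the issues to watch when coding sequences with different characteristics are discussed. Third, the motion estimation mode is changed to reduce the complexity of motion compensation, and efficient motion estimation algorithms and effective evaluation criteria are studied. Exploiting the high correlation between the motion vectors of neighboring blocks and the center-biased distribution of motion vectors, a fast motion estimation algorithm based on motion direction prediction is proposed. The algorithm predicts the motion of an image block from reference motion vectors and then searches with the directional template that corresponds to the predicted direction. Matching accuracy is the core issue in motion estimation. Following an analysis of traditional matching criteria, the dissertation proposes a criterion based on the distribution of the image difference, namely the image difference variance matching criterion, which yields relatively high codec quality. Finally, for the loss of image blocks and whole frames caused by transmission errors, error concealment in video post-processing is studied and three error concealment algorithms are proposed. The first is an error concealment algorithm for video transmission loss based on inter-frame information. It classifies each lost block as a low-activity or high-activity block according to the motion vectors of the correctly received surrounding blocks. Low-activity blocks are recovered with the average motion vector method; high-activity blocks are recovered with projections onto convex sets, exploiting the similarity in spatial structure between consecutive frames. The second is a spatio-temporal boundary matching algorithm. It introduces a boundary matching distortion function that exploits both the spatial and the temporal smoothness of the video, finds the motion vector of each lost macroblock by minimizing this function, and then recovers the lost block. The third is a partial differential equation error concealment method based on the characteristics of the human visual system. Building on the second method, it treats the optimization of the reconstructed image as an anisotropic diffusion image smoothing problem, which directly improves the match between the reconstructed macroblock and the surrounding pixels.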
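
The lifting-based temporal filtering mentioned in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes the shortest (Haar) filter, unnormalized coefficients, and zero motion vectors, so no motion compensation is performed. The function names `haar_lift_forward` and `haar_lift_inverse` are hypothetical. The point it shows is that a predict step followed by an update step yields a low-pass and a high-pass frame from which the original pair is recovered exactly.

```python
def haar_lift_forward(frame_a, frame_b):
    """Temporal Haar lifting on one frame pair (assumption: zero motion)."""
    # Predict step: the high-pass frame is the residual of B predicted from A.
    high = [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
    # Update step: the low-pass frame is A plus half the residual,
    # which equals the average of the two frames.
    low = [[a + h / 2 for a, h in zip(row_a, row_h)]
           for row_a, row_h in zip(frame_a, high)]
    return low, high


def haar_lift_inverse(low, high):
    """Undo the lifting steps in reverse order to recover both frames."""
    # Inverse update: recover A from the low-pass frame.
    frame_a = [[l - h / 2 for l, h in zip(row_l, row_h)]
               for row_l, row_h in zip(low, high)]
    # Inverse predict: recover B by adding the residual back to A.
    frame_b = [[a + h for a, h in zip(row_a, row_h)]
               for row_a, row_h in zip(frame_a, high)]
    return frame_a, frame_b
```

Dropping the high-pass frames and keeping only the low-pass frames halves the frame rate, which is how this structure provides temporal scalability. Longer filters, of the kind the dissertation compares, would add neighboring frames to the predict and update steps.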

【Abstract】 In recent years, with the emergence of networked video applications, the heterogeneity of the Internet and the differing display capabilities and computational power of terminal devices have made the wide dissemination and application of multimedia information a challenge for video coding. The research target of video coding has therefore shifted from pure compression to the design of coding systems that allow efficient transmission. The compressed stream should be able to accommodate a variety of applications with diverse constraints on network bandwidth or receiver complexity. Scalable Video Coding (SVC) has attracted wide attention because it can solve these problems, and its core idea has been adopted by several international video standards. Based on an analysis of the wavelet-based scalable video coding framework and the scalable video coding framework built on the traditional hybrid coding structure, this dissertation concentrates on several key technologies involved in both schemes: motion compensated temporal filtering (MCTF), motion estimation (ME) and error concealment (EC). Specifically, the main contributions of this dissertation are summarized as follows. First, the development of video coding standards is presented, scalable video coding technology is described in detail, and structure diagrams of the encoding and decoding techniques are given. Second, MCTF and the properties of the MCTF-based residual images are studied. 
The study of MCTF covers the following three aspects: 1) to achieve better filtering, the lifting process of MCTF is studied for wavelet filters of different lengths, which make better use of the correlation between adjacent frames; 2) an adaptive group of pictures (GOP) structure is proposed that follows the motion characteristics of the video sequence; 3) to make temporal filtering more flexible, content-adaptive update steps based on the properties of the human visual system are presented. On this basis, an improved fully scalable video coding system is presented. In this system, the coded bit-stream is organized to combine the three main scalabilities: temporal, spatial and quality (PSNR) scalability. Experimental results show that the coding efficiency and the quality of the reconstructed sequence are improved significantly. The study of the MCTF-based residual images in scalable video coding covers their energy non-stationarity, temporal-spatial correlation and frequency properties. The experimental results are given and analyzed. Third, efficient motion estimation algorithms and effective evaluation criteria are studied by changing the mode of motion estimation and reducing the complexity of motion compensation. A new fast motion estimation algorithm based on motion direction prediction is presented, which exploits the high correlation between the motion vectors of adjacent blocks and the center-biased characteristic of motion vectors in image sequences. The algorithm defines four kinds of search patterns and selects among them according to the motion direction predicted from reference motion vectors. Precise and efficient matching and compensation reduce the prediction error and improve video compression, so the accuracy of block matching is the core issue. 
In this dissertation, a method that uses the distribution of the image difference as a matching criterion is proposed, namely the image difference variance matching criterion. Experimental results show that the image difference variance matching criterion achieves relatively high codec quality. Finally, for the loss of image information caused by transmission errors, three error concealment algorithms are proposed. The first is an error concealment algorithm for video transmission based on inter-frame information. The missing blocks are classified into low activity blocks and high activity blocks by using the motion vector information of the surrounding correctly received blocks. The low activity blocks are concealed by the simple average motion vector (AVMV) method. For the high activity blocks, several closed convex sets are defined, and the method of projections onto convex sets (POCS) is used to recover the missing blocks by combining frequency and spatial domain information. The second is an efficient spatio-temporal boundary matching algorithm (ESTBMA), which exploits both spatial and temporal information to reconstruct the lost motion vectors (MV) and also introduces a new side smoothness measurement. The motion vector that minimizes the distortion function is used as the estimate of the motion vector of the lost block. The third is an error concealment algorithm based on the diffusion equation. A motion vector estimated by ESTBMA is used to make an initial recovery of the lost information. Then, an anisotropic diffusion equation constructed from the activity masking characteristics of the human visual system (HVS) is used to refine the recovery. The proposed algorithm can reduce the blocking artifacts of the recovered images, preserve the structural information in the images, and obtain better visual quality.
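
The abstract names the image difference variance matching criterion without giving its formula, so the sketch below rests on an assumption: the cost of a candidate is taken to be the variance of the pixel-wise difference between the current block and the reference block. It is compared with the conventional sum of absolute differences (SAD) inside an exhaustive search, not the direction-predicted fast search the dissertation proposes. All function names are hypothetical.

```python
def block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(cur, ref):
    """Conventional criterion: sum of absolute differences."""
    return sum(abs(c - r) for rc, rr in zip(cur, ref) for c, r in zip(rc, rr))


def diff_variance(cur, ref):
    """Assumed form of the criterion: variance of the difference image."""
    diffs = [c - r for rc, rr in zip(cur, ref) for c, r in zip(rc, rr)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)


def full_search(cur_frame, ref_frame, top, left, size, radius, cost):
    """Return the (dy, dx) displacement with the lowest cost in the window."""
    cur = block(cur_frame, top, left, size)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            c = cost(cur, block(ref_frame, y, x, size))
            if best is None or c < best[0]:
                best = (c, (dy, dx))
    return best[1]
```

Under this assumed definition, a uniform brightness change between frames leaves the variance of a correct match at zero, whereas its SAD grows with the size of the offset. This is one plausible reason a distribution-based criterion could match more accurately, but the abstract does not state the author's own rationale.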
