基于机器学习的语音驱动人脸动画方法

A Speech Driven Face Animation System Based on Machine Learning

【Authors (Chinese names)】 陈益强, 高文, 王兆其, 姜大龙

【Author】 CHEN Yi-Qiang 1+, GAO Wen 1,2, WANG Zhao-Qi 1, JIANG Da-Long 1
1 (Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, China)
2 (Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, China)
+ Corresponding author: Phn: 86-10-82649008, Fax: 86-10-82649298, E-mail: yqchen@ict.ac.cn, http://www.jdl.ac.cn
Chen YQ, Gao W, Wang ZQ, Jiang DL. A speech driven face animation system based on machine learning. Journal of Software, 2003,14(2):215~221.

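【Affiliations】 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang 150001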

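【Abstract (translated from the Chinese)】 Synchronizing speech with lip motion and facial expression is one of the hard problems in face animation. This paper combines clustering and machine learning to learn the synchronization between the speech signal and lip motion and facial expression, and applies the result to an MPEG-4 based speech-driven face animation system. Working from a large synchronized audio-visual database, unsupervised clustering is used to discover basic patterns that effectively characterize face motion, and a neural network is trained to map speech features that include prosody directly to these basic face motion patterns. This sidesteps the limited robustness of speech recognition, and the learned output can drive a face mesh directly. Finally, two ways of evaluating the speech-driven face animation system are given, one quantitative and one qualitative. Experimental results show that speech-driven face animation based on machine learning handles speech-video synchronization effectively and makes the animation more realistic. Because the MPEG-4 based learning results are independent of the face model, they can also drive a variety of face models, including real video, 2D cartoon characters, and 3D virtual faces.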

【Abstract】 Lip synchronization is a key issue in speech-driven face animation systems. In this paper, clustering and machine learning methods are combined to estimate face animation parameters from audio sequences, and the learning results are applied to an MPEG-4 based speech-driven face animation system. Based on a large recorded audio-visual database, an unsupervised clustering algorithm is proposed to obtain basic face animation parameter patterns that describe the characteristics of face motion. An Artificial Neural Network (ANN) is trained to map the cepstral coefficients of an individual's natural speech directly to face animation parameter patterns. This avoids the potential limitations of speech recognition, and the output can be used to drive the articulation of the synthetic face directly. Two approaches to evaluation are also proposed: quantitative evaluation and qualitative evaluation. The performance of the system shows that the proposed learning algorithm is suitable and greatly improves the realism of face animation during speech. The MPEG-4 based learning results are also suitable for driving many different kinds of animation, ranging from video-realistic image warps to 3D cartoon characters.
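The two-stage idea in the abstract (cluster face animation parameter frames into basic patterns, then train a network to map audio features to those patterns) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data is synthetic, plain k-means with farthest-point initialization stands in for the paper's unspecified unsupervised clustering algorithm, a small one-hidden-layer network stands in for the ANN, and the names `make_synthetic`, `kmeans`, `train_mlp`, `predict_patterns` and all dimensions are hypothetical.

```python
import numpy as np

def make_synthetic(n=300, n_modes=3, audio_dim=12, fap_dim=6, seed=0):
    """Synthetic stand-in for a synchronized audio-visual database (hypothetical)."""
    rng = np.random.default_rng(seed)
    mode = rng.integers(0, n_modes, size=n)          # hidden articulation mode per frame
    audio_centers = rng.normal(0, 3, (n_modes, audio_dim))
    fap_centers = rng.normal(0, 3, (n_modes, fap_dim))
    audio = audio_centers[mode] + rng.normal(0, 0.3, (n, audio_dim))
    faps = fap_centers[mode] + rng.normal(0, 0.3, (n, fap_dim))
    return audio, faps

def kmeans(frames, k, iters=20):
    """Plain k-means with farthest-point initialization; returns (centroids, labels)."""
    centroids = [frames[0]]
    for _ in range(k - 1):
        d = np.min([((frames - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(frames[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):                          # leave an empty cluster unchanged
                centroids[j] = members.mean(axis=0)
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

def train_mlp(x, y, n_classes, hidden=16, epochs=1000, lr=0.1, seed=0):
    """One-hidden-layer tanh network trained with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0, 0.3, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0, 0.3, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)
        logits = h @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - onehot) / len(x)                     # cross-entropy gradient w.r.t. logits
        gh = (g @ w2.T) * (1 - h ** 2)                # backpropagate through tanh
        w2 -= lr * (h.T @ g)
        b2 -= lr * g.sum(axis=0)
        w1 -= lr * (x.T @ gh)
        b1 -= lr * gh.sum(axis=0)
    return w1, b1, w2, b2

def predict_patterns(params, x):
    """Return the index of the most likely face-motion pattern for each audio frame."""
    w1, b1, w2, b2 = params
    return (np.tanh(x @ w1 + b1) @ w2 + b2).argmax(axis=1)

audio, faps = make_synthetic()
audio = (audio - audio.mean(axis=0)) / audio.std(axis=0)   # standardize audio features
centroids, labels = kmeans(faps, k=3)                       # stage 1: basic motion patterns
params = train_mlp(audio, labels, n_classes=3)              # stage 2: audio -> pattern index
predicted = predict_patterns(params, audio)
driven_faps = centroids[predicted]                          # pattern centroids drive the face
```

In this sketch the network predicts a pattern index rather than regressing raw parameters, so the final lookup `centroids[predicted]` yields one parameter vector per audio frame. Whether the paper's network outputs pattern indices or parameter values directly is not stated in the abstract.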

【Key words】 machine learning; facial animation; speech driven
【Funding】 National Natural Science Foundation of China; National High Technology Research and Development Program of China
  • 【Source】 软件学报 (Journal of Software), 2003, Issue 02
  • 【Classification code】 TP391.41
  • 【Times cited】 55
  • 【Downloads】 891