A Talking Head Synthesis System Based on 3D-Model and Photo
【Abstract】 Visual speech has become a very active research area in human-computer interaction. Among the visual information associated with speech, the most important is the speaker's lip shape and, more broadly, the image of the whole head, known as a “talking head”. To synthesize a realistic three-dimensional (3D) talking head model, this paper proposes a method based on a 3D model and photos of a real person. Starting from a neutral, person-independent 3D head model, feature parameters such as face shape and the positions of the facial features are extracted from two photos of an arbitrary person, one frontal and one profile, and used to correct the model. Textures such as skin and hair are then extracted from the photos and mapped onto the corrected model so that it closely resembles the real person. The method combines 3D-model-based and image-library-based modeling and so has the advantages of both: expressions and lip shapes can be controlled flexibly, the model can be rotated freely and synthesized in real time, and the result is close to the real person and highly natural. The model has been applied in a visual text-to-speech (TTS) synthesis system with satisfactory results.
【Key words】 talking head; visual text-to-speech; 3D model; face animation
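The abstract does not say how the extracted feature parameters are used to correct the neutral model. As a minimal sketch only, the step could be approximated by combining landmark positions from the frontal photo (x, y) and the profile photo (z, y) into 3D targets, then fitting a per-axis scale and offset that is applied to every vertex. The function names, the landmark dictionaries, and the per-axis least-squares fit are all assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch: every name and the fitting scheme are assumptions,
# not the method described in the paper.

def merge_views(front, side):
    """Combine front-view (x, y) and side-view (z, y) landmarks into (x, y, z).

    The y coordinate appears in both photos, so the two values are averaged.
    """
    merged = {}
    for name in front.keys() & side.keys():
        x, y_front = front[name]
        z, y_side = side[name]
        merged[name] = (x, (y_front + y_side) / 2.0, z)
    return merged


def fit_axis(src, dst):
    """Least-squares scale s and offset t so that s * src + t approximates dst."""
    mean_src = sum(src) / len(src)
    mean_dst = sum(dst) / len(dst)
    var = sum((a - mean_src) ** 2 for a in src)
    if var == 0:
        # Degenerate case: all source values equal, so only shift.
        return 1.0, mean_dst - mean_src
    cov = sum((a - mean_src) * (b - mean_dst) for a, b in zip(src, dst))
    scale = cov / var
    return scale, mean_dst - scale * mean_src


def adapt_model(vertices, model_landmarks, target_landmarks):
    """Apply the per-axis scale/offset fitted on shared landmarks to all vertices."""
    names = sorted(model_landmarks.keys() & target_landmarks.keys())
    params = []
    for axis in range(3):
        src = [model_landmarks[n][axis] for n in names]
        dst = [target_landmarks[n][axis] for n in names]
        params.append(fit_axis(src, dst))
    return [
        tuple(s * v[axis] + t for axis, (s, t) in enumerate(params))
        for v in vertices
    ]


# Hypothetical landmarks on the neutral model (x, y, z).
neutral_landmarks = {
    "chin": (0.0, -1.0, 0.0),
    "nose_tip": (0.0, 0.0, 1.0),
    "left_eye": (-0.5, 0.5, 0.5),
    "right_eye": (0.5, 0.5, 0.5),
}
# Hypothetical landmark positions measured in the two photos.
front_photo = {"chin": (1.0, -2.0), "nose_tip": (1.0, 1.0),
               "left_eye": (0.0, 2.5), "right_eye": (2.0, 2.5)}
side_photo = {"chin": (1.0, -2.0), "nose_tip": (5.0, 1.0),
              "left_eye": (3.0, 2.5), "right_eye": (3.0, 2.5)}

targets = merge_views(front_photo, side_photo)
adapted = adapt_model([(1.0, 1.0, 1.0)], neutral_landmarks, targets)
```

A global scale and offset per axis only captures overall head proportions. Matching individual facial features would need local deformation around each landmark, which this sketch does not attempt.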
- 【Source】 中国图象图形学报 (Journal of Image and Graphics), 2004, Issue 07
- 【Classification No.】 TP391.41
- 【Times cited】 5
- 【Downloads】 174