
F0 Contour Optimization and Its Rules in Chinese

Original title: 韵律块基频曲线的优化及规则 (Optimization of F0 Contours in Prosodic Chunks and Its Rules)

【Author】 Liu Hao-jie (刘浩杰)①②, Du Li-min (杜利民)①, Fu Yue-wen (付跃文)③

【Affiliation】 ① Institute of Acoustics, Chinese Academy of Sciences, Beijing 100080, China; ② Geophysical Research Institute, Shengli Oil Field, Dongying 257000, China; ③ College of Information Science and Engineering, Nanjing University of Technology, Nanjing 210009, China

【Abstract】 In a rule-based Chinese speech synthesis system, the fundamental frequency contour (F0 contour) of continuous speech is not produced by simply concatenating the F0 contours of individual synthesis units. It is shaped by several phonetic functional units acting together. To improve the naturalness of synthesized speech, this paper proposes a forward approach to optimizing the F0 contour within a Chinese prosodic chunk. The optimized contour incorporates contextual and speaker-specific factors such as stress strength, shape distortion, and articulation rate.

Based on this optimization idea, the paper inversely extracts the optimization-related parameters of each unit from the clustered F0 contours of monosyllabic, disyllabic, and trisyllabic prosodic chunks. These parameters are the top line, bottom line, smoothing factor, shape distortion, and stress strength. The extraction uses the minimum mean square error (MMSE) criterion.

The paper then statistically analyzes how a syllable's position within the prosodic chunk and its tone affect these parameters. The analysis shows that the extracted parameters are reliable and that the F0 contour optimization is sound overall. From it, concrete rules are derived for applying the optimization control parameters to different prosodic chunks in a rule-based synthesis system. In listening tests, the intelligibility score of the synthesis system rose from 3.25 before optimization to 3.35 after, and the naturalness score rose from 2.9 to 3.31.
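The MMSE inverse extraction mentioned in the abstract can be illustrated with a minimal Python sketch. The sketch rests on a hypothetical model that is not taken from the paper. Each syllable's F0 is assumed to follow a normalized tone template s(t) in [0, 1], scaled between a bottom line and a top line as f(t) = bottom + (top - bottom) * s(t). Under that assumption, the MMSE estimate of the two lines reduces to an ordinary least-squares fit of f on s.

The following are illustrative assumptions:

- the tone templates, derived from Chao tone letters,
- the (level - 1) / 4 normalization,
- the names `tone_template` and `fit_top_bottom`.

The paper's smoothing factor, shape distortion, and stress strength are not modeled here.

```python
"""Hypothetical sketch: MMSE (least-squares) fit of top and bottom lines to an F0 contour.

Assumed model (not the paper's): f(t) = bottom + (top - bottom) * s(t),
where s(t) in [0, 1] is a normalized tone template.
"""
from typing import List, Sequence, Tuple

# Chao five-level tone letters for the four Mandarin tones.
# Mapping a level v to (v - 1) / 4 is an illustrative assumption.
CHAO_TONES = {1: (5, 5), 2: (3, 5), 3: (2, 1, 4), 4: (5, 1)}


def tone_template(tone: int, n: int) -> List[float]:
    """Sample a normalized tone shape at n >= 2 points by piecewise-linear interpolation."""
    if n < 2:
        raise ValueError("need at least two sample points")
    levels = [(v - 1) / 4 for v in CHAO_TONES[tone]]
    out = []
    for i in range(n):
        pos = i / (n - 1) * (len(levels) - 1)  # position along the key points
        k = min(int(pos), len(levels) - 2)     # index of the current segment
        frac = pos - k
        out.append(levels[k] + frac * (levels[k + 1] - levels[k]))
    return out


def fit_top_bottom(template: Sequence[float], f0: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bottom, top, mse) minimizing the mean squared error of f0 ~ a + b * s.

    With a = bottom and b = top - bottom, this is ordinary least squares.
    """
    if len(template) != len(f0) or len(f0) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    n = len(f0)
    ms = sum(template) / n
    mf = sum(f0) / n
    var_s = sum((s - ms) ** 2 for s in template)
    if var_s == 0:
        # A flat template (e.g. an all-tone-1 chunk) cannot separate top from bottom.
        raise ValueError("template is flat; top and bottom are not identifiable")
    cov = sum((s - ms) * (f - mf) for s, f in zip(template, f0))
    b = cov / var_s
    a = mf - b * ms
    mse = sum((f - (a + b * s)) ** 2 for s, f in zip(template, f0)) / n
    return a, a + b, mse


if __name__ == "__main__":
    # Synthetic two-syllable chunk (tone 2 + tone 4), 10 samples per syllable,
    # generated noise-free with bottom = 100 Hz and top = 220 Hz.
    s = tone_template(2, 10) + tone_template(4, 10)
    f0 = [100 + 120 * v for v in s]
    bottom, top, mse = fit_top_bottom(s, f0)
    print(f"bottom={bottom:.1f} Hz, top={top:.1f} Hz, mse={mse:.3g}")
```

On noise-free synthetic data the fit recovers the generating bottom and top lines up to floating-point rounding. On real clustered contours, the residual MSE indicates how much of the chunk's F0 this simple two-line model fails to explain.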

  • 【Source】 Journal of Electronics & Information Technology (电子与信息学报), 2007, Issue 01
  • 【CLC number】 TN912.3
  • 【Times cited】 5
  • 【Downloads】 164