F0 Transformation for Emotional Speech Synthesis Using Target Approximation Features and Bidirectional Associative Memories

【Authors】 Li Gao; Zhen-Hua Ling; Li-Rong Dai

【Affiliation】 Department of Electronic Engineering and Information Science, University of Science and Technology of China

【Abstract】 This paper proposes an F0 transformation method for emotional speech synthesis. Quantitative target approximation (qTA) features are used to represent the F0 contour at the syllable level, and Gaussian bidirectional associative memories (GBAM) are used to transform the syllable-level qTA parameters of synthesized neutral speech into those of the target emotional speech. In the training stage, a neutral speech synthesis system is first built from a neutral corpus using HMM-based statistical parametric speech synthesis. Then, using a small amount of emotional recording data, a GBAM transformation model is trained, with the qTA parameters extracted from neutral speech synthesized for the text of the emotional recordings as the source data and the qTA parameters extracted from the emotional recordings as the target data. In the synthesis stage, the trained GBAM model transforms the F0 features of synthesized neutral speech toward the target emotion. Experimental results indicate that, when little target-emotion data is available, the proposed method achieves better emotional expressiveness than model adaptation using maximum likelihood linear regression (MLLR).
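For readers unfamiliar with the qTA features named in the abstract, the following is a minimal sketch of how one syllable's qTA parameters can generate an F0 contour. It assumes the commonly published qTA formulation, in which F0 is the response of a third-order critically damped system to a linear pitch target. The record itself does not give the formula, so the function name `qta_f0`, the parameter names (`m`, `b`, `lam`), and the example values are illustrative assumptions, not the authors' implementation.

```python
import math


def qta_f0(t, m, b, lam, f0_init, df0_init=0.0, ddf0_init=0.0):
    """F0 at time t (seconds from syllable onset) under the assumed qTA form.

    m, b     : slope and height of the linear pitch target x(t) = m*t + b
    lam      : rate at which the contour approaches the target
    f0_init, df0_init, ddf0_init : F0 and its first two derivatives at onset
    """
    # Coefficients follow from matching the initial conditions of
    # f0(t) = (m*t + b) + (c1 + c2*t + c3*t**2) * exp(-lam*t)
    c1 = f0_init - b
    c2 = df0_init + c1 * lam - m
    c3 = (ddf0_init + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0
    return (m * t + b) + (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)


# Hypothetical syllable: static target at 200 (e.g. Hz), onset F0 of 180.
contour = [qta_f0(0.01 * i, m=0.0, b=200.0, lam=40.0, f0_init=180.0)
           for i in range(21)]
```

Under this assumed form, a syllable-level transformation such as the one described in the abstract would operate on the small parameter set per syllable rather than on the frame-level F0 values.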

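【Funding】 National Natural Science Foundation of China (Grant No. 61273032)
  • 【Proceedings】 Proceedings of the 13th National Conference on Man-Machine Speech Communication (NCMMSC2015)
  • 【Conference】 The 13th National Conference on Man-Machine Speech Communication (NCMMSC2015)
  • 【Conference date】 2015-10-25
  • 【Conference location】 Tianjin, China
  • 【Classification number】 TN912.33
  • 【Organizer】 Technical Committee on Speech Information, Chinese Information Processing Society of China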