Research on Singing Voice Conversion (歌唱人转换研究)

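【Author】 Fang Peng (方鹏)

【Advisor】 Wang Zengfu (汪增福)

【Author Information】 University of Science and Technology of China, Control Science and Engineering, 2016, Master's thesis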

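【Abstract (translated from the Chinese 摘要)】 Speech signals carry both semantic information and speaker identity information. Speaker voice conversion changes the identity information in a speaker's voice while keeping the semantic information unchanged, so that the speech sounds as if another designated person were speaking. It is usually achieved by modifying vocal characteristics such as the speaker's timbre and pitch. Researchers have done a great deal of work in this area and have developed a number of fairly mature voice conversion techniques. Singing voice conversion, however, has received little attention despite its close ties to voice conversion, mainly because it is more specialized and more difficult. Against this background, this thesis studies singing voice conversion in depth, develops several singing voice conversion algorithms, and builds a singing voice conversion system on top of them. The main work and contributions are as follows:
1. To enable singing voice conversion, a professional musician (referred to as the source singer) was first asked to record a singing database. A singing database of the target singer was also recorded, both to extract the voice characteristics of the target singer and to evaluate the proposed conversion algorithms. The singers were asked to follow the pitch written in the score as closely as possible, to minimize the adverse effect of pitch differences between singers on the conversion algorithms. About 132 minutes of Chinese singing were recorded in total, providing a reliable data source for singing voice conversion.
2. Conventional methods for synthesizing converted speech have problems with fundamental frequency extraction and excitation signal generation, so the converted speech they produce is of poor quality. To solve this, the thesis obtains the target singer's singing voice by filtering the source singer's voice directly with a Mel Log Spectrum Approximation (MLSA) filter. Experimental results show that this method achieves a fairly satisfactory conversion result.
3. The Gaussian Mixture Model (GMM) based conversion method performs well, but it overfits when training data is insufficient. In practice, singing samples of the target singer are hard to collect, so the number of samples available for training is often small. To address this, a singing voice conversion method that combines kernel fuzzy clustering with Partial Least Squares (PLS) regression is proposed. Experimental results show that, with little training data, this method achieves better conversion results than the GMM method.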

【Abstract】 Speech signals contain linguistic information and acoustic information about the speaker. Voice conversion is a technique that modifies the acoustic information of speech from a source speaker so that it is perceived as spoken by another specific speaker, while leaving the linguistic information unaltered. Generally, voice conversion is realized by changing the timbre and pitch of the source speaker. A wide variety of work on voice conversion has been done, and the technology is mature to some extent. Although singing voice conversion is similar to voice conversion, it has not been widely researched, because it is more specialized and more difficult than voice conversion. Against this background, this dissertation studies singing voice conversion in detail. We propose several algorithms for converting the singing voice and build a complete singing voice conversion system. The main contributions of this dissertation are as follows:
1. To achieve singing voice conversion, we recorded a singing voice database of a professional singer (the source singer). We also recorded a database of the target singer, both to extract the target singer's voice features and to evaluate the singing voice conversion algorithms. Because pitch inconsistency between the source and target singers may lead to conversion errors, the singers were required to sing according to the pitch in the score. We recorded a total of 132 minutes of singing voice in Chinese, which provides a reliable database for singing voice conversion.
2. In conventional voice conversion, the quality of the converted voice suffers from fundamental frequency extraction errors and excitation signal generation errors. To improve the quality of the converted singing voice, we apply the mel log spectrum approximation (MLSA) filter to synthesize the converted singing voice by filtering the source singing waveform directly. Experiments show that the new method yields a better converted singing voice.
3. Although the Gaussian mixture model (GMM) based conversion method performs well, it is prone to overfitting when the dataset is small. In singing voice conversion applications it is difficult to obtain enough suitable singing voice data, so we propose a singing voice conversion method based on kernel fuzzy clustering and partial least squares (PLS) regression, which achieves better conversion results than the GMM based method when the training data is inadequate.
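As a rough illustration of the GMM baseline named in item 3, the sketch below implements the standard joint-density GMM mapping from the general voice conversion literature, reduced to a single scalar feature. It is not taken from the thesis: the function names, the dictionary keys, and the toy parameter values are all assumptions made for this example. The converted value is a posterior-weighted sum of per-component linear regressions. Each component carries its own means and covariance terms, so the parameter count grows with the number of components and the feature dimension, which is one reason such models can overfit small training sets.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Map one scalar source feature x into the target feature space.

    Each component is a dict with hypothetical keys:
      w      mixture weight
      mu_x   source mean,  mu_y  target mean
      var_x  source variance
      cov_yx target-source covariance
    """
    # Posterior probability of each mixture component given x.
    likes = [c["w"] * gauss_pdf(x, c["mu_x"], c["var_x"]) for c in components]
    total = sum(likes)
    posts = [like / total for like in likes]
    # Posterior-weighted sum of the per-component linear regressions.
    return sum(
        p * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
        for p, c in zip(posts, components)
    )


# Toy parameters, invented for illustration only (not trained on any data).
toy_gmm = [
    {"w": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_x": 1.0, "cov_yx": 0.5},
    {"w": 0.5, "mu_x": 1.0, "mu_y": 3.0, "var_x": 1.0, "cov_yx": 0.8},
]
converted = gmm_convert(0.0, toy_gmm)
```

With a single component the mapping collapses to one linear regression from source to target. In a real system the scalar would be replaced by a spectral feature vector per frame, and the parameters would be estimated from time-aligned source and target frames.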
