听觉特性与鲁棒语音识别算法研究
Research on Auditory Characteristics and Robust Speech Recognition Algorithms
【Author】 孙暐;
【Advisor】 吴镇扬;
【Author information】 东南大学 (Southeast University), Signal and Information Processing, 2006, PhD
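【Summary】 Speech recognition technology has opened a new era of human-machine interaction and is widely used in industry, the military, commerce, finance, services, medicine, daily life, and many other fields. In practical applications, however, environmental mismatch causes the performance of recognition systems to deteriorate sharply. The environmental robustness of speech recognition has therefore become both a focus and a difficulty of current robust speech recognition research. This thesis studies, in turn, current speech recognition and robust speech recognition techniques, the auditory characteristics of the human ear, the estimation criteria used in speech recognition, and the ways in which a noisy environment affects speech recognition. Drawing on the perceptual characteristics of the human ear, the differences between the power spectra of signal and noise in different frequency bands, and the differing effects of noise on the recognition models of different bands, it applies different model frameworks, estimation criteria, matching methods, and reliability analyses to propose several robust speech recognition algorithms and to improve existing ones. Model analysis and compensation is an important approach to robust speech recognition in noisy environments. On the basis of extensive theoretical analysis, the thesis studies speech recognition algorithms within a sub-band framework based on the Fletcher-Allen rule. It proposes a parallel sub-band HMM maximum a posteriori (MAP) adaptive nonlinear class estimation algorithm and a nonlinear maximum a posteriori statistical matching robust speech recognition algorithm. The former introduces a method in which MAP estimation, environment mapping, and a BP network jointly perform nonlinear mapping classification. It uses signal-to-noise ratio analysis to assess the reliability of signal information, and it also proposes a method for estimating prior information that effectively reflects the noisy environment. The latter performs MAP statistical matching according to the signal-to-noise ratio and combines it with nonlinear mapping for classification. Experiments show that these approaches improve recognition performance to varying degrees. Building on research into auditory stream grouping, the thesis studies multi-band robust speech recognition algorithms based on a noise-corruption assumption, beginning with robust recognition in the multi-band asynchronous processing mode. First, it proposes a multi-band maximum likelihood robust speech recognition algorithm, which performs maximum likelihood estimation in the multi-band mode together with either linear discriminant analysis or a decision analysis that combines signal-to-noise ratio and model similarity. Based on the characteristics of multi-band analysis, it also proposes a discriminative multi-band maximum a posteriori multi-transform algorithm and its simplified versions (average estimation and James-Stein estimation). By combining multi-band processing, discriminant analysis, MAP estimation, and multiple information transforms, this algorithm extracts recognition information from several angles and achieves very good performance. The thesis further proposes the idea of, and a concrete procedure for, discriminant analysis that combines signal-to-noise ratio and model similarity, and it reports comparative experiments on several proposed criteria for merging reliable information. The research indicates that robust speech recognition should be based on the extraction of reliable information; that is, the signals of different frequency bands should be processed in interleaved synchronous and asynchronous modes. Accordingly, building on the multi-band asynchronous processing above, the thesis proposes a multi-band synchronous robust speech recognition algorithm, in which the use of synchronous information greatly simplifies the model. It then combines the synchronous and asynchronous results into a synchronous-asynchronous speech recognition model and, together with an analysis of recognition performance when parts of the speech signal are randomly deleted, studies an iterative recognition structure based on signal-to-noise-ratio reliability decisions for time-varying and frequency-varying noise. The extensive theoretical analysis and simulation comparisons in the thesis show that, according to the frequency characteristics exhibited in auditory perception, the signal and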
【Abstract】 Speech recognition technology has opened a new era of communication between humans and machines. Speech recognition systems can be applied in a wide range of fields, such as industry, the military, business, finance, services, medical treatment, and daily life. Because of environmental mismatch, however, the performance of recognition systems deteriorates dramatically, so robustness has become a focus of speech recognition research. This thesis examines in detail current speech recognition and robust speech recognition technology, the auditory characteristics of humans, the estimation principles applied in speech recognition, and the ways in which noise affects recognition performance. Based on these auditory characteristics, the spectral differences between speech signals and noise, and the different effects of noise on the recognition models of different bands, several robust speech recognition algorithms are presented that improve the performance of speech recognition systems in noisy environments by using different model schemes, estimation principles, matching methods, and analyses of information reliability. Model analysis and compensation is an important approach to robust speech recognition. On the basis of extensive theoretical analysis and the Fletcher-Allen principle, two algorithms are proposed: a nonlinear class estimation algorithm using parallel sub-band HMM maximum a posteriori (MAP) probability adaptation, and a nonlinear sub-band maximum a posteriori statistical matching algorithm. The nonlinear class estimation algorithm combines the MAP principle, linear mapping, and a BP network to recognize the speech signal. In this algorithm, the reliability of the information is analyzed using the signal-to-noise ratio. A new method for estimating prior information, which effectively reflects the noisy environment, is also presented.
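The two ingredients named above, MAP adaptation and signal-to-noise-ratio reliability analysis across sub-bands, can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a textbook MAP update of a single Gaussian mean with a conjugate prior, and a simple logistic mapping from per-band SNR to a reliability weight. All function names, the prior weight `tau`, and the `slope` parameter are hypothetical choices made for illustration.

```python
import math


def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    tau (hypothetical) controls how strongly the prior mean is trusted
    relative to the adaptation frames.
    """
    n = len(frames)
    if n == 0:
        return prior_mean  # no adaptation data: keep the prior
    sample_mean = sum(frames) / n
    # Interpolate between the prior mean and the sample mean.
    return (tau * prior_mean + n * sample_mean) / (tau + n)


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio of one sub-band in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)


def reliability_weights(snrs_db, slope=0.5):
    """Map per-band SNRs to weights that sum to 1 (assumed logistic form)."""
    raw = [1.0 / (1.0 + math.exp(-slope * s)) for s in snrs_db]
    total = sum(raw)
    return [r / total for r in raw]


def combine_band_scores(band_loglikes, weights):
    """Weighted sum of per-band log-likelihoods for one candidate class."""
    return sum(w * ll for w, ll in zip(weights, band_loglikes))


if __name__ == "__main__":
    # Two sub-bands: one clean (20 dB), one heavily corrupted (-20 dB).
    weights = reliability_weights([snr_db(100.0, 1.0), snr_db(1.0, 100.0)])
    print(weights)
    print(combine_band_scores([-2.0, -40.0], weights))
```

With weights of this kind, a band dominated by noise contributes little to the combined score, which is the general idea behind weighting sub-band evidence by its estimated reliability.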
The nonlinear statistical matching algorithm is a new algorithm that combines MAP statistical matching with nonlinear mapping. Experiments show that these methods clearly improve speech recognition performance in noisy environments. Based on the phenomenon of auditory stream grouping, multi-band robust speech recognition algorithms built on a noise-corruption assumption are studied. The multi-band asynchronous mode is studied first. The multi-band maximum likelihood robust speech recognition algorithm is based on this asynchronous mode and uses maximum likelihood linear mapping together with linear analysis or discriminative analysis. According to the characteristics of multi-band analysis, the discriminative multi-band maximum a posteriori
【Key words】 Speech recognition; Auditory analysis; Hidden Markov model; Estimation principle; Synchronous analysis; Asynchronous analysis; Environment mapping; Discriminative function;