Improved Speech Recognition Using CNN Edge-Effect Analysis (CNN边缘影响分析与改进的语音识别)

【Authors】 FANG Yuanyuan; ZHU Min

【Affiliation】 Information Department, Nanjing University of Aeronautics and Astronautics

【Abstract】 Most existing CNN-based speech recognition approaches focus on training with large-scale data or on improving the network model, but they overlook the geometric problems of two-dimensional speech features. Experiments on Mel-frequency cepstral coefficient (MFCC) related features extracted from selected datasets show that most two-dimensional features suffer from an edge effect in CNN algorithms: non-zero features are concentrated mostly in the edge region of the feature map, so key information in the speech features is lost during training and recognition accuracy drops significantly. After further analysis of this effect, several geometric improvement methods are adopted to alleviate the edge effect of 2D features in CNN algorithms. The experimental results show that the improved 2D speech features outperform the original features, to varying degrees, in accuracy or robustness.
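The edge effect described in the abstract can be made concrete with a small sketch. The abstract does not specify how the authors measure the effect or which geometric improvements they use, so everything below is an assumption for illustration: `edge_nonzero_ratio` is a hypothetical metric (the share of non-zero entries lying within `border` cells of the map boundary), and `pad_to_center` is one possible geometric adjustment (zero-padding so the original content sits in the interior), not necessarily one of the paper's methods.

```python
import numpy as np


def edge_nonzero_ratio(feature_map: np.ndarray, border: int = 1) -> float:
    """Share of non-zero entries within `border` cells of the map's boundary.

    Hypothetical metric for illustration; not taken from the paper.
    """
    if border < 0:
        raise ValueError("border must be non-negative")
    nonzero = feature_map != 0
    total = int(nonzero.sum())
    if total == 0:
        return 0.0
    h, w = nonzero.shape
    # Entries that are at least `border` cells away from every side.
    interior = int(nonzero[border:h - border, border:w - border].sum())
    return (total - interior) / total


def pad_to_center(feature_map: np.ndarray, pad: int = 1) -> np.ndarray:
    """Zero-pad all four sides so the original content moves off the boundary.

    Assumed mitigation for illustration only.
    """
    return np.pad(feature_map, pad_width=pad, mode="constant", constant_values=0)


# Toy 2D feature map whose non-zero values sit only on the outer ring.
toy = np.zeros((6, 6))
toy[0, :] = toy[-1, :] = 1.0
toy[:, 0] = toy[:, -1] = 1.0

print(edge_nonzero_ratio(toy, border=1))
print(edge_nonzero_ratio(pad_to_center(toy, pad=1), border=1))
```

On real data the input would be a 2D MFCC-derived feature map rather than this toy array, and the border width would need to match the kernel size and padding of the CNN in question.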

【Funding】 National Natural Science Foundation of China (61703206)
  • 【Source】 Modern Electronics Technique (现代电子技术), 2021, Issue 18
  • 【Classification No.】 TN912.34; TP183
  • 【Citations】 2
  • 【Downloads】 209