Speech emotion recognition model combining two-channel CNN-LSTM and attention mechanism
【Abstract】 To address the problems of insufficient feature extraction and poor recognition performance in existing speech emotion recognition methods based on convolutional neural networks, a speech emotion recognition model combining a two-channel CNN-LSTM with an attention mechanism is proposed. The model adopts a dual-path, multi-dimensional, multi-scale feature extraction method that combines residual blocks and multi-scale convolutions to extract deep MFCC, Chroma, and spectrogram features, increasing feature diversity. An attention mechanism computes self-attention and cross-attention parameters for the two feature paths, assigns them different weight coefficients, and performs weighted fusion, combining complementary information and reducing the impact of feature redundancy. An LSTM network then extracts temporal features to capture contextual semantic information, and a Softmax function performs emotion classification; the model achieves classification accuracies of 90.19% on the RAVDESS dataset and 89.23% on the SEWA dataset.
【Key words】 emotion recognition; attention mechanism; long short-term memory network; dual-path multi-dimensional multi-scale feature extraction; multi-scale convolution
- 【Source】 Electronic Design Engineering (电子设计工程), 2024, No. 18
- 【CLC number】 TN912.34; TP183
- 【Downloads】 202
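
To make the pipeline described in the abstract easier to follow, the sketch below wires up two CNN feature paths with multi-scale residual blocks, self- and cross-attention fusion, an LSTM, and a Softmax classifier in PyTorch. It is only an illustrative reading of the abstract: all layer sizes, kernel choices, the simple additive fusion, and the class names (MultiScaleResBlock, BranchCNN, DualPathSER) are assumptions, not the authors' published configuration.

```python
# Illustrative sketch of a dual-path CNN-LSTM with attention fusion.
# Hyperparameters and the fusion rule are assumptions for demonstration only.
import torch
import torch.nn as nn


class MultiScaleResBlock(nn.Module):
    """Residual block with parallel multi-scale convolutions (assumed 3x3/5x5/7x7 kernels)."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)
        return self.act(x + self.fuse(multi))           # residual connection


class BranchCNN(nn.Module):
    """One CNN path: stem conv + multi-scale residual block, pooled to a frame-level sequence."""
    def __init__(self, in_ch=1, feat_dim=64):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, feat_dim, 3, padding=1), nn.ReLU())
        self.res = MultiScaleResBlock(feat_dim)
        self.pool = nn.AdaptiveAvgPool2d((None, 1))      # keep time axis, collapse frequency axis

    def forward(self, x):                                # x: (batch, 1, time, freq)
        h = self.pool(self.res(self.stem(x)))            # (batch, feat_dim, time, 1)
        return h.squeeze(-1).transpose(1, 2)             # (batch, time, feat_dim)


class DualPathSER(nn.Module):
    """Two CNN paths fused by self- and cross-attention, then LSTM + Softmax classifier."""
    def __init__(self, feat_dim=64, num_classes=8):
        super().__init__()
        self.path_a = BranchCNN()                        # assumed MFCC/Chroma path
        self.path_b = BranchCNN()                        # assumed spectrogram path
        self.self_attn_a = nn.MultiheadAttention(feat_dim, 4, batch_first=True)
        self.self_attn_b = nn.MultiheadAttention(feat_dim, 4, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(feat_dim, 4, batch_first=True)
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x_a, x_b):
        a, b = self.path_a(x_a), self.path_b(x_b)        # (batch, time, feat_dim) each
        sa, _ = self.self_attn_a(a, a, a)                # self-attention within each path
        sb, _ = self.self_attn_b(b, b, b)
        cab, _ = self.cross_attn(sa, sb, sb)             # cross-attention: path A queries path B
        fused = sa + sb + cab                            # simple additive weighted fusion (assumed)
        seq, _ = self.lstm(fused)                        # temporal / contextual modelling
        logits = self.classifier(seq[:, -1])             # last time step -> emotion logits
        return logits.softmax(dim=-1)


if __name__ == "__main__":
    mfcc_chroma = torch.randn(2, 1, 200, 32)             # dummy (batch, 1, frames, coefficients)
    spectrogram = torch.randn(2, 1, 200, 128)
    probs = DualPathSER()(mfcc_chroma, spectrogram)
    print(probs.shape)                                    # torch.Size([2, 8])
```

Because both paths collapse the frequency axis and keep the frame axis, their outputs align in time and can be fused frame by frame before the LSTM; any learned weighting scheme could replace the simple sum used here.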