Speech emotion recognition model combining two-channel CNN-LSTM and attention mechanism
【Abstract】 To address the problems of insufficient feature extraction and poor recognition performance in existing speech emotion recognition methods based on convolutional neural networks, a speech emotion recognition model combining a two-channel CNN-LSTM with an attention mechanism is proposed. The model adopts a dual-path, multi-dimensional, multi-scale feature extraction method that combines residual blocks and multi-scale convolutions to extract deep MFCC, Chroma, and spectrogram features, increasing feature diversity. An attention mechanism computes self-attention and cross-attention parameters for the two feature paths, assigns them different weight coefficients, and performs weighted fusion, combining complementary information and reducing the impact of feature redundancy. An LSTM network then extracts temporal features to capture contextual semantic information, and a Softmax function performs emotion classification; the model achieves classification accuracies of 90.19% on the RAVDESS dataset and 89.23% on the SEWA dataset.
【Key words】 emotion recognition; attention mechanism; long short-term memory network; dual-path multi-dimensional multi-scale feature extraction; multi-scale convolution
- 【Source】 Electronic Design Engineering (电子设计工程), 2024, No. 18
- 【CLC number】 TN912.34; TP183
- 【Downloads】 202
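
To make the pipeline described in the abstract easier to follow, the sketch below wires up two CNN feature paths with multi-scale residual blocks, self- and cross-attention fusion, an LSTM, and a Softmax classifier in PyTorch. It is only an illustrative reading of the abstract: all layer sizes, kernel choices, the simple additive fusion, and the class names (MultiScaleResBlock, BranchCNN, DualPathSER) are assumptions, not the authors' published configuration.

```python
# Illustrative sketch of a dual-path CNN-LSTM with attention fusion.
# Hyperparameters and the fusion rule are assumptions for demonstration only.
import torch
import torch.nn as nn


class MultiScaleResBlock(nn.Module):
    """Residual block with parallel multi-scale convolutions (assumed 3x3/5x5/7x7 kernels)."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)
        return self.act(x + self.fuse(multi))           # residual connection


class BranchCNN(nn.Module):
    """One CNN path: stem conv + multi-scale residual block, pooled to a frame-level sequence."""
    def __init__(self, in_ch=1, feat_dim=64):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, feat_dim, 3, padding=1), nn.ReLU())
        self.res = MultiScaleResBlock(feat_dim)
        self.pool = nn.AdaptiveAvgPool2d((None, 1))      # keep time axis, collapse frequency axis

    def forward(self, x):                                # x: (batch, 1, time, freq)
        h = self.pool(self.res(self.stem(x)))            # (batch, feat_dim, time, 1)
        return h.squeeze(-1).transpose(1, 2)             # (batch, time, feat_dim)


class DualPathSER(nn.Module):
    """Two CNN paths fused by self- and cross-attention, then LSTM + Softmax classifier."""
    def __init__(self, feat_dim=64, num_classes=8):
        super().__init__()
        self.path_a = BranchCNN()                        # assumed MFCC/Chroma path
        self.path_b = BranchCNN()                        # assumed spectrogram path
        self.self_attn_a = nn.MultiheadAttention(feat_dim, 4, batch_first=True)
        self.self_attn_b = nn.MultiheadAttention(feat_dim, 4, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(feat_dim, 4, batch_first=True)
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x_a, x_b):
        a, b = self.path_a(x_a), self.path_b(x_b)        # (batch, time, feat_dim) each
        sa, _ = self.self_attn_a(a, a, a)                # self-attention within each path
        sb, _ = self.self_attn_b(b, b, b)
        cab, _ = self.cross_attn(sa, sb, sb)             # cross-attention: path A queries path B
        fused = sa + sb + cab                            # simple additive weighted fusion (assumed)
        seq, _ = self.lstm(fused)                        # temporal / contextual modelling
        logits = self.classifier(seq[:, -1])             # last time step -> emotion logits
        return logits.softmax(dim=-1)


if __name__ == "__main__":
    mfcc_chroma = torch.randn(2, 1, 200, 32)             # dummy (batch, 1, frames, coefficients)
    spectrogram = torch.randn(2, 1, 200, 128)
    probs = DualPathSER()(mfcc_chroma, spectrogram)
    print(probs.shape)                                    # torch.Size([2, 8])
```

Because both paths collapse the frequency axis and keep the frame axis, their outputs align in time and can be fused frame by frame before the LSTM; any learned weighting scheme could replace the simple sum used here.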