Research on Improving Generalization Ability Based on LSTM Networks
【Author】 Chen Xi
【Supervisor】 Jiang Li
【Author Information】 Xiangtan University, Integrated Circuit Engineering, 2021, Master's thesis
【Abstract】 In recent years, neural network technology has developed rapidly and is gradually being applied in smart products. In many cases, a neural network must be deployed in practice before it can deliver value. However, real-world usage scenarios are complex and changeable, which poses no small challenge to neural networks. To meet such demands, the generalization ability of a neural network must be strong enough to adapt to real application scenarios. Because improving the generalization ability of neural networks has practical significance, it has become a concern of many researchers. This thesis studies the generalization ability of neural networks from the perspectives of both the model and the data, proposing a method based on multi-head attention and a method of multi-domain data augmentation. The main research contents are as follows:

(1) A method for improving the generalization of neural networks based on multi-head attention is proposed. The method first uses a multi-head attention mechanism to select, from multiple parallel LSTM structures, the LSTMs most relevant to the input task, and then applies a mask matrix to selectively activate them according to the attention scores. An activated LSTM can read the information of the other LSTMs, completing an information exchange. This process retains task-relevant information and extracts features that are common across tasks, giving the network stronger generalization performance. In experiments against a conventional parallel LSTM, the method's average test error over four datasets is about 1.39% lower than that of the conventional method. Compared with related work, its average test error over the four datasets is about 0.21% lower than that of the second-best algorithm, and its average test error under added noise is about 0.73% lower. Theoretical analysis and experiments show that the method can effectively improve the generalization ability of neural networks.

(2) Most data augmentation related to speech recognition uses conventional methods, which typically augment in the time domain; for speech data, however, frequency-domain treatment is more important. This thesis therefore proposes a multi-domain data augmentation method: four time-domain augmentations are first applied to the dataset, followed by three spectral-noise and spectrum-masking operations. This changes the structural distribution of the data, which helps the neural network learn general features and thereby improves its generalization ability. Experiments show that the method's generalization error is about 2.76% lower than the average of the other methods, significantly improving the generalization ability of the neural network.
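The selection step in contribution (1) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the dot-product scoring, the single attention head, the `top-k` activation rule, and all variable names (`task_query`, `lstm_keys`, `select_lstms`) are assumptions standing in for details the abstract does not specify.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()


def select_lstms(task_query, lstm_keys, k=2):
    """Score each parallel LSTM branch against the input task with
    dot-product attention, then build a 0/1 mask that activates only
    the top-k branches (a hypothetical simplification of the method)."""
    scores = softmax(lstm_keys @ task_query)   # one score per branch
    top = np.argsort(scores)[-k:]              # indices of best branches
    mask = np.zeros_like(scores)
    mask[top] = 1.0                            # selective activation
    return scores, mask


rng = np.random.default_rng(0)
task_query = rng.normal(size=8)        # assumed embedding of the input task
lstm_keys = rng.normal(size=(4, 8))    # one key vector per parallel LSTM
scores, mask = select_lstms(task_query, lstm_keys, k=2)
print(mask.sum())  # → 2.0: exactly k branches are activated
```

In a full model, the mask would gate which LSTM outputs are combined and allowed to exchange hidden-state information, while the attention scores could weight that combination.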
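The multi-domain augmentation pipeline in contribution (2) can likewise be sketched. The abstract names four time-domain and three spectral operations without listing them; the two time-domain transforms (additive noise, circular time shift) and two spectral transforms (frequency-band masking, spectral noise) below are illustrative stand-ins, as is the crude fixed-frame spectrogram built with `numpy.fft.rfft`.

```python
import numpy as np

rng = np.random.default_rng(42)


# --- time-domain augmentations (illustrative subset) ---
def add_noise(wave, scale=0.01):
    # Additive Gaussian noise on the raw waveform.
    return wave + scale * rng.normal(size=wave.shape)


def time_shift(wave, shift=100):
    # Circularly shift the waveform by `shift` samples.
    return np.roll(wave, shift)


# --- frequency-domain augmentations on a magnitude spectrogram ---
def freq_mask(spec, width=4):
    # Zero out a contiguous band of frequency rows (spectrum masking).
    spec = spec.copy()
    f0 = rng.integers(0, spec.shape[0] - width)
    spec[f0:f0 + width, :] = 0.0
    return spec


def spec_noise(spec, scale=0.05):
    # Additive Gaussian noise directly on the spectrogram.
    return spec + scale * rng.normal(size=spec.shape)


# Toy signal: 1600 samples framed into 16 frames of 100 samples each.
wave = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1600))
spec = np.abs(np.fft.rfft(wave.reshape(16, 100), axis=1))

aug_wave = time_shift(add_noise(wave))      # time-domain chain
aug_spec = spec_noise(freq_mask(spec))      # frequency-domain chain
print(aug_wave.shape, aug_spec.shape)
```

The key design point the thesis argues is that applying such transforms in both domains shifts the structural distribution of the training data more broadly than time-domain augmentation alone.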
【Key words】 Neural network; Generalization; Multi-head attention; Data augmentation;