基于深度学习的中文反讽识别及其情感判别研究

Research on Chinese Irony Identification and Sentiment Discrimination Based on Deep Learning

【Author】 Lu Xin (卢欣)

【Advisor】 Wang Suge (王素格)

【Author Information】 Shanxi University (山西大学), Computer Software and Theory, 2019, Master's thesis

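【Abstract (translated from the Chinese)】 Sentiment analysis is currently one of the most active research areas in natural language processing. Irony is a rhetorical device for expressing sentiment implicitly: it uses words opposite to the actual intent to achieve a sarcastic or humorous effect. The true meaning of an ironic utterance cannot be inferred directly from its vocabulary, because its literal meaning conflicts with its real intent, so irony identification and sentiment discrimination are especially challenging. Earlier work on text sentiment analysis often overlooked this phenomenon, which lowered the accuracy of sentiment analysis. To improve that accuracy, this thesis studies Chinese irony identification and sentiment discrimination. By analyzing linguistic phenomena specific to Chinese and the characteristics of microblogs (Weibo), it summarizes the linguistic features of irony in Chinese microblogs, and it proposes a convolutional neural network (CNN) model that incorporates linguistic features and an LSTM model with an attention mechanism that incorporates preceding-context information, both for irony identification and sentiment discrimination. The main work is as follows:

(1) Selection of linguistic features of irony in Chinese microblogs. Irony depends on language habits, and grammatical structure and semantic expression differ between languages. Compared with English irony, Chinese irony has more complex grammatical structure and semantic expression, which makes Chinese irony identification and sentiment discrimination harder at the word level than their English counterparts, and features of English irony cannot be applied directly to Chinese. Drawing on related work on Chinese and English irony identification, and taking into account the characteristics of Chinese microblogs, the thesis summarizes several linguistic features of irony in Chinese microblogs and uses the chi-square statistic to select feature words for each of them.

(2) A CNN model that incorporates linguistic features. Traditional machine learning methods rely on manually selected features. Selecting such features requires domain expertise and extensive practice, and hand-crafted features alone can hardly capture the deep semantic information of a sentence. To address this limitation, the thesis trains microblog word vectors with the Skip-gram model and proposes a CNN model that incorporates linguistic features. The model exploits the linguistic features of Chinese irony while also capturing the deep semantic information of the sentence. Experimental results show that, on Chinese irony identification, the model clearly outperforms traditional machine learning methods, reaching an F value of 0.8187. On irony sentiment discrimination, it also improves somewhat over a plain CNN model.

(3) An LSTM model with an attention mechanism that incorporates preceding-context information. For an ironic sentence in a microblog, the preceding context often states the reason for the irony and expresses the overall sentiment of the post, so it plays a key role in irony identification and sentiment discrimination. A traditional CNN model extracts only local features from a matrix of consecutive n-gram vectors; it cannot handle non-contiguous dependencies and interactions within a sentence, and its mutually independent nodes cannot represent sequential text effectively. To represent sentence semantics better, the thesis therefore adds an LSTM and an attention mechanism to the CNN model that incorporates linguistic features. Experimental results show that this method improves the precision of Chinese irony identification and also brings some improvement in irony sentiment discrimination.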
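As an illustration of the chi-square feature-word selection named in item (1), here is a minimal sketch. The abstract does not give the exact formulation used in the thesis, so this assumes the standard 2x2 word/class contingency table over document frequencies; the function names `chi_square` and `top_feature_words` and the toy data are hypothetical.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 word/class contingency table.

    n11: ironic documents containing the word
    n10: non-ironic documents containing the word
    n01: ironic documents without the word
    n00: non-ironic documents without the word
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_feature_words(docs, labels, k):
    """Rank words by chi-square score and return the k highest.

    docs: list of token lists (already segmented; segmentation is assumed).
    labels: 1 for ironic, 0 for non-ironic (an assumed encoding).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count each word once per document (document frequency).
        (pos_df if label == 1 else neg_df).update(set(tokens))
    scores = {}
    for word in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[word], neg_df[word]
        scores[word] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    docs = [["really", "great"], ["really", "nice"],
            ["weather", "nice"], ["weather", "bad"]]
    labels = [1, 1, 0, 0]
    print(top_feature_words(docs, labels, 2))
```

Note that the chi-square score measures association with either class, so a high-scoring word may be typical of non-ironic text; a real pipeline would likely also check the direction of the association.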

【Abstract】 Sentiment analysis is currently one of the most active research fields in natural language processing. Irony is a rhetorical device for implicit sentiment expression that achieves an ironic or humorous effect by using words contrary to the actual intention. The true semantics of irony cannot be inferred directly from the vocabulary of the text, because its literal meaning contradicts the real intention; irony identification and its sentiment analysis are therefore more challenging. Previous sentiment analysis work has often ignored this linguistic phenomenon, which reduces the accuracy of sentiment analysis. To improve that accuracy, this thesis studies Chinese irony recognition and its sentiment analysis. By analyzing the linguistic phenomena of Chinese Weibo and the characteristics of social networks, it summarizes the language features of Chinese irony and puts forward a convolutional neural network (CNN) model with linguistic features and an LSTM model with an attention mechanism that integrates the preceding context. The main work is as follows:

(1) Selection of language features of Chinese irony. Irony is related to language habits, and language structure differs between languages. Compared with English irony, the structure and grammar of Chinese irony are more complicated, so Chinese irony recognition and its sentiment analysis are more difficult at the word level than their English counterparts, and the features of English irony cannot be used directly for Chinese. Drawing on related work on Chinese and English irony recognition, and taking into account the characteristics of Chinese Weibo itself, we summarize several linguistic features of Chinese Weibo irony and select the feature words corresponding to these features using the chi-square statistic.

(2) A CNN model with linguistic features. Traditional machine learning methods rely on manually selected features. Selecting these features requires professional domain knowledge and a great deal of practice, and it is difficult to obtain the deep semantic information of sentences from hand-crafted features alone. To address the inability of traditional feature selection to capture the deep semantics of sentences, we train word vectors with the Skip-gram model and propose a CNN model with linguistic features, which combines the deep semantic information of the sentence with those features. The experimental results show that the model improves significantly over traditional machine learning methods on Chinese irony recognition, reaching an F value of 0.8187. The model also performs better than a plain CNN model on irony sentiment analysis.

(3) An LSTM model with an attention mechanism that integrates the preceding context. For an ironic sentence in Weibo, the preceding context often describes the reason for the irony and expresses the overall sentiment of the post, so it plays a key role in Chinese Weibo irony recognition and its sentiment analysis. Because a traditional CNN model obtains only local features from a matrix of consecutive n-gram vectors, it cannot handle non-contiguous dependencies and interactions within a sentence, and its independent nodes cannot effectively represent sequential text. To represent the semantics of sentences better, this thesis therefore adds an LSTM and an attention mechanism to the CNN model with linguistic features. The experimental results show that the method improves the precision of irony recognition and also improves irony sentiment analysis to some extent.
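To make item (3) more concrete, the following is a minimal sketch of attention pooling over a sequence of hidden states (for example, LSTM outputs), guided by a vector that summarizes the preceding context. The abstract does not specify the scoring function or the architecture details, so dot-product scoring is an assumption here, and the names `softmax`, `attend`, `hidden_states`, and `context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, context):
    """Weight each hidden state by its similarity to a context vector.

    hidden_states: list of equal-length vectors, one per token
                   (stand-ins for LSTM outputs).
    context: vector summarizing the preceding text (assumed to be given).
    Returns (attention weights, weighted sum of the hidden states).
    """
    # Dot-product scoring is an assumption; the thesis may score differently.
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0]]
    weights, pooled = attend(states, [2.0, 0.0])
    print(weights, pooled)
```

In this sketch, tokens whose hidden states align with the context vector receive larger weights, so the pooled sentence representation leans toward the parts of the sentence most related to the preceding text.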

  • 【Online Publication Contributor】 Shanxi University (山西大学)
  • 【Online Publication Year/Issue】 2020, Issue 01