
Sentiment analysis research based on combination of naive Bayes and latent Dirichlet allocation

【Authors】 SU Ying; ZHANG Yong; HU Po; TU Xinhui

【Affiliations】 College of Information Science and Engineering, Wuchang Shouyi University; School of Computer, Central China Normal University

【Abstract】 Manually labeled corpora are generally a critical resource for sentiment analysis. To avoid laborious annotation, this paper presents an unsupervised hierarchical generative model for sentiment analysis that combines Naive Bayes (NB) and Latent Dirichlet Allocation (LDA), named NB-LDA. Given only a suitable sentiment lexicon, and without any sentence-level or document-level annotation, the model analyzes the sentiment orientation of online reviews at the sentence level and the document level simultaneously. In particular, the model assumes that each sentence, rather than each word, has a latent sentiment label, and that this label then generates the sentence's features independently in the NB manner. Introducing the NB assumption lets the model incorporate advanced Natural Language Processing (NLP) techniques such as dependency parsing and syntactic parsing to improve the performance of unsupervised sentiment analysis. Experimental results on two sentiment corpora show that NB-LDA automatically derives sentence-level and document-level sentiment polarities and achieves significantly higher accuracy than other unsupervised methods. Moreover, although it is unsupervised, NB-LDA performs close to some semi-supervised and supervised methods.
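The abstract describes the generative story only in outline. Below is a minimal, hypothetical sketch of how a sentence-level NB-LDA-style model could be fitted; it is not the authors' implementation, and every modeling choice beyond "one latent sentiment label per sentence, features generated independently given that label" is an assumption. Assumed here: each document has a Dirichlet(`alpha`) mixture over sentiment labels; each sentence draws one label from that mixture; each feature in the sentence is drawn independently from a per-label multinomial with a Dirichlet(`beta`) prior; the sentiment lexicon anchors the labels by giving a seed word near-zero prior mass under every other label; inference is collapsed Gibbs sampling. The function name `nb_lda_gibbs`, its parameters, and the seeding scheme are illustrative only. Features are plain word strings here; parser-derived features (for example, dependency pairs encoded as strings) would be handled the same way.

```python
import math
import random


def nb_lda_gibbs(docs, lexicon, n_labels=2, alpha=1.0, beta=0.01,
                 n_iters=100, seed=0):
    """Hypothetical collapsed Gibbs sampler for a sentence-level sentiment model.

    docs: list of documents; a document is a list of sentences and a
          sentence is a list of feature strings.
    lexicon: assumed seed lexicon mapping feature -> label index.
    Returns (z, theta): z[d][j] is the sampled label of sentence j in
    document d; theta[d] is the estimated label distribution of document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for sent in doc for w in sent})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Assumed seeding scheme: a seed word keeps almost no prior mass under
    # labels other than its own, which fixes what each label "means".
    prior = [[beta] * V for _ in range(n_labels)]
    for w, lab in lexicon.items():
        if w in widx:
            for s in range(n_labels):
                if s != lab:
                    prior[s][widx[w]] = 1e-12
    prior_sum = [sum(row) for row in prior]

    n_ds = [[0] * n_labels for _ in docs]        # sentences per (doc, label)
    n_sw = [[0] * V for _ in range(n_labels)]    # feature counts per label
    n_s = [0] * n_labels                         # total features per label

    def update(d, sent, s, delta):
        n_ds[d][s] += delta
        for w in sent:
            n_sw[s][widx[w]] += delta
            n_s[s] += delta

    # Initialise each sentence by a lexicon vote; random if it has no seed word.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for sent in doc:
            votes = [0] * n_labels
            for w in sent:
                if w in lexicon:
                    votes[lexicon[w]] += 1
            if max(votes) > 0:
                s = votes.index(max(votes))
            else:
                s = rng.randrange(n_labels)
            z[d].append(s)
            update(d, sent, s, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for j, sent in enumerate(doc):
                update(d, sent, z[d][j], -1)  # remove the current sentence
                logp = []
                for s in range(n_labels):
                    # Document-level (LDA-style) term: popularity of s in doc d.
                    lp = math.log(n_ds[d][s] + alpha)
                    # Sentence-level NB term: all features share label s and are
                    # generated independently; integrating out the label's
                    # multinomial gives this sequential (rising-factorial) product.
                    seen = {}
                    for k, w in enumerate(sent):
                        i = widx[w]
                        c = seen.get(i, 0)
                        lp += math.log(n_sw[s][i] + prior[s][i] + c)
                        lp -= math.log(n_s[s] + prior_sum[s] + k)
                        seen[i] = c + 1
                    logp.append(lp)
                m = max(logp)
                weights = [math.exp(x - m) for x in logp]
                z[d][j] = rng.choices(range(n_labels), weights=weights)[0]
                update(d, sent, z[d][j], +1)

    # Point estimate of each document's label mixture from the final sample.
    theta = [[(n_ds[d][s] + alpha) / (len(doc) + n_labels * alpha)
              for s in range(n_labels)] for d, doc in enumerate(docs)]
    return z, theta
```

With a toy corpus such as `[[["good", "phone"], ["great", "screen", "good"], ["bad", "battery"]], [["bad", "service"], ["awful", "bad", "delivery"]]]` and the hypothetical lexicon `{"good": 0, "bad": 1}`, sentences are labeled individually through `z`, and a document's polarity can be read off as the argmax of `theta[d]`. Sampling one label per sentence, rather than one topic per word as in standard LDA, is what couples the sentence-level and document-level decisions in this sketch.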

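【Funding】 Major Project of the National Social Science Fund of China (12&2D223); National Natural Science Foundation of China (61402191, 61300144, 61572223); Research Project of the State Language Commission of China (WT125-44); Self-determined Research Funds of Central China Normal University (CCNU14A05014, CCNU14A05015)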
  • 【Source】 Journal of Computer Applications, 2016, Issue 06
  • 【CLC Number】 TP391.1
  • 【Times Cited】 41
  • 【Downloads】 680