Sentiment Text Classification Based on Chi-square Statistics


【Authors】 ZHOU Ai-wu; MA Na-na; LIU Hui-ting

【Affiliation】 College of Computer Science and Technology, Anhui University

【Abstract】 Sentiment texts are typically short, carry little information, and yield sparse features, and conventional n-gram approaches ignore both the redundancy among words and their relevance to the classes. Based on a study of sentiment text and n-gram features, this paper proposes a chi-square-based method for selecting n-gram features. First, with n-grams as the text features, each feature is evaluated by extending the traditional chi-square statistic to consider whether features occur together or individually, since the occurrence of one feature without the other may also carry discriminative information across classes. Then, redundant n-gram features are removed by using the chi-square statistic to measure the relevance between multi-word features and classes, so that the selected n-grams have high class relevance and low redundancy. Finally, a Support Vector Machine (SVM) classifier is used to identify text orientation on different corpora; comparative experiments show that the method improves text classification accuracy.
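For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard 2x2 chi-square score applied to n-gram features, not the authors' extended method: the co-occurrence handling and redundancy-removal steps are not specified in the abstract and are not reproduced here. The function names (`ngrams`, `chi_square`, `select_features`), the toy corpus, and the choice of unigrams plus bigrams are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams (as tuples) for n = 1..n_max."""
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def chi_square(a, b, c, d):
    """Standard chi-square score for a 2x2 feature/class table.

    a: documents in the class that contain the feature
    b: documents outside the class that contain the feature
    c: documents in the class that lack the feature
    d: documents outside the class that lack the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, target, k=3, n_max=2):
    """Rank n-grams by chi-square against class `target`; keep the top k."""
    doc_feats = [set(ngrams(doc, n_max)) for doc in docs]
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    in_pos, in_neg = Counter(), Counter()
    for feats, y in zip(doc_feats, labels):
        # Count document frequency (each n-gram once per document).
        (in_pos if y == target else in_neg).update(feats)
    scores = {}
    for f in set(in_pos) | set(in_neg):
        a, b = in_pos[f], in_neg[f]
        scores[f] = chi_square(a, b, n_pos - a, n_neg - b)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy corpus: tokenized documents with sentiment labels.
docs = [["good", "movie"], ["very", "good"],
        ["bad", "movie"], ["very", "bad"]]
labels = ["pos", "pos", "neg", "neg"]
top = select_features(docs, labels, target="pos", k=2)
```

In a full pipeline, the selected n-grams would become the feature vocabulary passed to a classifier such as an SVM, which is the evaluation setup the abstract describes.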

【Funding】 National Natural Science Foundation of China (61202227)
  • 【Source】 Microelectronics & Computer (微电子学与计算机), 2017, Issue 08
  • 【Classification Code】 TP391.1
  • 【Citations】 18
  • 【Downloads】 310