Context Semantic-based Naive Bayesian Algorithm for Text Classification
【Abstract】 The Naive Bayes classifier assumes that a sample's attributes are conditionally independent of one another. As a simple bag-of-words model, it ignores the effect that synonyms in context have on classification. This paper proposes the concept of similar words and uses clusters of similar words in place of the traditional feature dictionary during training. First, word2vec is trained to obtain word embeddings. Then the words in the feature dictionary are represented by their embeddings and clustered hierarchically to build clusters of similar words, which are then expanded. Experimental results show that the improved algorithm effectively raises text classification accuracy and avoids the instability in classification performance caused by differences between training corpora.
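The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The two-dimensional vectors are hypothetical stand-ins for trained word2vec embeddings. The average-linkage clustering, the similarity threshold of 0.9, and the function names `cluster_words`, `train`, and `predict` are assumptions made for illustration. The cluster-expansion step is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical 2-D vectors standing in for trained word2vec embeddings.
VECTORS = {
    "good": (0.9, 0.1), "great": (0.85, 0.2), "fine": (0.8, 0.15),
    "bad": (0.1, 0.9), "awful": (0.2, 0.85), "poor": (0.15, 0.8),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def cluster_words(vectors, threshold=0.9):
    """Average-linkage agglomerative clustering on cosine similarity.

    Merging stops once no pair of clusters reaches the (assumed) threshold.
    Returns a mapping from each word to its cluster id.
    """
    clusters = [[w] for w in sorted(vectors)]

    def sim(a, b):
        total = sum(cosine(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, a, b = max(
            (sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return {w: cid for cid, members in enumerate(clusters) for w in members}

def train(docs, word_to_cluster):
    """Count cluster ids per class instead of raw feature words."""
    class_docs = Counter()
    feat_counts = defaultdict(Counter)
    for tokens, label in docs:
        class_docs[label] += 1
        for t in tokens:
            if t in word_to_cluster:
                feat_counts[label][word_to_cluster[t]] += 1
    return class_docs, feat_counts

def predict(tokens, model, word_to_cluster):
    """Multinomial Naive Bayes over cluster ids with Laplace smoothing."""
    class_docs, feat_counts = model
    n_features = len(set(word_to_cluster.values()))
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total = sum(feat_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for t in tokens:
            if t in word_to_cluster:
                count = feat_counts[label][word_to_cluster[t]]
                score += math.log((count + 1) / (total + n_features))
        scores[label] = score
    return max(scores, key=scores.get)

word_to_cluster = cluster_words(VECTORS)
model = train(
    [(["good", "fine"], "pos"), (["bad", "poor"], "neg")],
    word_to_cluster,
)
print(predict(["great"], model, word_to_cluster))  # pos
```

The word "great" never appears in the toy training data, yet it is classified as "pos" because it shares a cluster with "good" and "fine". This is the effect the abstract attributes to replacing the feature dictionary with clusters of similar words.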
- 【Source】 Computer and Modernization (计算机与现代化), 2018, Issue 06
- 【Classification No.】 TP391.1
- 【Citations】 14
- 【Downloads】 401