Improved text classification methods based on weighted adjustments
【Abstract】 Text classification is the foundation and core of text mining and is widely used in traditional information retrieval and in Web information retrieval and mining. This paper proposes a text classification method that applies weight adjustment to improve the vector space model (VSM) and the naive Bayes classifier (NBC), explores the use of the EM algorithm for unsupervised Bayesian classification, and describes the design and implementation of CZW, a Chinese/English text classification system. Three sets of experiments show that adjusting term weights with certain scoring functions effectively improves the precision of text classification models such as VSM and NBC, and that the improvement becomes more pronounced as the training set grows. The NBC reaches a maximum classification precision of 86%.
【Key words】 text classification; weight adjustment; VSM; Bayesian classifier
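
The weight-adjustment idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which scoring functions the paper uses, so the score below (the largest share of a term's documents that falls in a single class) is a hypothetical stand-in chosen for brevity, and the function names, the centroid-based VSM classifier, and the toy data are all assumptions made for illustration rather than the CZW implementation.

```python
import math
from collections import Counter, defaultdict


def term_scores(docs, labels):
    """Hypothetical scoring function: max over classes of P(class | term),
    estimated from the number of training documents containing the term."""
    per_term = defaultdict(Counter)  # term -> class -> document count
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            per_term[term][label] += 1
    return {t: max(c.values()) / sum(c.values()) for t, c in per_term.items()}


def weighted_vector(tokens, scores):
    """Term-frequency vector with each count multiplied by the term's score.
    Terms never seen in training get weight 0."""
    return {t: n * scores.get(t, 0.0) for t, n in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(docs, labels, scores):
    """Sum the weighted vectors of each class into one centroid per class."""
    centroids = defaultdict(lambda: defaultdict(float))
    for tokens, label in zip(docs, labels):
        for term, weight in weighted_vector(tokens, scores).items():
            centroids[label][term] += weight
    return centroids


def classify(tokens, centroids, scores):
    """Assign the class whose centroid is most similar to the document."""
    vec = weighted_vector(tokens, scores)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


# Toy training data (invented for this sketch).
docs = [
    ["stock", "market", "price"],
    ["market", "trade", "price"],
    ["match", "team", "goal"],
    ["team", "score", "price"],
]
labels = ["finance", "finance", "sport", "sport"]

scores = term_scores(docs, labels)
centroids = train_centroids(docs, labels, scores)
print(classify(["team", "goal", "price"], centroids, scores))
```

In this toy setup the term "price" occurs in both classes, so it receives a lower weight than class-specific terms such as "team" and contributes less to the similarity. A comparable adjustment for a naive Bayes classifier might scale each term's log-likelihood contribution by its score, though that too would be an assumption about the paper's method.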
- 【Source】 Journal of Tsinghua University (Science and Technology), 2003, Issue 04
- 【Classification number】 TP391.1
- 【Times cited】 56
- 【Downloads】 550