TFIDF_NB Cooperative Training Algorithm Based on Incremental Learning
【Author】 LIU Xin, ZHANG Yong (College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
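【Affiliation】 College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics; School of Computer Science and Technology, Soochow University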
【Abstract】 TFIDF and NB (naive Bayes) are both supervised learning algorithms: each trains a classifier on a set of manually labeled documents. The size of the training set strongly affects classifier performance, yet large collections of labeled documents are hard to obtain. Building on an analysis of the EM algorithm, this paper proposes a new cooperative training algorithm based on incremental learning. The algorithm combines a small number of labeled documents with a large number of unlabeled documents to train the naive Bayes and TFIDF classifiers cooperatively and incrementally. Experimental results show that the cooperative training algorithm achieves high classification accuracy and a lower average error rate than the EM algorithm.
【Key words】 text classification; incremental learning; cooperative training; TFIDF; NB;
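- 【Proceedings】 Proceedings of the 16th Annual Conference on Information Theory of the Chinese Institute of Electronics
- 【Conference】 16th Annual Conference on Information Theory of the Chinese Institute of Electronics
- 【Conference date】 2009-09-18
- 【Conference location】 Beijing, China
- 【Classification number】 TP181
- 【Organizer】 Information Theory Branch of the Chinese Institute of Electronics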
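
The abstract describes the cooperative training scheme only at a high level, so the following is a minimal sketch of one way such a loop could look, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the TFIDF classifier is taken to be a class-centroid model with cosine similarity, an unlabeled document is accepted only when both classifiers agree on its label, candidates are ranked by the naive Bayes posterior, and the names `co_train`, `NaiveBayes`, `TfidfCentroid`, and the `batch` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            self.counts[y].update(tokenize(d))
        return self

    def predict(self, doc):
        """Return (label, posterior probability of that label)."""
        n = sum(self.prior.values())
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            lp = math.log(self.prior[c] / n)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    lp += math.log((self.counts[c][w] + 1) / total)
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return max(logp, key=logp.get), 1.0 / z

class TfidfCentroid:
    """Assumed TFIDF classifier: one summed TFIDF vector per class."""
    def fit(self, docs, labels):
        n = len(docs)
        df = Counter(w for d in docs for w in set(tokenize(d)))
        self.idf = {w: math.log((1 + n) / (1 + k)) + 1.0 for w, k in df.items()}
        self.centroids = defaultdict(Counter)
        for d, y in zip(docs, labels):
            for w, v in self._vec(d).items():
                self.centroids[y][w] += v
        return self

    def _vec(self, doc):
        tf = Counter(w for w in tokenize(doc) if w in self.idf)
        return {w: c * self.idf[w] for w, c in tf.items()}

    def predict(self, doc):
        """Return (label, cosine similarity to that label's centroid)."""
        v = self._vec(doc)
        sims = {c: cosine(v, cen) for c, cen in self.centroids.items()}
        best = max(sims, key=sims.get)
        return best, sims[best]

def co_train(labeled, unlabeled, batch=2):
    """Grow the labeled set from the unlabeled pool, `batch` documents per round."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    nb = NaiveBayes().fit(docs, labels)
    tfidf = TfidfCentroid().fit(docs, labels)
    while pool:
        # Assumed selection rule: keep documents on which both classifiers agree.
        agreed = []
        for d in pool:
            y_nb, p = nb.predict(d)
            y_tf, _ = tfidf.predict(d)
            if y_nb == y_tf:
                agreed.append((p, d, y_nb))
        if not agreed:
            break
        agreed.sort(key=lambda t: t[0], reverse=True)
        for _, d, y in agreed[:batch]:
            docs.append(d)
            labels.append(y)
            pool.remove(d)
        # Incremental step: retrain both classifiers on the enlarged set.
        nb = NaiveBayes().fit(docs, labels)
        tfidf = TfidfCentroid().fit(docs, labels)
    return nb, tfidf, list(zip(docs, labels))
```

Calling `co_train` with a few labeled `(text, label)` pairs and a list of unlabeled strings returns both retrained classifiers together with the enlarged labeled set. Requiring agreement between the two classifiers is only one possible acceptance rule; the paper may use a different criterion.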