A Spam Message Filtering Approach Based on a Two-Stage Classifier (一种基于两级分类器的垃圾短信过滤方法)
【Author】 Zhanyi Wang, Weiran Xu, Dongxin Liu, Jun Guo; PRIS Lab, Beijing University of Posts and Telecommunications, Beijing 10086
【Institution】 Beijing University of Posts and Telecommunications;
【Abstract】 Spam message filtering is a text categorization problem. An important question is how to design a classifier that reaches acceptable precision when few training samples are available. This paper modifies the structure of the traditional classifier and designs a two-stage classifier built on a latent intermediate layer (latent topics), with each stage implemented by the Bayes method. The two-stage classifier is then combined with a Naïve Bayes classifier through weighting. Experiments show that the two-stage classifier substantially speeds up the convergence of the classification error rate. The combined classifier additionally raises precision when training samples are plentiful, integrating the advantages of both.
【Key words】 Spam Message Filtering; Text Categorization; Naïve Bayes; Latent Topics; Combined Classifier;
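- 【Proceedings title】 Proceedings of the 5th National Conference on Information Retrieval (第五届全国信息检索学术会议论文集)
- 【Conference name】 5th National Conference on Information Retrieval
- 【Conference date】 2009-11-14
- 【Conference location】 Shanghai, China
- 【Classification code】 TP391.1
- 【Organizer】 Chinese Information Processing Society of China, Technical Committee on Information Retrieval and Content Security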
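
The abstract names two ideas that a small example may make concrete: scoring a message through a latent intermediate layer, and weighting that score against a Naïve Bayes score. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The record gives no formulas, so the marginalization P(c|d) = Σ_z P(c|z)·P(z|d), the linear weighting, the function names, the topic labels, and all probabilities are hypothetical.

```python
from typing import Dict

Posterior = Dict[str, float]


def two_stage_posterior(
    p_topic_given_msg: Posterior,
    p_class_given_topic: Dict[str, Posterior],
) -> Posterior:
    """Assumed form of a two-stage score: P(c|d) = sum_z P(c|z) * P(z|d).

    Stage 1 supplies P(z|d) over latent topics z; stage 2 supplies P(c|z).
    """
    scores: Posterior = {}
    for topic, p_z in p_topic_given_msg.items():
        for label, p_c in p_class_given_topic[topic].items():
            scores[label] = scores.get(label, 0.0) + p_z * p_c
    return scores


def combine(p_two_stage: Posterior, p_naive_bayes: Posterior, weight: float = 0.5) -> Posterior:
    """Assumed linear weighting of the two classifiers' posteriors."""
    labels = set(p_two_stage) | set(p_naive_bayes)
    return {
        c: weight * p_two_stage.get(c, 0.0) + (1.0 - weight) * p_naive_bayes.get(c, 0.0)
        for c in labels
    }


def classify(posterior: Posterior) -> str:
    """Return the label with the highest combined score."""
    return max(posterior, key=posterior.get)


# Hypothetical numbers for a single message.
p_topic = {"promo": 0.7, "chat": 0.3}
p_class = {
    "promo": {"spam": 0.9, "ham": 0.1},
    "chat": {"spam": 0.2, "ham": 0.8},
}
p_nb = {"spam": 0.4, "ham": 0.6}

staged = two_stage_posterior(p_topic, p_class)
combined = combine(staged, p_nb, weight=0.5)
label = classify(combined)
```

In this sketch `weight` is a free parameter. The abstract suggests the two-stage score helps most when training data is scarce and the Naïve Bayes score when it is plentiful, but how the weight is chosen is not stated in this record.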