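Research on Document Classification with an Optimized Signal-to-Noise Ratio Algorithm and an Improved Feature Weight Formula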
Document Clustering Based on an Improved Signal/Noise Ratio and Term Weight Equation
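【Abstract (Chinese)】 To improve the efficiency of document classification and the effectiveness of information retrieval, this paper proposes an improved text classification method based on the signal-to-noise ratio (SNR). It also improves the feature weight formula and thereby optimizes text classification. The improved SNR algorithm removes stop words and groups and merges near-synonyms and synonyms, which solves two problems that the traditional SNR method has when applied to Chinese. The improved weight formula introduces the contribution rate of a feature's weight at different positions within the same document, which raises the retrieval accuracy of feature terms.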
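The abstract does not state the paper's formulas. The following is a minimal sketch that assumes the classic information-retrieval signal/noise measure. For a term k with frequency f_ik in document i and total collection frequency F_k, the measure is noise_k = Σ_i (f_ik / F_k) · log(F_k / f_ik) and signal_k = log F_k − noise_k. The code applies stop-word removal and synonym merging before counting. `STOP_WORDS`, `SYNONYMS`, `normalize`, and `signal_noise` are hypothetical names. The English tokens stand in for already-segmented Chinese words, and base-2 logarithms are an arbitrary choice.

```python
import math
from collections import Counter

# Hypothetical stop-word list and synonym map (assumption: the paper's real lists are not given).
STOP_WORDS = {"the", "of", "and"}
SYNONYMS = {"pc": "computer", "microcomputer": "computer"}


def normalize(tokens):
    """Drop stop words, then map each synonym to its canonical term."""
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def signal_noise(docs):
    """Signal/noise per term over a list of tokenized documents.

    noise_k  = sum_i (f_ik / F_k) * log2(F_k / f_ik)   (over documents containing k)
    signal_k = log2(F_k) - noise_k
    """
    counts = [Counter(normalize(d)) for d in docs]  # f_ik for each document i
    totals = Counter()                              # F_k over the whole collection
    for c in counts:
        totals.update(c)

    result = {}
    for term, total in totals.items():
        noise = sum(
            (c[term] / total) * math.log2(total / c[term])
            for c in counts
            if c[term] > 0
        )
        result[term] = {"signal": math.log2(total) - noise, "noise": noise}
    return result
```

A term concentrated in one document gets high signal and zero noise. For example, `snr` in `signal_noise([["snr", "snr", "pc"], ["computer", "the", "retrieval"]])` has signal 1.0. A term spread evenly across documents gets zero signal. Here that is `computer`, which `pc` is merged into before counting, so the two surface forms are scored as one feature.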
【Abstract】 To improve the efficiency of document clustering, we propose an improved method based on the signal/noise ratio. In this method, the term weight equation is modified and text classification is refined. The improved method takes stop words and similar words into consideration, which solves two problems in applying it to Chinese. Because a term plays different roles at different positions in an article, position-dependent term weights are used in the equation, which improves retrieval accuracy.
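The position-dependent term weighting can be sketched under similar assumptions. The abstract gives neither the document sections nor their contribution rates. `POSITION_WEIGHTS` is therefore a hypothetical table that ranks title above keywords, abstract, and body. The hypothetical `position_weighted_tf` sums each occurrence of a term, scaled by the weight of the section it appears in.

```python
from collections import Counter

# Hypothetical contribution rates per section (assumption: not stated in the abstract).
POSITION_WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.5, "body": 1.0}


def position_weighted_tf(sections, weights=POSITION_WEIGHTS):
    """Position-weighted term frequency for one document.

    sections: mapping from section name to a list of (already segmented) tokens.
    Each occurrence adds the weight of its section; unknown sections count as 1.0.
    """
    scores = Counter()
    for name, tokens in sections.items():
        w = weights.get(name, 1.0)
        for t in tokens:
            scores[t] += w
    return dict(scores)
```

For example, `position_weighted_tf({"title": ["snr", "clustering"], "body": ["snr", "weight", "snr"]})` gives `snr` a score of 5.0. That is 3.0 from the title plus 1.0 for each of the two body occurrences. A term seen only once in the body scores 1.0.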
【Keywords (Chinese)】 document classification; signal-to-noise ratio algorithm; weight formula; feature term
【Key words】 document clustering; signal/noise ratio method; term weight equation; term
- 【Source】 微计算机信息 (Microcomputer Information), 2006, Issue 21
- 【CLC classification number】 TP391.1
- 【Times cited】 4
- 【Downloads】 111