Association Rules Text Categorization Based on Weighted Rules

(Chinese title: 利用规则权重改进文本关联分类, literally "Improving Associative Text Categorization Using Rule Weights")

【Authors】 CHEN Xiao-Yun (陈晓云) (1,2) and HU Yun-Fa (胡运发) (1)

【Affiliations】 (1) Department of Computer and Information Technology, Fudan University, Shanghai 200433; (2) School of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002

【Abstract】 In recent years, text categorization methods based on association rules have attracted considerable attention. Among them, ARC-BC offers the best accuracy and performance, and it generally yields good classification results. However, its classification accuracy drops markedly when feature words are unevenly distributed across the training samples. This paper proposes weighted association rules categorization (WARC), an association-rule text categorization algorithm based on rule-weight adjustment, to address this problem. The algorithm uses the class association rules to classify the training samples and defines rule intensity according to the number of misclassified training samples. The weight of each strong rule is multiplied by an adjustment factor less than 1 to reduce it, while the weight of each weak rule is multiplied by an adjustment factor greater than 1 to increase it. The results show that classification accuracy improves significantly once the rule weights have been adjusted.
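
The weight-adjustment idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definition of rule intensity, the adjustment factors, or the stopping criterion, so everything below is an assumption made for illustration: the `Rule` class, the `classify` and `adjust_weights` helpers, the factors 0.9 and 1.1, and the choice to treat a rule as "strong" when it mostly votes for the wrong category on misclassified training samples and as "weak" when it mostly votes for the true category on those samples. It should not be read as the authors' WARC implementation.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    terms: frozenset  # antecedent: feature words that must all appear
    label: str        # consequent: the category the rule votes for
    weight: float     # initial weight (hypothetical, e.g. rule confidence)


def classify(rules, doc_terms):
    """Return the category with the largest summed weight of matching rules."""
    scores = {}
    for r in rules:
        if r.terms <= doc_terms:  # rule fires if all its terms are present
            scores[r.label] = scores.get(r.label, 0.0) + r.weight
    if not scores:
        return None
    # sorted() makes tie-breaking deterministic (alphabetical order)
    return max(sorted(scores), key=scores.get)


def adjust_weights(rules, training_set, down=0.9, up=1.1):
    """One adjustment pass over the training set.

    ASSUMPTION: intensity is a signed count over misclassified samples:
    +1 when the rule voted for the wrong predicted category,
    -1 when it voted for the true category. Positive totals mark a
    'strong' rule (scaled by down < 1), negative totals a 'weak' rule
    (scaled by up > 1).
    """
    intensity = [0] * len(rules)
    for doc_terms, true_label in training_set:
        predicted = classify(rules, doc_terms)
        if predicted == true_label:
            continue  # only misclassified samples contribute
        for i, r in enumerate(rules):
            if r.terms <= doc_terms:
                if r.label == predicted:
                    intensity[i] += 1
                elif r.label == true_label:
                    intensity[i] -= 1
    for i, r in enumerate(rules):
        if intensity[i] > 0:
            r.weight *= down
        elif intensity[i] < 0:
            r.weight *= up
    return rules


# Toy data (invented for illustration): "market" dominates mixed documents.
rules = [
    Rule(frozenset({"market"}), "finance", 0.9),
    Rule(frozenset({"match"}), "sports", 0.5),
]
training = [
    (frozenset({"market", "match"}), "sports"),
    (frozenset({"market"}), "finance"),
]
for _ in range(3):  # number of passes is an arbitrary choice
    adjust_weights(rules, training)
```

In this toy setup the over-dominant "market" rule is damped and the "match" rule is boosted on each pass until the mixed document is no longer misclassified, after which further passes leave the weights unchanged.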

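【Funding】 National Natural Science Foundation of China (69933010); Scientific Research Fund of the Fujian Provincial Department of Education (JB02069)
  • 【Proceedings】 Proceedings of the 21st National Database Conference of China (Research Reports volume)
  • 【Conference】 21st National Database Conference of China
  • 【Conference date】 2004-10-14
  • 【Conference location】 Xiamen, Fujian, China
  • 【Classification code】 TP311.13
  • 【Organizer】 Database Technical Committee, China Computer Federation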