An On-line Text Categorization Algorithm Based on Information Fusion: Semantic SVM

【Authors】 Dai Liu-ling, Li Xue-mei, Huang He-yan, Chen Zhao-xiong

【Affiliations】 Dept. of Computer Science, Nanjing Univ. of Science and Tech., Nanjing 210094, Jiangsu, China; Dept. of Electronic Information Engineering, Beijing Electronic Science and Tech. Institute, Beijing 100070, China; Research Institute of Computer Language Information Engineering, the Chinese Academy of Sciences, Beijing 100083, China

【Abstract】 This paper aims to make support vector machines (SVMs) better suited to on-line text categorization. It exploits two properties: SVMs retain high generalization ability even with small training sets, and text feature vectors tend to cluster in the feature space. On this basis it proposes the semantic SVM, which replaces the original training set with a set of semantic centers that serve as both the training samples and the support vector candidates. The paper gives the steps for generating the semantic center set, the framework of the on-line learning algorithm for the semantic SVM, and an implementation of that algorithm based on Sequential Minimal Optimization (SMO). Experimental results show that, compared with the standard SVM, the semantic SVM and its on-line learning algorithm improve on-line learning speed and classification speed by orders of magnitude, and also hold some advantage in classification accuracy.
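The core idea in the abstract, training on a compact set of cluster centers instead of every document, can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the center-generation steps, so the cosine threshold, the running-mean update, and the names `add_to_centers`, `train_linear_svm`, and `predict` are all assumptions made for illustration. The trainer is a Pegasos-style subgradient method without a bias term, used here as a stand-in for the SMO-based learner the paper describes.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def add_to_centers(centers, x, threshold=0.8):
    """Merge x into the most similar center of its class, or open a new one.

    centers is a list of [mean_vector, count] pairs for ONE class.
    The threshold and running-mean update are illustrative assumptions.
    """
    best, best_sim = None, -1.0
    for c in centers:
        sim = cosine(c[0], x)
        if sim > best_sim:
            best, best_sim = c, sim
    if best is not None and best_sim >= threshold:
        n = best[1]
        best[0] = [(m * n + xi) / (n + 1) for m, xi in zip(best[0], x)]
        best[1] = n + 1
    else:
        centers.append([list(x), 1])

def train_linear_svm(points, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style linear SVM without bias (a stand-in, not SMO)."""
    rng = random.Random(seed)
    w, t = [0.0] * len(points[0]), 0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = points[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:                           # hinge-loss subgradient
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy 2-D "documents": class +1 loads on feature 0, class -1 on feature 1.
stream = [([1.0, 0.1], 1), ([0.1, 1.0], -1), ([0.9, 0.2], 1),
          ([0.0, 0.9], -1), ([1.0, 0.0], 1), ([0.2, 1.0], -1)]

centers = {1: [], -1: []}
for x, y in stream:                 # documents arrive one at a time
    add_to_centers(centers[y], x)

# Train on the centers only, not on the six original documents.
points = [c[0] for y in centers for c in centers[y]]
labels = [y for y in centers for _ in centers[y]]
w = train_linear_svm(points, labels)
```

In this toy run the six documents collapse to one center per class, so the classifier is trained on two points. Training cost then scales with the number of centers rather than the number of documents, which is the kind of saving the abstract attributes to the semantic center set.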

【Funding】 National Natural Science Foundation of China (Grant No. 60272088)
  • 【Source】 Journal of South China University of Technology (Natural Science), 2004, Issue S1
  • 【Classification No.】 TP391.1
  • 【Times Cited】 3
  • 【Downloads】 355