
Study of Automatic Keywords Labeling for Scientific Literature (科技文献关键词自动标注算法研究)


【Author】 NI Na (1), LIU Kai (1,2), LI Yao-dong (1)
(1) State Key Laboratory of Intelligent Control and Management of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
(2) Dept. of Distribution System, R&D Center, TravelSky Technology Limited, Beijing 100029, China

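【Institution】 State Key Laboratory of Intelligent Control and Management of Complex Systems (in preparation), Institute of Automation, Chinese Academy of Sciences; Dept. of Distribution System, R&D Center, TravelSky Technology Limited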

【Abstract】 Author-provided keywords help readers, but some scientific papers are unlabeled or have lost their keywords, which makes classifying and navigating the literature harder. To address this problem, the paper proposes an abstract-based algorithm for automatic keyword labeling. Abstracts that already carry author-assigned keywords serve as the training text. Four text modeling methods are used to model the abstracts and keywords in the training set and to establish the relation between keywords and the feature terms that make up the abstracts: a language model (LM), Latent Dirichlet Allocation (LDA), the Probabilistic Author-Topic model, and a combination of LM and LDA. The trained models are then used to predict keywords for the abstracts of papers that have no keywords. Experiments on both Chinese and English data sets show that the automatically labeled keywords reflect the content of the papers well, and that among all the models the LM+LDA combination performs best.
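The abstract does not give the model equations, so the following is only a minimal sketch of the general idea of relating keywords to abstract terms, not the authors' method. It assumes a simple co-occurrence estimate in which a keyword k is scored for an abstract d as P(k|d) = Σ_w P(k|w) · P(w|d), with P(k|w) counted from keyword-labeled training abstracts. The function names, the toy data, and the scoring formula are all hypothetical; the LDA and author-topic components mentioned in the abstract are not shown.

```python
from collections import Counter, defaultdict


def train_cooccurrence(training_docs):
    """Count how often each keyword co-occurs with each abstract term.

    training_docs: list of (abstract_tokens, keywords) pairs.
    Returns (cooc, term_totals), where cooc[term][keyword] is a count and
    term_totals[term] is the sum of those counts for that term.
    """
    cooc = defaultdict(Counter)
    term_totals = Counter()
    for tokens, keywords in training_docs:
        for term in tokens:
            for kw in keywords:
                cooc[term][kw] += 1
                term_totals[term] += 1
    return cooc, term_totals


def predict_keywords(tokens, cooc, term_totals, top_n=3):
    """Rank keywords by the assumed score sum_w P(k|w) * P(w|d)."""
    # Only terms seen during training can contribute to the score.
    doc_counts = Counter(t for t in tokens if t in cooc)
    total = sum(doc_counts.values())
    if total == 0:
        return []
    scores = Counter()
    for term, n in doc_counts.items():
        p_w_d = n / total  # P(w|d): relative frequency of the term in the abstract
        for kw, c in cooc[term].items():
            scores[kw] += p_w_d * c / term_totals[term]  # P(k|w) * P(w|d)
    return [kw for kw, _ in scores.most_common(top_n)]


# Toy training set (hypothetical): tokenized abstracts with author keywords.
training = [
    ("topic model infers latent topics from text".split(), ["LDA", "topic model"]),
    ("language model estimates word probabilities".split(), ["language model"]),
]
cooc, term_totals = train_cooccurrence(training)
predicted = predict_keywords("word probabilities".split(), cooc, term_totals)
```

In this sketch an unlabeled abstract only receives keywords that appeared in the training set, which matches the abstract's description of predicting keywords from a labeled training corpus. A combined model of the kind the abstract reports as best would presumably mix such a term-level score with a topic-level score, but how the authors combine them is not stated here.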

【Key words】 Language model; Tag prediction; Latent Dirichlet Allocation; Probabilistic Author-Topic Model

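【Funding】 Supported by the National Basic Research Program of China (973 Program, 2007CB311007) and the National Natural Science Foundation of China (61072084)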

【Source】 Computer Science (计算机科学), 2012, Issue 09

【Classification code】 TP391.1
【Times cited】 2
【Downloads】 152
