Application of cosine similarity in university comprehensive information system

【Authors】 Zhu Hao; Lian Defu; Zuo Zhihong; Yan Kai

【Affiliations】 Information Center, University of Electronic Science and Technology of China; Information Technology Department, College of William & Mary; Big Data Research Center, University of Electronic Science and Technology of China

【Abstract】 To address inaccurate academic-paper records entered by faculty into the comprehensive information system of the University of Electronic Science and Technology of China, a scheme is proposed that identifies the standard journal or conference name by computing cosine similarity. First, the entered names are preprocessed and names crawled from the Internet are cleaned, yielding a set of test names. Using the classic TF-IDF method, all test names and standard journal names are tokenized, stop words are removed, and terms are extracted. Once the TF-IDF value of every word has been calculated, each test name and each standard journal name is converted into a multidimensional vector whose components are the TF-IDF values of all words. The correct standard journal name is then identified by computing the cosine similarity between each test name and the standard journal names. Actual identification results show that cosine similarity calculation greatly improves the quality of the entered academic-paper data.
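The abstract describes the matching pipeline only in prose; it does not state which TF-IDF weighting, stop-word list, tokenizer, or acceptance threshold the authors used. The Python sketch below illustrates the general technique under explicit assumptions, and is not the authors' implementation: names are lowercased and punctuation is replaced by spaces as "preprocessing", `STOP_WORDS` is a small hypothetical list, IDF uses the smoothed form log((1 + N) / (1 + df)) + 1 computed over all test and standard names, and `min_score` is a hypothetical cut-off below which no standard name is returned. Each name becomes a sparse TF-IDF vector, and the standard name with the highest cosine similarity, cos(u, v) = (u · v) / (‖u‖ ‖v‖), is selected. All function names are illustrative.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not publish its own.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}


def preprocess(name: str) -> str:
    """Assumed cleaning rule: lowercase, turn punctuation runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def tokenize(name: str) -> list[str]:
    """Split a cleaned name into words and drop stop words."""
    return [w for w in preprocess(name).split() if w not in STOP_WORDS]


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse vector: raw term count multiplied by the term's IDF."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """cos(u, v) = (u . v) / (|u| |v|); returns 0.0 if either vector is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def match_names(
    test_names: list[str], standard_names: list[str], min_score: float = 0.0
) -> list[tuple[str, str | None, float]]:
    """Map each test name to the most similar standard name, or None if no
    standard name scores above the hypothetical min_score threshold."""
    test_tokens = [tokenize(n) for n in test_names]
    std_tokens = [tokenize(n) for n in standard_names]
    # IDF is computed over all test and standard names together.
    idf = build_idf(test_tokens + std_tokens)
    std_vecs = [tfidf_vector(t, idf) for t in std_tokens]
    results = []
    for name, toks in zip(test_names, test_tokens):
        vec = tfidf_vector(toks, idf)
        scores = [cosine(vec, s) for s in std_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        matched = standard_names[best] if scores[best] > min_score else None
        results.append((name, matched, scores[best]))
    return results


if __name__ == "__main__":
    standards = [
        "IEEE Transactions on Information Theory",
        "Journal of Southeast University (Natural Science Edition)",
        "Physical Review Letters",
    ]
    tests = [
        "IEEE Trans. on Information Theory",
        "journal of southeast university, natural science",
        "Physical Review Letters.",
    ]
    for test, std, score in match_names(tests, standards):
        print(f"{test!r} -> {std!r} (cosine = {score:.3f})")
```

Because IDF down-weights words that occur in many names, a match in this sketch is driven mainly by distinctive words. Abbreviations such as "Trans." are not expanded here, so they lower the cosine score instead of contributing to it; a production matcher would likely need an abbreviation table or a similar normalization step.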

【Fund】 Special Construction Project of the University of Electronic Science and Technology of China (Y03093036001089)
  • 【Source】 Journal of Southeast University (Natural Science Edition), 2017, Issue S1
  • 【CLC number】 TP399-C1
  • 【Times cited】 8
  • 【Downloads】 186