
Document Copy Detection Method Based on Fingerprint and Semantic Features


【Authors】 LI Xu 1, ZHAO Ya-wei 2, LIU Guo-hua 1

【Affiliations】 1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China; 2. Shijiazhuang Information Engineering Vocational College, Shijiazhuang, Hebei 050032, China

【Abstract】 Copy detection for digital documents is an effective means of protecting authors' intellectual property and improving the efficiency of information retrieval. This paper proposes a document copy detection method based on fingerprints and semantic features. It introduces a fingerprint extraction algorithm and the corresponding overlap measure. Syntactic parsing and semantic analysis are combined on the basis of the concept descriptions in HowNet, and part-of-speech information and semantic rules are used to resolve ambiguities. A frame-based hierarchical representation describes the semantic features of each sentence. The proposed method is compared with existing methods in terms of detection accuracy on three test sets, and the experimental results show that it effectively detects text copied in a variety of ways.
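The abstract does not spell out the fingerprint extraction algorithm or the overlap measure. As a rough illustration only, the sketch below shows one common way such a pair can be built: hashed character k-grams with a "0 mod p" selection rule, and a containment-style overlap score. The function names `fingerprints` and `overlap`, the parameters `k` and `modulus`, and the choice of MD5 are all assumptions made for this example and are not taken from the paper. The semantic-feature stage (HowNet concepts, frame-based representation) is not covered.

```python
import hashlib


def fingerprints(text, k=5, modulus=4):
    """Return the set of selected k-gram hashes for a text.

    Hypothetical scheme: hash every character k-gram and keep only
    hashes divisible by `modulus` (modulus=1 keeps all of them).
    """
    # Lowercase and drop all whitespace so that trivial formatting
    # changes do not alter the fingerprints.
    normalized = "".join(text.lower().split())
    selected = set()
    for i in range(len(normalized) - k + 1):
        gram = normalized[i:i + k]
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        if value % modulus == 0:
            selected.add(value)
    return selected


def overlap(fp_a, fp_b):
    """Shared fingerprints relative to the smaller set (0.0 if either is empty)."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / min(len(fp_a), len(fp_b))
```

Dividing by the smaller set means a short passage copied into a long document can still score close to 1.0; a symmetric measure such as Jaccard similarity would score that case much lower. Which measure the paper actually uses cannot be determined from the abstract.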

【Funding】 National Natural Science Foundation of China (60773100)
  • 【Source】 Journal of Yanshan University (燕山大学学报), 2008, No. 04
  • 【Classification No.】 TP391.1
  • 【Citations】 11
  • 【Downloads】 272