Automatic Acquisition of Information Extraction Patterns Based on Bootstrapping (基于自扩展的信息抽取模式自动获取)

【Authors】 YU Jiang-de (于江德) [1], WANG Li-xin (王立新) [1], FAN Xiao-zhong (樊孝忠) [2]

【Affiliations】 [1] School of Computer and Information Engineering, Anyang Normal University, Anyang 455002, China; [2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

【Abstract】 This paper presents a bootstrapping algorithm that automatically acquires event extraction patterns from unannotated Chinese text. Starting from a small set of seed extraction patterns, the algorithm discovers new patterns through an incremental, iterative procedure. In each iteration, the candidate patterns are ranked with a TF/IDF-like scoring method, and the top-ranked patterns are added to the current pattern set. The method is applied to acquire extraction patterns for "position change" (management succession) events from the People's Daily corpus. Experimental results show that the acquired patterns perform well on Chinese event extraction, reaching an F-measure of 66.3%.
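
The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give the scoring formula, so the TF/IDF-like score below (the number of relevant documents containing a pattern, weighted by the log of its inverse document frequency) is an assumption, as are the function name `bootstrap_patterns`, the parameters `rounds` and `top_k`, and the representation of each document as a set of candidate pattern strings.

```python
import math


def bootstrap_patterns(docs, seeds, rounds=3, top_k=1):
    """Grow a pattern set from seeds (hypothetical sketch).

    docs  : list of sets; each set holds the candidate patterns found in one document
    seeds : iterable of seed patterns
    """
    accepted = set(seeds)
    n_docs = len(docs)
    for _ in range(rounds):
        # Documents matched by any accepted pattern are treated as relevant.
        relevant = [d for d in docs if d & accepted]
        scores = {}
        for pattern in set().union(*docs) - accepted:
            tf = sum(pattern in d for d in relevant)  # support among relevant documents
            if tf == 0:
                continue
            df = sum(pattern in d for d in docs)      # document frequency over the corpus
            # Assumed TF/IDF-like score; the paper's exact formula is not given here.
            scores[pattern] = tf * math.log(1 + n_docs / df)
        if not scores:
            break  # no candidate co-occurs with the accepted patterns
        # Rank by score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        accepted.update(ranked[:top_k])
    return accepted
```

Given a seed such as `"appoint(PER, POS)"`, patterns that co-occur with it in the same documents are promoted in later rounds, while patterns that appear only in unrelated documents are never scored.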

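【Funding】 Supported by the National Natural Science Foundation of China (60663004) and the Doctoral Program Foundation of the Ministry of Education of China (20050007023)
  • 【Source】 Journal of Chinese Computer Systems (小型微型计算机系统), 2009, Issue 05
  • 【Classification number】 TP391.1
  • 【Citations】 9
  • 【Downloads】 229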