基于条件随机域的中文长地名结构标注
Structure Labeling of Chinese Long Address with Conditional Random Field
【Author】 Hong Sun, Wenjun Wang, Ruifang He, Bolei Hu, Yueheng Sun (Information System and Software Engineering Lab, School of Computer Science and Technology, Tianjin University, Tianjin 300072)
【Affiliation】 Information System and Software Engineering Lab, School of Computer Science and Technology, Tianjin University;
【Abstract】 Structure labeling of Chinese long addresses segments a free-text address string into semantically distinct elements and attaches a label to each. It is an important task in information retrieval, question answering, and information extraction. Chinese addresses are often irregular in format and structure, which in practice seriously affects the granularity of data storage and the accuracy of queries. Existing work used Robust Risk Minimization to standardize addresses, but because it mainly targeted abbreviations and misspellings in English addresses, it did not fully account for the characteristics of Chinese addresses. This paper takes long-address data from the emergency domain as its object of study and standardizes address format by labeling address structure. Before labeling, heuristic rules are applied to improve the word segmentation produced by existing tools; a Conditional Random Field model then labels the structure of each long address, attaching to each part a label that indicates its semantics. Experimental results show that the improved segmentation combined with Conditional Random Field labeling significantly improves the performance of structure labeling for Chinese long addresses.
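The two-stage pipeline described in the abstract (heuristic repair of the word segmentation, followed by Conditional Random Field labeling) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the suffix list `SUFFIXES`, the merge rule in `merge_tokens`, the feature set in `token_features`, and the sample address are hypothetical and are not taken from the paper. The sketch stops at producing per-token feature dictionaries of the kind a linear-chain CRF toolkit typically accepts; it does not train or run a CRF.

```python
# Hypothetical suffix characters that often close an address element.
# This list is an assumption, not the heuristic rule set used in the paper.
SUFFIXES = ("省", "市", "区", "县", "镇", "乡", "村", "路", "街", "号")


def merge_tokens(tokens):
    """Repair over-segmentation: attach a bare suffix token to the token
    before it, unless that token already ends with a suffix."""
    merged = []
    for tok in tokens:
        if merged and tok in SUFFIXES and not merged[-1].endswith(SUFFIXES):
            merged[-1] += tok
        else:
            merged.append(tok)
    return merged


def token_features(tokens, i):
    """Build a feature dictionary for token i, including its neighbors."""
    tok = tokens[i]
    return {
        "token": tok,
        "last_char": tok[-1],
        "length": len(tok),
        "has_digit": any(ch.isdigit() for ch in tok),
        "prev_token": tokens[i - 1] if i > 0 else "<BOS>",
        "next_token": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def featurize(tokens):
    """Turn a token sequence into the feature sequence for one address."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Illustrative over-segmented address (not from the paper's data set).
raw = ["天津", "市", "南开", "区", "卫津", "路", "92", "号"]
merged = merge_tokens(raw)   # ['天津市', '南开区', '卫津路', '92号']
features = featurize(merged)
```

In a full system, each feature sequence would be paired with a label sequence (for example one label per administrative level or street element) and passed to a CRF trainer; the label inventory is left out here because the record does not state it.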
【Key words】 Conditional Random Field; Chinese Word Segmentation; Structure Labeling of Chinese Long Address;
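- 【Proceedings】 Proceedings of the 6th National Conference on Information Retrieval (第六届全国信息检索学术会议论文集)
- 【Conference】 6th National Conference on Information Retrieval (第六届全国信息检索学术会议)
- 【Conference date】 2010-08-12
- 【Conference location】 Mudanjiang, Heilongjiang, China
- 【Classification No.】 TP391.1
- 【Organizer】 Technical Committee on Information Retrieval and Content Security, Chinese Information Processing Society of China (中国中文信息学会信息检索与内容安全专业委员会)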