Chinese place name recognition for Chinese automatic segmentation
【Abstract】 This work targets word-level Chinese place names. Potential place names are generated from statistics on the characters used inside place names and from the structural characteristics of place names. During Chinese automatic segmentation, potential place names with high confidence are treated in the same way as a sentence's candidate words. The candidate segmentations of a sentence are evaluated using the confidence of each candidate word and the contextual succession relations between words, and the place names in the sentence are recognized when the optimal segmentation is determined. Closed and open tests were conducted on a real corpus. The closed test gave recall 93.55%, precision 94.14%, and F-1 93.85%; the open test gave recall 91.27%, precision 73.48%, and F-1 81.42%. These results are fairly satisfactory.
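The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The lexicon, the character scores, the suffix and cue sets, the threshold, and the function names `place_candidates` and `segment` are all hypothetical, chosen only for illustration. Candidate confidence is approximated by the geometric mean of per-character scores. The contextual succession relation is reduced to a bonus when a cue character precedes the candidate. The best segmentation is found by dynamic programming over summed log scores.

```python
import math

# All values below are hypothetical toy data, not taken from the paper.
LEXICON = {"我": 0.9, "在": 0.9, "工作": 0.9}          # dictionary word -> confidence
PLACE_CHAR_SCORE = {"石": 0.7, "河": 0.8, "村": 0.9}   # character -> place-name usage score
PLACE_SUFFIXES = {"村", "镇", "市", "县"}              # assumed typical final characters
LEFT_CUES = {"在", "到", "去"}                         # assumed left-context cue characters
DEFAULT_CHAR_SCORE = 0.05   # score for characters without place-name statistics
UNKNOWN_CHAR_SCORE = 0.01   # fallback confidence for an out-of-lexicon single character
THRESHOLD = 0.5             # minimum confidence for a potential place name


def place_candidates(sentence, max_len=4):
    """Return {(start, end): confidence} for potential place names."""
    found = {}
    n = len(sentence)
    for i in range(n):
        for j in range(i + 2, min(n, i + max_len) + 1):
            span = sentence[i:j]
            if span[-1] not in PLACE_SUFFIXES:
                continue
            scores = [PLACE_CHAR_SCORE.get(c, DEFAULT_CHAR_SCORE) for c in span]
            # Geometric mean of the character scores.
            conf = math.exp(sum(math.log(s) for s in scores) / len(scores))
            # Crude context term: boost when a cue character precedes the span.
            if i > 0 and sentence[i - 1] in LEFT_CUES:
                conf = min(1.0, conf * 1.2)
            if conf >= THRESHOLD:
                found[(i, j)] = conf
    return found


def segment(sentence, max_len=4):
    """Return (words, place_names) for the highest-scoring segmentation."""
    n = len(sentence)
    places = place_candidates(sentence, max_len)
    best = [-math.inf] * (n + 1)   # best[j]: best log score of sentence[:j]
    back = [None] * (n + 1)        # back[j]: (start of last word, is_place)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            word = sentence[i:j]
            options = []
            if word in LEXICON:
                options.append((LEXICON[word], False))
            if (i, j) in places:
                options.append((places[(i, j)], True))
            if not options and j - i == 1:
                options.append((UNKNOWN_CHAR_SCORE, False))
            for score, is_place in options:
                total = best[i] + math.log(score)
                if total > best[j]:
                    best[j] = total
                    back[j] = (i, is_place)
    words, names = [], []
    j = n
    while j > 0:   # walk the back-pointers from the end of the sentence
        i, is_place = back[j]
        words.append(sentence[i:j])
        if is_place:
            names.append(sentence[i:j])
        j = i
    return words[::-1], names[::-1]


words, names = segment("我在石河村工作")
print(words, names)
```

Because place candidates enter the same search as dictionary words, recognition falls out of choosing the best path rather than running as a separate pass. A real system would replace the toy scores with statistics estimated from a corpus.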
【Keywords】 Chinese place name recognition; Chinese automatic segmentation; unknown word recognition
- 【Source】 Journal of Dalian University of Technology (大连理工大学学报), 2006, Issue 04
- 【Classification number】 TP391.43
- 【Citation count】 30
- 【Download count】 843