Application of Decision Trees to Chinese Name Information Extraction
【Abstract】 A method for extracting Chinese personal names from text is proposed and implemented. The method first uses the character-usage probabilities of surnames and given names, with the surname acting as the trigger for extraction, to obtain a preliminary set of candidate names. It then exploits the context of each candidate and the degree of association between the characters of the name, selecting features from this information as the attribute list tested by a decision tree. Whether a candidate is a real name (a Boolean value, yes or no) is the target attribute to be predicted. A decision tree built with the ID3 algorithm then filters the candidates to retain the correct names. Experimental results show that the method achieves good recall and accuracy.
【Key words】 natural language processing; Chinese name extraction; decision tree; ID3 algorithm
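The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the surname set, the feature names (`title_follows`, `common_given_char`), and the toy training rows are all hypothetical, and the paper's surname and given-name probability tables are not reproduced. The first function proposes candidates whenever a surname character is seen; the remaining functions compute the information gain that ID3 uses to pick the attribute to split on.

```python
import math
from collections import Counter

# Hypothetical toy surname set; the paper's actual lexicon and
# character probabilities are not given in the abstract.
SURNAMES = {"王", "李", "张"}

def candidate_names(text, max_given_len=2):
    """Return (start, candidate) pairs: a surname character followed by
    1..max_given_len further characters. The surname is the trigger."""
    out = []
    for i, ch in enumerate(text):
        if ch in SURNAMES:
            for n in range(1, max_given_len + 1):
                if i + 1 + n <= len(text):
                    out.append((i, text[i:i + 1 + n]))
    return out

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical training rows: one dict of feature values per candidate,
# with the target attribute "yes"/"no" (is the candidate a real name?).
rows = [
    {"title_follows": True, "common_given_char": True},
    {"title_follows": True, "common_given_char": False},
    {"title_follows": False, "common_given_char": True},
    {"title_follows": False, "common_given_char": False},
]
labels = ["yes", "yes", "no", "no"]
```

Calling `best_attribute(rows, labels, ["title_follows", "common_given_char"])` selects the root attribute; a full ID3 tree would repeat the same selection recursively on each subset of rows.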
- 【Source】 Journal of Chengdu University of Information Technology (成都信息工程学院学报), 2006, Issue 02
- 【Classification No.】 TP391.1
- 【Times cited】 3
- 【Downloads】 183