A pattern-classification based solution for the recognition of tense of the Chinese language
(Original Chinese title: 基于模式分类的汉语时态确定方法研究)
【Author】 Lin Dazhen, Li Shaozi
【Affiliation】 Computer Science Department, Xiamen University, Xiamen, Fujian 361005, China
【Abstract】 Tense is a difficult problem in Chinese natural language processing, because in Chinese it is usually implied rather than explicitly marked. Rule-based methods therefore perform poorly on sentences that contain no tense characteristic words and on sentences that contain several of them. This paper proposes a statistical method for determining tense based on pattern classification. The method aggregates the contribution of every word in a sentence to the tense decision, so it can handle both sentences without tense characteristic words and sentences with more than one. Because it uses a linear discriminant function, it can analyze multi-dimensional data and is fast in both training and classification. In an open test on single sentences, the method achieves a precision of 79.8% and a recall of 95.3%.
【Key words】 Chinese; tense; characteristic words; linear discriminant function; perceptron criterion function
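The abstract and key words name a linear discriminant function trained with a perceptron criterion, in which every word contributes to the tense decision. The record gives no further detail, so the following is only a minimal sketch of that general technique, not the authors' implementation. The three tense labels, the bag-of-words features over pre-segmented tokens, the function names, and the toy sentences are all assumptions made for illustration.

```python
# Minimal multi-class perceptron sketch (illustrative only).
# Assumptions not taken from the paper: the label set, the use of
# pre-segmented tokens as features, and the made-up toy sentences.

LABELS = ("past", "present", "future")  # hypothetical tense classes


def score(weights, tokens, label):
    """Linear discriminant: sum of per-word weights for one tense label."""
    w = weights[label]
    return sum(w.get(tok, 0.0) for tok in tokens)


def predict(weights, tokens):
    """Return the label with the highest score (ties broken alphabetically)."""
    return max(sorted(weights), key=lambda label: score(weights, tokens, label))


def train_perceptron(samples, labels=LABELS, epochs=20):
    """Perceptron rule: on a mistake, reward the gold label, penalize the guess."""
    weights = {label: {} for label in labels}
    for _ in range(epochs):
        mistakes = 0
        for tokens, gold in samples:
            guess = predict(weights, tokens)
            if guess != gold:
                mistakes += 1
                for tok in tokens:
                    weights[gold][tok] = weights[gold].get(tok, 0.0) + 1.0
                    weights[guess][tok] = weights[guess].get(tok, 0.0) - 1.0
        if mistakes == 0:  # every training sentence is classified correctly
            break
    return weights


# Hypothetical, hand-made training sentences (already word-segmented).
TOY_SAMPLES = [
    (["我", "昨天", "去", "了", "学校"], "past"),
    (["他", "已经", "吃", "了", "饭"], "past"),
    (["我", "正在", "看", "书"], "present"),
    (["她", "现在", "在", "家"], "present"),
    (["我", "明天", "去", "学校"], "future"),
    (["他", "将", "要", "吃", "饭"], "future"),
]

weights = train_perceptron(TOY_SAMPLES)
print(predict(weights, ["他", "昨天", "吃", "饭"]))
```

Because the score is a sum over all words in the sentence, a sentence with no explicit time word still receives a decision from the weights of its remaining words, and a sentence with several competing time words is resolved by their combined weights. This is the property the abstract highlights, although the real feature set and training data are not described in this record.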
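- 【Proceedings title】 Proceedings of the 6th Chinese Lexical Semantics Workshop
- 【Conference name】 The 6th Chinese Lexical Semantics Workshop
- 【Conference date】 2005-04
- 【Conference location】 Xiamen, China
- 【Classification code】 H085
- 【Organizer】 Xiamen University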