
Intra-Topic Event Detection Based on Key Terms

Word Committee based Event Identification


【Authors】 Zhang Kuo, Li JuanZi, Wu Gang

【Affiliation】 Department of Computer Science and Technology, Tsinghua University, Beijing 100084

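【Chinese Abstract】 Various media produce large numbers of news reports every day, so an automated analysis method is needed to present news to users in a more clearly organized form. Most existing work partitions news into flat topics. However, a topic is not simply a collection of news stories: it is composed of a series of events. Events within a topic are often very similar to one another, so event detection within a topic has poor precision. To overcome this problem, this paper proposes a term committee method. It first mines the core terms of each event and then uses those core terms to detect events. Experimental results on two LDC datasets show that the proposed event detection method significantly improves on existing methods.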

【Abstract】 An overwhelming volume of news stories is created and stored electronically every day. This creates a growing need for techniques that analyze news stories and present them to users in a more meaningful way. Most previous research focuses on organizing a news collection into flat groups (topics) of stories. However, a topic is more than a mere collection of stories: it is characterized by a definite structure of inter-related events. Identifying events within a topic is difficult, because stories about the same topic tend to be very similar to one another regardless of which event they belong to. To address this problem, we propose an event identification method based on event key terms. We first capture tight term clusters as term committees of potential events. We then use these committees to find the core story set of each potential event. Finally, we assign every story to an event. Experimental results on two Linguistic Data Consortium (LDC) datasets show that the proposed method significantly outperforms previous methods.
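The abstract names three steps: term committees, core story sets, and assignment of every story to an event. It does not say how any of them is computed. The Python sketch below only illustrates the pipeline's shape under assumed choices; it is not the authors' algorithm. The assumptions are:

- Stories are pre-tokenized term lists.
- A committee is grown greedily: a term joins only if its story set has Jaccard overlap of at least `min_sim` with every current member's story set.
- A story is a core story of the committee whose terms it covers best, provided coverage is at least `min_cover`.
- Remaining stories go to the nearest core-story centroid by cosine similarity.

All function names and thresholds (`term_committees`, `core_stories`, `assign_events`, `min_df`, `min_sim`, `min_cover`) are hypothetical.

```python
import math
from collections import Counter


def story_sets(stories):
    """Map each term to the set of indices of stories that contain it."""
    index = {}
    for i, story in enumerate(stories):
        for term in set(story):
            index.setdefault(term, set()).add(i)
    return index


def jaccard(a, b):
    """Overlap of two story sets (1.0 = the terms always co-occur)."""
    return len(a & b) / len(a | b)


def term_committees(stories, min_df=2, min_sim=0.6):
    """Step 1 (assumed): greedily grow tight term clusters.

    A term joins a committee only if its story set has Jaccard
    overlap >= min_sim with the story set of EVERY current member.
    """
    index = {t: s for t, s in story_sets(stories).items() if len(s) >= min_df}
    # Frequent terms seed first; ties broken alphabetically for determinism.
    order = sorted(index, key=lambda t: (-len(index[t]), t))
    used, committees = set(), []
    for seed in order:
        if seed in used:
            continue
        committee = [seed]
        for term in order:
            if term in used or term in committee:
                continue
            if all(jaccard(index[term], index[m]) >= min_sim for m in committee):
                committee.append(term)
        if len(committee) >= 2:  # singletons are not kept as committees
            committees.append(committee)
            used.update(committee)
    return committees


def core_stories(stories, committees, min_cover=0.5):
    """Step 2 (assumed): a story is a core story of the committee whose
    terms it covers best, provided that coverage is >= min_cover."""
    cores = [[] for _ in committees]
    for i, story in enumerate(stories):
        words = set(story)
        covers = [sum(t in words for t in c) / len(c) for c in committees]
        if covers and max(covers) >= min_cover:
            cores[covers.index(max(covers))].append(i)
    return cores


def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def assign_events(stories, cores):
    """Step 3 (assumed): attach every story to the event whose core-story
    centroid is most similar; -1 if no events were found."""
    centroids = []
    for members in cores:
        centroid = Counter()
        for i in members:
            centroid.update(stories[i])
        centroids.append(centroid)
    labels = []
    for story in stories:
        sims = [cosine(Counter(story), c) for c in centroids]
        labels.append(sims.index(max(sims)) if sims else -1)
    return labels


# Toy topic: every story mentions "quake", so stories about different
# events still share vocabulary.
stories = [
    ["quake", "rescue", "tents", "aid"],
    ["quake", "rescue", "tents"],
    ["quake", "rescue", "survivors"],
    ["quake", "rebuild", "school", "funds"],
    ["quake", "rebuild", "school"],
    ["quake", "school", "funds"],
    ["quake", "survivors", "aid"],
]
committees = term_committees(stories)
cores = core_stories(stories, committees)
labels = assign_events(stories, cores)
print(committees, cores, labels)
```

On this toy input the script prints `[['rescue', 'tents'], ['school', 'funds']] [[0, 1, 2], [3, 4, 5]] [0, 0, 0, 1, 1, 1, 0]`.

The topic-wide term "quake" never joins a committee, because admission requires tight overlap with every member. The last story contains no committee term, so it is placed only in step 3, via centroid similarity.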

【Keywords (Chinese)】 event detection; key terms
【Key words】 event identification; term committee
【Funding】 Supported by the National Natural Science Foundation of China under Grant No. 90604025
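  • 【Proceedings】 Proceedings of the 3rd National Conference on Information Retrieval and Content Security
  • 【Conference】 3rd National Conference on Information Retrieval and Content Security
  • 【Date】 2007-11
  • 【Location】 Suzhou, Jiangsu, China
  • 【Classification (CLC)】 TP391.3
  • 【Organizer】 Technical Committee on Information Retrieval and Content Security, Chinese Information Processing Society of China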