
Study of Dynamic Web Information Extraction Based on Tree Model Algorithm (基于树模型算法的动态网页信息抽取研究)


【Authors】 Shao Hui (邵辉); Li Fang (李芳). Computer Science Department, Shanghai Jiaotong University, Shanghai 20030

【Abstract】 Dynamic web pages are an important type of page on the Internet. They are usually generated from a website's back-end database through a common template. Extracting information from dynamic pages matters because they are usually a website's main source of information. This paper presents a new method, based on tree model algorithms, for extracting information from dynamic web pages. It uses a tree edit distance model and a tree alignment (Tree Align) algorithm to separate and extract the information records in dynamic web pages. Experiments show that this tree-model-based method can accurately locate and extract information from dynamic web pages.
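The abstract names a tree edit distance model but this record does not give its details. As a rough illustration only, the following sketch computes a simple top-down (Selkow-style) tree edit distance between two DOM-like subtrees. The `Node` class, the unit costs, and the choice of the top-down variant are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical DOM-like node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree; used as the cost of inserting or deleting it."""
    return 1 + sum(size(c) for c in node.children)


def tree_edit_distance(a: Node, b: Node) -> int:
    """Top-down edit distance (assumed unit costs).

    The two roots are always mapped to each other (cost 1 if the tags differ),
    and the child lists are aligned with a sequence edit-distance table in
    which inserting or deleting a child costs the size of its whole subtree.
    """
    relabel = 0 if a.tag == b.tag else 1
    m, n = len(a.children), len(b.children)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(a.children[i - 1]),   # delete subtree from a
                dp[i][j - 1] + size(b.children[j - 1]),   # insert subtree from b
                dp[i - 1][j - 1]
                + tree_edit_distance(a.children[i - 1], b.children[j - 1]),
            )
    return relabel + dp[m][n]


# Two table rows that could plausibly come from the same template.
row1 = Node("tr", [Node("td"), Node("td", [Node("a")])])
row2 = Node("tr", [Node("td"), Node("td", [Node("a")]), Node("td")])
print(tree_edit_distance(row1, row2))
```

Under these assumptions, subtrees generated from the same template should be a small distance apart, which is one plausible way to group sibling subtrees into candidate information records before aligning them.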

【Key words】 Web Information Extraction; Tree Edit Distance; Wrapper

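【Proceedings】 Proceedings of the 2nd National Conference on Information Retrieval and Content Security (NCIRCS-2005)

【Conference】 2nd National Conference on Information Retrieval and Content Security (NCIRCS-2005)

【Conference date】 2005-10
【Conference location】 Beijing, China
【Classification code】 TP391.1

【Organizer】 Information Retrieval and Content Security Technical Committee, Chinese Information Processing Society of China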
