基于扩展标记图的网页浏览与检索研究

Study on Web Page Browsing and Query Based on Extended Tag Graph

【Author】 Wang Liang

【Advisor】 Zhu Zhengyu

【Author information】 Chongqing University, Computer Software and Theory, 2004, Master's thesis

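【Summary】 People can now obtain all kinds of information from the World Wide Web, but this information is highly dispersed and heterogeneous in structure. Because of this, website administrators spend considerable money and effort maintaining site data. Meanwhile, search engines, the most representative Web information retrieval systems, serve whole-Web search well but can hardly perform real-time retrieval, and they cannot reach inside a web page to query or reorganize the information in a specific region. The Extended Tag Graph (ETG) model [1] is a data model for describing HTML data. It not only describes HTML tag structure effectively but also introduces a new storage scheme for HTML data that separates the tag structure from the data, which effectively addresses the storage optimization and management of HTML information. This thesis studies ETG-based Web browsing and retrieval services in depth, and proposes and implements a practical ETG-based virtual web page service model. This new service model covers the design, browsing, and automatic generation of virtual web pages; it separates HTML patterns from data, which supports clearly structured organization and management of Web data. The thesis gives the concrete virtual web page and its modular syntax, the automatic ETG generation method, and the server framework. Tests on an experimental system show that the proposed virtual web page service system, while storing and organizing data in optimized form, also delivers web page browsing that is transparent to the user. To meet the need for querying and reorganizing information inside web pages, the thesis examines structure-based Web retrieval and proposes TagSQL, an SQL-like Web query language based on tag structure. TagSQL takes full account of the characteristics of the extended tag graph: besides describing and locating tag nodes conveniently, it can express relationships within a set of tags. Building on this, we study a Web retrieval service based on the ETG model that supports TagSQL, and present a prototype system, PowerSearcher. Using PowerSearcher, the thesis discusses how TagSQL's tag extraction and reorganization, set operations, and real-time queries are implemented. Its use in the experimental system shows that the retrieval service can already extract and reorganize information inside web pages in most cases. These concepts and techniques offer a new approach and implementation method for web page browsing services and information retrieval, and have some academic and practical value for Web information services in e-commerce.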

【Abstract】 People today can obtain a wide variety of information from the World Wide Web, but its sources are autonomous and usually contain heterogeneous data. As a result, website administrators spend considerable money and effort on site maintenance. Search engines, the representative Web information retrieval systems, serve whole-Web retrieval well but support neither real-time retrieval nor retrieval inside a web page. The Extended Tag Graph (ETG) is a model for describing HTML data. It represents HTML tag structure efficiently and introduces a new HTML data storage scheme that separates the data from the tag pattern, addressing storage optimization and maintenance of Web data. This thesis studies Web browsing and retrieval techniques based on ETG and proposes a new service framework for virtual web pages built on the ETG model. The framework covers methods for virtual page design, browsing, and automatic generation. It separates the data from the tag pattern and supports well-structured organization and management of Web data. The thesis discusses the syntax of virtual pages, the key algorithm for automatic ETG generation, and the system framework. The experimental system shows that the approach can optimize Web data storage while hiding the details from end users. To meet the need for detailed retrieval and reorganization inside a web page, the thesis discusses a structure-based Web query language. We present a new SQL-like Web query language based on tag structure, called Tag Structure Query Language (TagSQL). It takes advantage of the Extended Tag Graph and can describe both tag nodes and the relationships within a set of tags. Building on this work, the thesis also discusses Web query techniques based on ETG and TagSQL. We present an experimental system, PowerSearcher, and the key techniques for supporting TagSQL, including the implementation methods and key algorithms for tag retrieval, reorganization, set operations, and real-time queries. The techniques implemented in PowerSearcher demonstrate the ability to retrieve and reorganize information inside a web page in most cases. These concepts and techniques provide a new implementation framework for web page browsing and information retrieval, and should have academic and practical value for Web information services in e-commerce.
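The separation of tag structure from data described in the abstract can be illustrated with a minimal sketch. This is not the thesis's ETG representation or its storage scheme, which the abstract does not specify; the names `StructureDataSplitter` and `render` are hypothetical, and the sketch assumes simple HTML without character entities. It only shows the general idea: a page is split into a tag skeleton and a separate list of text values, and the page is regenerated by filling the skeleton back in.

```python
from html.parser import HTMLParser


class StructureDataSplitter(HTMLParser):
    """Hypothetical helper: split HTML into a tag skeleton and its text data."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skeleton = []  # markup strings, plus int slots that index into data
        self.data = []      # text content, stored apart from the markup

    def handle_starttag(self, tag, attrs):
        # Keep the start tag exactly as written, attributes included.
        self.skeleton.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept as a single markup item.
        self.skeleton.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.skeleton.append(f"</{tag}>")

    def handle_data(self, text):
        if text.strip():
            # Real content: store it in data and leave a numbered slot behind.
            self.skeleton.append(len(self.data))
            self.data.append(text)
        else:
            # Whitespace between tags is treated as part of the layout.
            self.skeleton.append(text)


def render(skeleton, data):
    """Rebuild a page by filling each int slot with the matching data item."""
    return "".join(data[p] if isinstance(p, int) else p for p in skeleton)


page = '<ul class="news"><li>ETG model</li><li>TagSQL</li></ul>'
splitter = StructureDataSplitter()
splitter.feed(page)
splitter.close()

# The same skeleton can serve the original data or a replacement data set.
original = render(splitter.skeleton, splitter.data)
updated = render(splitter.skeleton, ["Virtual pages", "PowerSearcher"])
```

Because the skeleton is stored once and only the data list changes, the same layout can be reused for different content, which is the kind of maintenance benefit the abstract claims for separating HTML patterns from data.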

  • 【Online publication contributor】 Chongqing University
  • 【Online publication year/issue】 2005, Issue 01
  • 【Classification number】 TP393.092
  • 【Download count】 85