Research and Implementation of Intelligent Search Technology Based on the Semantic Web (基于语义网的智能搜索技术的研究与实现)

【Author】 Ling Haiyun (凌海云)

【Advisor】 Zuo Zhihong (左志宏)

【Author information】 University of Electronic Science and Technology of China (电子科技大学), Computer Software and Theory, 2004, Master's

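【Abstract, translated from Chinese】 This thesis presents a full-text search engine for XML based on the Semantic Web. It can search both the content of documents and their structure. Queries use a simplified XPath syntax and return the node or node set that satisfies the query. Besides XML documents, the engine can search the XMP packets found in files such as PDF and JPEG. An XMP packet is an XML document fragment embedded in a host file; XMP is a metadata standard defined by Adobe. The system was originally designed to search the Semantic Web for XML documents containing DC, PRISM, and XMP metadata, but it is inherently extensible. It maintains a list of indexable namespaces (NS), and every element and attribute whose namespace appears in this list is indexed. The administrator can configure the list to control which namespaces have their elements and attributes indexed, and can also open indexing to all namespaces, including the empty namespace. The thesis first introduces the history and architecture of the Semantic Web and several of its key technologies: XML, RDF(S), and ontology, together with the DC, PRISM, and XMP metadata standards, with particular emphasis on Adobe's XMP packet technology. An XMP packet is a well-formed XML document fragment embedded in a host file; in general, it is a simple description of the host file's metadata. The thesis then covers background on search engines for the traditional Web, including their classification, performance metrics, main components, and development trends. Finally, it describes the research, design, and implementation of the search engine introduced above, and proposes some improvements and ideas for further work.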
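
The abstract describes an XMP packet as an XML fragment embedded in a host file. As a rough illustration only, and not the thesis's implementation, the sketch below scans raw bytes for the `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions that wrap a packet and parses what lies between them. It assumes the packet is stored uncompressed and UTF-8 encoded; the function name `extract_xmp_packets` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# An XMP packet is wrapped by two processing instructions:
#   <?xpacket begin=... id=...?>  ...XML...  <?xpacket end=...?>
_XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp_packets(data: bytes) -> List[ET.Element]:
    """Return the parsed root element of every XMP packet found in `data`.

    Assumption: packets are uncompressed and UTF-8 encoded. Fragments that
    are not well-formed XML are skipped rather than raising.
    """
    packets = []
    for match in _XPACKET.finditer(data):
        body = match.group(1).strip()  # drop the whitespace padding
        try:
            packets.append(ET.fromstring(body))
        except ET.ParseError:
            continue
    return packets
```

Scanning bytes this way needs no knowledge of the host format, which is why the same routine could be pointed at PDF or JPEG data alike under the assumptions stated above.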

【Abstract】 This thesis introduces a Semantic Web based full-text search engine for XML documents. The engine searches not only the content of documents but also their structure, such as elements, attributes, and the relations between them. Queries use a simplified XPath syntax and return the matching node or node set. The engine also searches files such as PDF and JPEG for embedded XMP packets, which are XML document fragments. XMP is a metadata specification published by Adobe. The engine was originally designed to search the Semantic Web for XML documents containing DC, PRISM, and XMP metadata, but it is inherently extensible to files containing other metadata. The system keeps a list of namespaces (NS), and all elements and attributes whose namespace is in the list are indexed. The system administrator configures this list to control which elements and attributes are indexed, and can also choose to index all namespaces, including the empty namespace. The thesis first introduces the history of the Semantic Web, its architecture, and key technologies such as XML, RDF(S), and ontology. It also introduces the DC, PRISM, and XMP metadata standards, with particular emphasis on XMP packet technology. An XMP packet is an XML document fragment embedded in another file; in general, it describes the metadata of the host file. Second, the thesis reviews search engine technology for the traditional Web, including classification, performance metrics, main components, and development trends. Finally, it presents the research, design, and implementation of the search engine described above, along with some proposed improvements and ideas for further work.
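
The namespace list described in the abstract can be pictured with a small sketch. This is a minimal illustration under stated assumptions, not the thesis's actual index structure: the function names `split_qname` and `build_index`, the dictionary layout, and the use of `None` to mean "index every namespace" are all hypothetical choices made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def split_qname(qname: str) -> Tuple[str, str]:
    """Split ElementTree's '{namespace}local' form; no namespace gives ''."""
    if qname.startswith("{"):
        ns, local = qname[1:].split("}", 1)
        return ns, local
    return "", qname


def build_index(
    xml_text: str, indexable_ns: Optional[Set[str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Map (namespace, name) to the values found in the document.

    Only elements and attributes whose namespace is in `indexable_ns` are
    indexed. Passing None indexes every namespace, including the empty one.
    Attribute names are stored with a leading '@'.
    """
    def allowed(ns: str) -> bool:
        return indexable_ns is None or ns in indexable_ns

    index = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        ns, local = split_qname(elem.tag)
        if allowed(ns) and elem.text and elem.text.strip():
            index[(ns, local)].append(elem.text.strip())
        for name, value in elem.attrib.items():
            attr_ns, attr_local = split_qname(name)
            if allowed(attr_ns):
                index[(attr_ns, "@" + attr_local)].append(value)
    return dict(index)
```

With `indexable_ns={DC}`, only Dublin Core elements and attributes would enter the index; with `None`, unqualified names are indexed as well under the empty namespace `""`.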

【Key words】 Semantic Web; Search Engine; ontology; RDF; XML; XMP
  • 【Classification number】 TP391.3
  • 【Times cited】 15
  • 【Downloads】 746