Research on Ontology-based Semantic Information System
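【Author】 Yu Chuanming (余传明)
【Supervisor】 Dong Hui (董慧)
【Author Information】 Wuhan University, Information Science, 2005, PhD
【Subtitle】 Theoretical Analysis and System Implementation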
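【Abstract (Chinese version)】 The semantic information system is an entirely new field of information science research, and exploring it is of great significance. First, it can to some extent meet people's need for knowledge. With the development of the knowledge economy, people's information needs have changed fundamentally into knowledge needs, which pushes information processing toward systematically organizing resources and distilling knowledge. Second, it fits the trend of information systems converging with the Semantic Web. Today's Internet has shortcomings in information representation and retrieval, mainly because it was designed for direct reading and processing by users and provides no machine-readable semantic information; this limits computers' capacity for automatic analysis and further intelligent processing in information retrieval. The Semantic Web seeks to let computers understand and communicate with one another at the semantic level, which will give the traditional Internet a huge push and revolutionize it. Third, it can make up for the shortcomings of traditional information retrieval. Keyword-based retrieval once met users' needs to a certain extent, but because a word's literal meaning and the extension of its concept are not on the same level, the results may match only the literal meaning or one layer of meaning, whereas what people usually want is the concept behind the information and its related components, not merely what the words literally say. Semantics-based retrieval meets exactly this need. Fourth, it fits the trend of information systems shifting from a syntax- and structure-oriented design to a semantics-oriented one. In traditional information systems, processing heterogeneous and distributed information has become a hot topic, and the key to solving these problems is improving interoperability; as "the core of information system interoperability will shift from system, syntax and structure to semantics", developing semantic information systems is in itself of crucial importance.
Proceeding from this practical situation, this dissertation explores the concept and principles of a new kind of information system, the semantic information system. Taking ontology as its foundation, it analyzes the technical details of the semantic information system in depth from four aspects (semantic information description, acquisition, retrieval and output) and, drawing on practice, discusses the design, construction and implementation of a semantic information system whose domain is a historical event: the Kuomintang-Communist Party Cooperation Semantic Information System (GGHZ-SIS). The dissertation runs to more than 100,000 characters in eight chapters. Its main contents are as follows:
1. Overview of the Semantic Information System
The semantic information system is still an entirely new concept; this chapter analyzes the semantic information system in great detail…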
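The contrast drawn above between keyword matching and semantics-based retrieval can be illustrated with a minimal Python sketch. It assumes a hypothetical toy ontology stored as RDF-style (subject, predicate, object) triples in which only `rdfs:subClassOf` is used for inference; the query concept is expanded to all of its subclasses before matching. This is one simple form of ontology-based query expansion, not necessarily the method used in the dissertation, and the class names and documents are invented for illustration.

```python
from collections import defaultdict

SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical toy ontology as RDF-style (subject, predicate, object) triples.
TRIPLES = [
    ("congress", SUBCLASS_OF, "meeting"),
    ("conference", SUBCLASS_OF, "meeting"),
    ("plenum", SUBCLASS_OF, "congress"),
]

# Hypothetical document collection, already lower-cased.
DOCS = [
    "the party held a congress",
    "a plenum was convened",
    "delegates attended the conference",
    "troops marched north",
]


def subclasses(cls, triples):
    """Return `cls` plus every class that is transitively rdfs:subClassOf `cls`."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            children[o].add(s)
    result, stack = {cls}, [cls]
    while stack:  # depth-first walk down the class hierarchy
        for child in children[stack.pop()]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def keyword_search(term, docs):
    """Literal matching: a document hits only if it contains `term` itself."""
    return [i for i, d in enumerate(docs) if term in d.split()]


def concept_search(term, docs, triples):
    """Concept matching: expand `term` to its subclasses, then match any of them."""
    terms = subclasses(term, triples)
    return [i for i, d in enumerate(docs) if terms & set(d.split())]


print(keyword_search("meeting", DOCS))           # []
print(concept_search("meeting", DOCS, TRIPLES))  # [0, 1, 2]
```

Because the expansion only follows `rdfs:subClassOf` downward, a query for a broad concept also retrieves documents about its narrower concepts even when the query word never appears literally; a fuller system would also exploit other relations and OWL axioms.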
【Abstract】 Studying the Semantic Information System (SIS) is of great significance for research in Information Science. First, it can help satisfy people's need for knowledge: with the development of the knowledge economy, the demand for information is turning into a demand for knowledge. Second, information systems tend to converge with the Semantic Web. The present Internet has shortcomings in information representation and retrieval because it was designed to be read by humans, not by machines; making it machine-readable would give the traditional Internet a huge impetus and revolutionize it. Third, SIS can remedy the shortcomings of traditional information retrieval. Keyword-based retrieval has met users' needs to a certain degree, but its results match only the literal meaning of words, not the concepts behind them, because words and concepts lie on different levels. Semantic information retrieval can satisfy the demand for concept-level retrieval. Fourth, SIS conforms to the trend of information systems shifting from syntax- and structure-oriented designs to semantics-oriented ones. In traditional information systems research, heterogeneous and distributed information processing have become hot topics, and the key to solving these problems is to improve the interoperability of information systems. Semantic interoperability is the core of this problem, since "the core of the interoperability of information systems will shift from the systematic, syntactic and structural levels to the semantic level". For these reasons, developing Semantic Information Systems is very important. Based on the needs listed above, the dissertation defined and developed a new kind of information system: the Semantic Information System.
Based on ontology technology, the dissertation analyzed the technical details of the Semantic Information System from four aspects: semantic information description, semantic information acquisition, semantic information retrieval and semantic information output. The author also discussed the process, problems and experience of building a practical Semantic Information System, the Guomindang Gongchandang He Zuo Semantic Information System (GGHZ-SIS).
1. Introduction to Semantic Information System
As the Semantic Information System is a totally new concept, this chapter analyzed its definition, constitution and characteristics, and compared it with the traditional Management Information System (MIS), Competitive Intelligence System (CIS), Decision Support System (DSS), Expert System (ES) and others. It also presented a prototype SIS with five components: semantic information description, semantic information acquisition, semantic storage, semantic information retrieval and semantic information output. It should be emphasized that the advantages of SIS will be fully realized only when it converges with the Semantic Web and the Semantic Web itself becomes practical; we therefore still have a long way to go before the potential of SIS can be fully harvested.
2. The Foundation of Semantic Information System: Ontology
Ontology is the foundation of semantic information description: semantic information is mainly composed of semantic classes, semantic properties, semantic relations, semantic rules and semantic instances, which map respectively to the concepts, concept attributes, concept relations, rules and axioms of an ontology. Ontology also serves as the reference for semantic information extraction, since it helps weigh the importance of the extracted semantic information. In addition, ontology is an auxiliary means in the semantic retrieval process.
Because ontology has a certain inference capability, it can be used to expand queries and thus make the results more comprehensive. Finally, ontology is the main form of semantic output. For these four reasons, ontology can be regarded as the foundation of the Semantic Information System; this chapter therefore analyzed the definition, classification and construction methodology of ontology, and especially methods of ontology acquisition.
3. Semantic Information Description
The description of semantic information does not start from zero. During the last ten years of rapid Internet development, a great deal of experience has been gained in describing metadata and Web data; RDF (Resource Description Framework) is one result. The RDF metadata model is based on the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject is the resource, the "thing" being described. The predicate denotes the trait or aspect of that resource being described and often expresses a relationship between the subject and the object. The object is the target of that relationship or the value of that trait. RDF Schema (RDFS) is an extension of RDF that describes how to define RDF vocabularies using RDF itself; among other things, it defines two important properties, rdfs:subClassOf and rdfs:subPropertyOf. OWL (Web Ontology Language) builds on these: it is a markup language for publishing and sharing data using ontologies on the Web, is a vocabulary extension of RDF, and is derived from the DAML+OIL Web Ontology Language. After analyzing the current description languages, the author recommended OWL as the best choice.
4. Semantic Information Acquisition
The main task of semantic information acquisition is to extract semantic instances and semantic relationships from unstructured information (text, images, audio and video), semi-structured information and structured information. For structured and semi-structured information, it is relatively easy to build a mapping from the formal structure to semantic classes and semantic relations and then carry out the transformation. The most difficult part of semantic information extraction therefore lies in unstructured text, that is, in natural language processing, especially for Chinese. In this chapter, the author described several difficulties in Chinese semantic information acquisition and proposed a semantic information acquisition method based on shallow parsing.
5. Semantic Information Retrieval
In this chapter, the author defined semantic information retrieval as "in contrast to traditional information retrieval, a new kind of information retrieval method in which the information input, the information organization and the search results all carry semantic meaning". Based on this definition, the author detailed how to endow the input, organization and output processes with semantic meaning.
6. Semantic Information Visualization
The main task of semantic visual output is to show semantic objects and their relationships to the user. Wehrend summarized the goals of visualization as orient, identify, distinguish, categorize, cluster, distribute, order, compare, associate and relate. For SIS, the author distilled these into three key techniques: Zoom/Pan, Focus/Context and Incremental Navigation. On this basis, the author analyzed several visualization components: TGVizTab, Jambalaya, OntoViz and OntoRama.
7. Design and Realization of GGHZ-SIS
In this chapter, the author detailed the design and implementation of GGHZ-SIS.
The author described the process and results of using OWL to describe the events, persons, locations, organizations and other entities in the GGHZ domain. The GGHZ semantic information acquisition component was implemented in several steps: splitting paragraphs into sentences; word tokenization and part-of-speech tagging; selecting the semantic predicate; selecting the semantic subject based on the predicate; selecting the semantic object based on the predicate; pronoun resolution; time and location correction; and finally updating the semantic extraction context. The author also provided the technical details of implementing the semantic retrieval component of GGHZ-SIS and, finally, of developing its visualization component based on TouchGraph.
8. Summary
This chapter summarizes the dissertation.
(30 figures, 19 tables)
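The acquisition steps listed for Chapter 7 above can be sketched as a predicate-driven shallow extraction loop. This is a minimal, hypothetical Python illustration, not the GGHZ-SIS code: the toy lexicon, the tag set (N, V, PRON, T, LOC, P, X), the placeholder sentence and the heuristics (first verb as predicate, nearest noun on each side as subject and object, pronouns resolved to the previous subject, missing time and location inherited from context) are all assumptions standing in for a real Chinese segmenter, tagger and rule set.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon (word -> POS tag). A real Chinese system would use a
# word segmenter and POS tagger; the tags N, V, PRON, T, LOC, P, X are assumptions.
LEXICON = {
    "in": "P", "an": "X", "1924": "T", "cityx": "LOC",
    "partya": "N", "partyb": "N", "agreement": "N",
    "it": "PRON", "met": "V", "signed": "V",
}


@dataclass
class Event:
    subject: Optional[str]
    predicate: str
    obj: Optional[str]
    time: Optional[str]
    location: Optional[str]


@dataclass
class Context:
    """Extraction context carried over from one sentence to the next."""
    subject: Optional[str] = None
    time: Optional[str] = None
    location: Optional[str] = None


def split_sentences(paragraph: str) -> list[str]:
    """Step 1: split on Western or Chinese sentence-final punctuation."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", paragraph)
    return [p.strip() for p in parts if p.strip()]


def tag(sentence: str) -> list[tuple[str, str]]:
    """Step 2: tokenize and look up POS tags ("X" marks unknown words)."""
    return [(w, LEXICON.get(w.lower(), "X")) for w in re.findall(r"\w+", sentence)]


def first(tagged: list[tuple[str, str]], tags: set[str], indices) -> Optional[str]:
    """Return the first token whose tag is in `tags`, visiting `indices` in order."""
    for i in indices:
        if tagged[i][1] in tags:
            return tagged[i][0]
    return None


def extract(paragraph: str) -> list[Event]:
    ctx, events = Context(), []
    for sentence in split_sentences(paragraph):
        tagged = tag(sentence)
        # Step 3: take the first verb as the semantic predicate.
        verb = next((i for i, (_, t) in enumerate(tagged) if t == "V"), None)
        if verb is None:
            continue
        # Steps 4-5: subject = nearest noun/pronoun left of the predicate,
        # object = nearest noun right of it.
        subj = first(tagged, {"N", "PRON"}, range(verb - 1, -1, -1))
        obj = first(tagged, {"N"}, range(verb + 1, len(tagged)))
        # Step 6: naive pronoun resolution to the previous sentence's subject.
        if subj is not None and LEXICON.get(subj.lower()) == "PRON":
            subj = ctx.subject
        # Step 7: time/location correction by inheriting missing values.
        time = first(tagged, {"T"}, range(len(tagged))) or ctx.time
        loc = first(tagged, {"LOC"}, range(len(tagged))) or ctx.location
        events.append(Event(subj, tagged[verb][0], obj, time, loc))
        # Step 8: update the extraction context.
        ctx = Context(subj or ctx.subject, time, loc)
    return events


if __name__ == "__main__":
    for event in extract("In 1924 PartyA met PartyB in CityX. It signed an agreement."):
        print(event)
```

Carrying a small context object between sentences is what lets the second sentence, which has only a pronoun and no time or place, still yield a complete event record; the heuristics themselves are deliberately crude and would need a proper shallow parser for real Chinese text.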
【Key words】 Semantic Web; Semantic Information System; Information Extraction; Information Retrieval; Information Visualization;
- 【Online Publication Submitter】 Wuhan University 【Online Publication Year/Issue】 2006, Issue 05
- 【Classification Number】 G354
- 【Times Cited】 77
- 【Downloads】 3127