基于XML的个性化信息检索系统研究
The Research of Personalized Information Retrieval System Based on XML
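【Author】 Cai Guomin (蔡国民);
【Advisor】 Wang Yalin (王雅琳);
【Author information】 Central South University, Computer Application Technology, 2007, Master's thesis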
【Abstract】 Information retrieval on the Internet currently suffers from low efficiency and from users losing their way among resources ("information bewilderment"), while the demand for personalization keeps growing. To address these problems, this thesis proposes a Web-oriented, XML-based personalized information retrieval system model and studies its key algorithms. The topic is an important and active research area in information retrieval and e-commerce, and it has both theoretical and practical significance.
The thesis first surveys search engine systems and their main algorithms in China and abroad, and analyzes the main structure of search engine systems and their principal problems. On that basis it explores the key technologies and algorithms involved in an XML-based personalized information retrieval system, focusing on three aspects: generation of the user model, the system structure of a personalized search engine, and key technologies for improving search engine performance. The main work is as follows:
(1) A statistical analysis of user behavior in the log files of TianWang (e.pku.edu.cn) shows that users' query terms and query processes are relatively stable. A user model based on behavioral features is constructed accordingly, and the corresponding generation algorithm is given.
(2) Based on an analysis of the basic structure of search engines, a basic structure for implementing the personalized system is proposed, and the key technologies for its implementation are analyzed.
(3) In building the personalized search engine prototype, statistical regularities are used to settle on an implementation approach whose main goal is higher precision. The information crawling strategy is improved; the algorithms for web page noise removal and duplicate elimination are optimized; a new method of building a Chinese word segmentation lexicon from single characters is proposed; and, in combination with the user model, the correlation analysis method is improved and its range of application is extended.
Theoretical analysis and experimental results show that the prototype system is feasible and effective.
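As a rough illustration of the behavior-based user model described in the abstract, the sketch below builds a per-user term-frequency profile from query-log records and uses it to reorder search results. It is not the thesis's algorithm, which the abstract does not specify. The log format of `(user_id, query)` pairs, the function names `build_user_models` and `rerank`, the whitespace tokenization, and the scoring rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_user_models(log_records):
    """Build one term-frequency profile per user from (user_id, query) pairs.

    The record format is hypothetical; a real search log would need parsing.
    """
    models = defaultdict(Counter)
    for user_id, query in log_records:
        # Whitespace tokenization stands in for real word segmentation.
        models[user_id].update(query.lower().split())
    return models


def rerank(results, profile):
    """Order result titles by how strongly their terms match the profile."""
    total = sum(profile.values()) or 1  # avoid division by zero

    def score(title):
        terms = set(title.lower().split())
        # A Counter returns 0 for terms the user never queried.
        return sum(profile[t] for t in terms) / total

    # sorted() is stable, so ties keep the engine's original order.
    return sorted(results, key=score, reverse=True)


log = [
    ("u1", "xml schema"),
    ("u1", "xml parser java"),
    ("u2", "football scores"),
]
models = build_user_models(log)
ranked = rerank(
    ["Football news", "Java XML parser tutorial", "XML basics"],
    models["u1"],
)
print(ranked)
```

Because the abstract reports that users' query terms are relatively stable, a simple frequency profile of past queries is one plausible starting point. A fuller model would presumably also weight recency and click behavior.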
【Keywords】 personalization; user model; information retrieval; correlation analysis;
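- 【Online publication contributor】 Central South University 【Online publication year/issue】 2007, Issue 06
- 【Classification code】 TP391.3
- 【Citations】 10
- 【Downloads】 263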