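Research and Implementation of Web Usage Mining Technology
【Author】 Li Bin (李彬);
【Author information】 University of Electronic Science and Technology of China, Software Engineering, 2007, Master's degree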
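【Abstract (translated from the Chinese original)】 With the rapid development of the Internet, online data resources have become richer than ever. Every day, vast numbers of users browse the Web in search of the information they need. Because of the sheer volume of information, however, it has become extremely difficult for any individual user to find useful information quickly. Web mining emerged to address this problem, and Web usage mining, which targets Web server logs, has attracted particular attention from researchers. Web log data record users' visits to a Web site; analyzing these records reveals users' browsing patterns and access habits, which is of great value for reorganizing pages, optimizing site structure, and e-commerce intelligence applications. This thesis systematically analyzes and studies Web mining and Web usage mining and, on that basis, proposes several novel and effective techniques and methods. The main work is as follows: 1. It surveys the knowledge and techniques related to data mining and Web usage mining, and describes the significance, current state of research, and open problems of Web usage mining. 2. It discusses the three stages of Web usage mining (data preprocessing, pattern discovery, and pattern analysis) and analyzes the application areas and research directions of Web usage mining. 3. Chapter 4 studies the preprocessing stage of Web usage mining and analyzes several existing data preprocessing techniques; a frame page filtering algorithm is then applied to Web log preprocessing to eliminate the influence of subframe pages and improve the efficiency and accuracy of the data mining system. 4. Chapter 5 first briefly introduces the clustering techniques already used in Web mining. It then analyzes in detail a typical clustering algorithm based on a distance matrix and finds that the algorithm has certain shortcomings in operability and clustering accuracy. To address this, the thesis proposes a new fast clustering algorithm, a clustering algorithm based on relative Hamming distance, and verifies its effectiveness by designing a simple experimental Web usage mining system. The experiments show that the algorithm yields more accurate groups of similar customers and related pages.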
【Abstract】 With the rapid development of the Internet, the data resources available online have become more and more abundant. Every day, thousands upon thousands of users browse the Internet in search of useful information. However, because of the enormous volume of information, it is very difficult for each user to find useful information in time. Web mining techniques emerged to solve this problem. In particular, Web usage mining, which targets Web server logs, has attracted the attention of many researchers. Web logs record the visit information of Web site visitors. By analyzing the Web logs we can therefore obtain the browsing behavior and visiting habits of customers, which is of great value for reorganizing pages, optimizing the structure of a Web site, improving the capability of Web systems, and enhancing e-commerce applications. This thesis systematically analyzes and studies Web mining and Web usage mining. On this basis, it presents some novel techniques and methods. The main contents of this thesis are as follows: 1. It summarizes the relevant knowledge and techniques of data mining and Web usage mining, and describes the significance, current state of research, and open problems of Web usage mining. 2. It discusses the three phases of Web usage mining: data preprocessing, pattern discovery, and pattern analysis. It also analyzes the application fields and research directions of Web usage mining. 3. Chapter 4 investigates the preprocessing stage of Web usage mining and analyzes some typical Web log preprocessing techniques. A frame-filtering algorithm is then applied to Web log preprocessing to eliminate the influence of subframes and improve the efficiency and accuracy of the Web mining system. 4. Chapter 5 first introduces the existing clustering techniques used in Web mining. It then analyzes in detail a classical clustering algorithm based on a distance matrix and points out that the algorithm has some limitations, such as a lack of accuracy and operability. To solve this problem, the thesis puts forward a new, fast clustering algorithm based on relative Hamming distance and designs a simple experimental Web usage mining system to implement the algorithm. The experimental results indicate that applying this algorithm yields more accurate groups of similar customers and relevant Web pages.
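The abstract names a clustering algorithm based on relative Hamming distance but does not define it. As a rough illustration only, the sketch below assumes that each user is represented by a 0/1 vector over the site's pages (1 = visited) and that "relative Hamming distance" means the Hamming distance divided by the vector length. The greedy single-pass grouping, the function names `relative_hamming` and `cluster`, the threshold, and the sample data are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Set


def relative_hamming(a: List[int], b: List[int]) -> float:
    """Fraction of positions at which two equal-length 0/1 vectors differ.

    Assumed definition: Hamming distance divided by vector length.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(vectors: Dict[str, List[int]], threshold: float) -> List[Set[str]]:
    """Greedy single-pass grouping (an illustrative stand-in, not the thesis's algorithm).

    Each user joins the first cluster whose seed vector lies within
    `threshold`; otherwise the user starts a new cluster.
    """
    clusters = []  # list of (seed_vector, member_names) pairs
    for name, vec in vectors.items():
        for seed, members in clusters:
            if relative_hamming(seed, vec) <= threshold:
                members.add(name)
                break
        else:
            clusters.append((vec, {name}))
    return [members for _, members in clusters]


# Hypothetical user-by-page visit vectors (columns are four pages of a site).
visits = {
    "u1": [1, 1, 0, 0],
    "u2": [1, 1, 1, 0],
    "u3": [0, 0, 1, 1],
}
groups = cluster(visits, threshold=0.25)
print(groups)
```

Under these assumptions, u1 and u2 differ on one page out of four and fall into the same group, while u3 differs from u1 on every page and forms its own group. Transposing the matrix (one vector per page over users) would group related pages in the same way.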
【Key words】 Web usage mining; Preprocessing; Clustering algorithm; Relative Hamming distance;
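- 【Online publication contributor】 University of Electronic Science and Technology of China 【Online publication year/issue】 2007, Issue 03
- 【Classification number】 TP311.13
- 【Times cited】 13
- 【Downloads】 533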