Qualitative Rules Mining and Analysis Based on Web Log (Chinese title: 基于Web日志的定性规则挖掘与分析)

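【Author】 Li Hailin (李海林)

【Supervisor】 Liu Bingxiang (柳炳祥)

【Author Information】 Jingdezhen Ceramic Institute, Computer Applications, 2009, Master's thesis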

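【Abstract (Chinese)】 Web log mining discovers the patterns by which users access Web pages by mining Web log records. Analyzing and exploring the regularities in these records can yield highly valuable latent information. However, unstructured and semi-structured data make Web log mining difficult, so processing the Web log records is a prerequisite for research on Web log mining. Moreover, people differ in how far they understand mined Web access information: they interpret the knowledge extracted from Web logs in different directions and to different depths. This shows that the results of traditional Web log mining do not handle conceptual uncertainty well, so studying the uncertainty in Web users' access information is of considerable practical significance. This thesis first briefly introduces Web log mining and analyzes and summarizes current related research in China and abroad. Building on that, it combines fuzzy theory and cloud theory to study the preprocessing of Web log data and the extraction of qualitative rules from Web logs. The main results are as follows.

1. A Web log preprocessing method based on user access efficiency is proposed. The method uses two parameters that affect access efficiency, interest degree and familiarity degree, and studies how user access patterns relate to them. It handles the problems caused by topic changes that may occur within a user session, divides user sessions into stable interest and incidental interest, and so provides a valid data source for subsequent Web log mining. On this basis, a Web log rule mining method based on fuzzy analysis and user interest degree is also proposed, to remedy the shortcomings of existing association rules as applied to Web log mining and to improve the efficiency of rule extraction.

2. A cloud-theory-based method for extracting qualitative rules from Web logs is proposed. The method analyzes the time factor that affects user interest degree, uses cloud models to represent "soft thresholds" for support and confidence in association rule mining, and applies cloud transformation to partition each page's dwell time into qualitative concepts, overcoming the problem of overly hard boundaries. Compared with traditional approaches, the rules it mines take the form of multi-condition, multi-rule qualitative descriptions based on time concepts, and reflect the regularities of Web users' access patterns more flexibly. Analysis with a software implementation shows that the method, based on cloud theory and user dwell time, not only obtains association rules among the pages users visit but also reflects how interested a user is in a given page. A cloud-model-based method for the qualitative analysis of Web page attractiveness is also discussed, offering a new way to study the attractiveness of Web pages.

3. Based on an analysis of Web fuzzy clustering, a qualitative classification algorithm for Web information combining fuzzy clustering and cloud theory is proposed. Its main advantages are that it avoids the earlier difficulty of searching for a suitable threshold λ and that it classifies Web information according to the distribution of the data. Finally, to overcome the space complexity introduced by qualitative rule extraction while keeping time complexity stable, a data storage method based on structured data design is proposed. It uses the adjacency-list storage structure to conveniently solve the compression of sparse matrices or sparse tables, thereby ensuring reasonable use of memory in the program.

The thesis concludes by summarizing the work and discussing directions for further research.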
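The abstract relies on the normal cloud model, in which a qualitative concept is described by three numbers: expectation Ex, entropy En and hyper-entropy He. The sketch below implements the standard forward normal cloud generator and then shows one possible way to turn a hard minimum support into a "soft threshold". Treating the soft threshold as a half-rising cloud, the function names and the example numbers (a minimum support of "about 20%") are assumptions for illustration, not the thesis's own definitions.

```python
import math
import random


def forward_cloud(ex, en, he, n, rng=None):
    """Forward normal cloud generator: return n cloud drops (x, mu).

    ex: expectation, en: entropy, he: hyper-entropy of the concept.
    """
    if en <= 0 or he < 0:
        raise ValueError("need en > 0 and he >= 0")
    rng = rng or random.Random(0)
    drops = []
    for _ in range(n):
        en_i = rng.gauss(en, he)                          # randomised entropy En' ~ N(En, He^2)
        x = rng.gauss(ex, abs(en_i))                      # drop position x ~ N(Ex, En'^2)
        mu = math.exp(-(x - ex) ** 2 / (2 * en_i ** 2))  # certainty degree of x
        drops.append((x, mu))
    return drops


def soft_threshold_degree(value, ex, en, he, rng=None):
    """Degree to which `value` meets the soft threshold "about ex or higher".

    Assumption: modelled as a half-rising normal cloud. Values at or above ex
    satisfy it fully; below ex the degree decays like a cloud drop's certainty.
    """
    if value >= ex:
        return 1.0
    rng = rng or random.Random(0)
    en_i = rng.gauss(en, he)
    return math.exp(-(value - ex) ** 2 / (2 * en_i ** 2))


# Hypothetical example: soft minimum support "about 20%" instead of a hard 0.20 cut.
for support in (0.25, 0.18, 0.12):
    print(support, round(soft_threshold_degree(support, 0.20, 0.03, 0.003), 3))
```

A candidate rule whose support falls just below 0.20 still receives a high degree rather than being discarded outright, which is the kind of softened boundary the abstract describes. The same generator could model confidence, or a dwell-time concept such as "long stay".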

【Abstract】 Web log mining discovers the patterns by which users access Web pages from Web log records. Analyzing the regularities in these records can yield valuable latent information. However, unstructured and semi-structured data make Web log mining considerably harder, so preprocessing Web log data is a prerequisite for it. Moreover, different people understand the mined Web access information in different directions and to different depths. Together these conditions show that the results of traditional Web log mining are hard to interpret and do not express conceptual uncertainty well, so studying the uncertainty in the Web information users visit has real practical value. In this thesis I first briefly introduce the concepts and principles of Web log mining, then analyze and summarize related research in China and abroad. Building on that, I preprocess the data and use fuzzy theory and cloud theory to study and extract qualitative rules from Web logs. The main results are as follows.

First, I propose a Web log preprocessing method based on user access efficiency. It uses two parameters, interest degree and familiarity degree, to describe what affects a visit, and studies the relationship between user access patterns and these two parameters. It handles the problems that topic changes within a user session may cause, and divides user sessions into stable interest and incidental interest. This provides a valid data source for subsequent Web log mining.
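The abstract does not give the thesis's formulas for interest and familiarity, so the following does not reproduce its method. It is a minimal sketch of the conventional preprocessing steps such methods usually start from: cleaning raw log lines, splitting requests into sessions, and computing per-page dwell time (the quantity the cloud-based rules later build on). Assumptions: Apache Common Log Format, the client host standing in for the user, a 30-minute idle timeout, and suffix-based filtering of embedded resources. The function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "METHOD url PROTOCOL" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')
STATIC = (".gif", ".jpg", ".png", ".css", ".js")  # assumed embedded-resource suffixes


def parse(lines):
    """Clean raw log lines into (host, time, url) page requests."""
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # malformed line
        host, ts, method, url, status = m.groups()
        if method != "GET" or not status.startswith("2") or url.endswith(STATIC):
            continue  # keep successful page views only
        yield host, datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z"), url


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group requests by host (assumed = user) and split on idle gaps > timeout."""
    by_host = defaultdict(list)
    for host, t, url in records:
        by_host[host].append((t, url))
    sessions = []
    for host, hits in by_host.items():
        hits.sort()
        current = [hits[0]]
        for prev, cur in zip(hits, hits[1:]):
            if cur[0] - prev[0] > timeout:
                sessions.append((host, current))
                current = []
            current.append(cur)
        sessions.append((host, current))
    return sessions


def dwell_times(hits):
    """Seconds on each page = gap to the next request; None for the last page."""
    nexts = hits[1:] + [None]
    return [(url, (nxt[0] - t).total_seconds() if nxt else None)
            for (t, url), nxt in zip(hits, nexts)]
```

For a log file, `sessionize(parse(open("access.log")))` (file name hypothetical) would return `(host, hits)` pairs, and `dwell_times(hits)` gives the per-page stay times. The dwell time of the last page in a session is unknown from the log alone, which is one reason stay-time concepts benefit from soft rather than hard boundaries.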
Building on this, I also present a Web log rule mining method based on fuzzy analysis and user interest, which addresses the shortcomings of current association rules in Web log mining and improves the efficiency of rule extraction.

Second, after studying the model of Web user behavior recorded in Web logs, I propose a new method for mining qualitative Web log rules. It analyzes the time factor that influences user interest and uses cloud models to define soft thresholds for support and confidence. Through cloud transformation, the time users spend reading each page is divided into different qualitative concepts. Compared with traditional approaches, the method yields qualitative descriptions made up of multiple conditions and rules, and reflects users' access regularities more flexibly. The analysis shows that it not only produces association rules among the pages users visit but also reflects how interested a user is in a particular page. I also discuss a cloud-theory-based method for the qualitative analysis of Web page attractiveness, which offers a new way to study how attractive Web pages are to users.

Third, based on research into Web fuzzy clustering, I present a new algorithm for the qualitative classification of Web information that combines fuzzy clustering and cloud theory. Unlike the traditional method, it avoids the trouble of searching for a suitable threshold value and classifies Web information according to the distribution of the data. Finally, to reduce the large space needed for extracting qualitative rules without increasing time complexity, I propose a new data storage method based on data structure design.
This method uses adjacency lists to compress sparse tables, ensuring reasonable use of memory in the program.

Finally, I summarize the work done during my postgraduate study and explore future directions for Web log mining.
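The adjacency-list storage mentioned above can be illustrated on page-to-page transition counts, a typical sparse table in Web log mining: with n pages a dense matrix needs n × n cells, most of them zero, while an adjacency list keeps only the pairs actually observed. This is a minimal sketch under assumptions, since the thesis's actual table layout is not described here. A dict of dicts stands in for linked adjacency lists, and the class and method names are hypothetical.

```python
from collections import defaultdict


class TransitionTable:
    """Sparse page-to-page transition counts kept as adjacency lists.

    Each page stores only the successor pages actually observed, instead of
    a full row of a dense n x n matrix.
    """

    def __init__(self):
        self._adj = defaultdict(dict)  # from_page -> {to_page: count}

    def add_session(self, pages):
        """Record each consecutive (from, to) page pair of one session."""
        for a, b in zip(pages, pages[1:]):
            row = self._adj[a]
            row[b] = row.get(b, 0) + 1

    def count(self, a, b):
        """Transition count a -> b; 0 if never seen (no entry is created)."""
        return self._adj.get(a, {}).get(b, 0)

    def successors(self, a):
        """Observed (to_page, count) pairs for page a, sorted by page name."""
        return sorted(self._adj.get(a, {}).items())

    def transition_share(self, a, b):
        """Fraction of observed transitions leaving a that go to b."""
        total = sum(self._adj.get(a, {}).values())
        return self.count(a, b) / total if total else 0.0

    def stored_entries(self):
        """Number of non-zero cells actually held in memory."""
        return sum(len(row) for row in self._adj.values())
```

Lookups read through `.get` rather than indexing the `defaultdict`, so querying an unseen pair does not silently allocate an empty row; memory therefore grows with the number of distinct observed transitions, not with the square of the number of pages.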