不完备信息系统中数据挖掘的粗糙集方法
Rough Set Approach to Data Mining in Incomplete Information Systems
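【Author】 Liang Meilian (梁美莲);
【Advisor】 Liang Jiarong (梁家荣);
【Author information】 Guangxi University, Computer Application Technology, 2005, Master's thesis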
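【Abstract (translated from Chinese)】 Data mining frequently confronts incomplete information systems, in which some attribute values of some objects are unknown. Incomplete data can throw the mining process into confusion and lead to unreliable output. Such data also exhibit more pronounced uncertainty, which greatly increases the difficulty of data mining. Using rough set theory, a mathematical method for handling imprecise, uncertain, and vague knowledge, as its main tool, this thesis progressively develops a study of data mining in incomplete information systems, with the aim of narrowing the gap between data mining research and practical application. The thesis first discusses in detail the important issues related to data incompleteness, and summarizes, analyzes, and compares the data mining techniques for handling missing attribute values. Then, through a study of rough set theory, it shows that rough set theory is a mathematical tool especially well suited to data mining in uncertain and incomplete systems. It focuses on several existing rough set models for incomplete information systems and compares their strengths and weaknesses. On this basis, it proposes an algorithm for extracting a minimal set of decision rules from an incomplete information system based on the tolerance relation, and demonstrates the algorithm's effectiveness through theoretical analysis, a worked example, and experiments. In addition, it proposes a rough set model for incomplete information systems based on the τ-limited tolerance relation, together with its knowledge reduction method. Finally, it proposes a model of a data mining system under incomplete information based on this mathematical model.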
【Abstract】 Missing or incomplete data are a major concern in data mining, both because a substantial proportion of the data may be missing in real-world applications and because poor methods for handling incomplete data will bias the results of data mining. In addition, data mining is considerably harder in an incomplete information system, which contains more uncertainty than a complete one. This thesis applies rough set theory, a mathematical tool for dealing with inexact, uncertain, or vague knowledge, to the handling of incomplete data in data mining, so as to reduce the large gap between the available data and the machinery available to process it. The main issues related to the incomplete data problem are detailed first. The commonly used methods of handling incomplete data in data mining are then reviewed, with a discussion of their known strengths and weaknesses. Next, rough set theory is introduced. Several extensions of rough sets to incomplete information systems are studied carefully and the performance of these extended models is compared. On this basis, an algorithm for generating optimal decision rules is presented and proved, and a new extension of rough sets based on the τ-limited tolerance relation is proposed, together with knowledge reduction methods for it. Finally, a model of a data mining system under incomplete information is given.
【Key words】 data mining; rough set theory; incomplete information system; limited tolerance relation; missing data imputation;
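- 【Online publication contributor】 Guangxi University 【Online publication year and issue】 2005, Issue 05
- 【Classification number】 TP311.13
- 【Citation count】 31
- 【Download count】 600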
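
The abstract refers to rough set models built on a tolerance relation for incomplete information systems. As an illustration only, the following minimal sketch assumes the standard tolerance relation from the rough set literature (two objects are tolerant when they agree on every attribute for which both values are known) and computes tolerance classes and lower and upper approximations. It is not the thesis's τ-limited tolerance relation or its rule extraction algorithm; the function names, the attribute names, and the four-row table are hypothetical and invented for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# One object of the information system; None marks a missing attribute value.
Row = Dict[str, Optional[str]]


def tolerant(x: Row, y: Row, attrs: List[str]) -> bool:
    """True if x and y agree on every attribute where both values are known."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in attrs)


def tolerance_class(i: int, table: List[Row], attrs: List[str]) -> Set[int]:
    """Indices of all objects tolerant with object i (always includes i)."""
    return {j for j, y in enumerate(table) if tolerant(table[i], y, attrs)}


def approximations(
    table: List[Row], attrs: List[str], target: Set[int]
) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximations of `target` under the tolerance relation."""
    classes = [tolerance_class(i, table, attrs) for i in range(len(table))]
    lower = {i for i, c in enumerate(classes) if c <= target}  # class inside target
    upper = {i for i, c in enumerate(classes) if c & target}   # class meets target
    return lower, upper


# Hypothetical incomplete table (illustrative data, not from the thesis).
ATTRS = ["price", "mileage"]
TABLE: List[Row] = [
    {"price": "high", "mileage": "low"},
    {"price": None, "mileage": "low"},
    {"price": "low", "mileage": "high"},
    {"price": "low", "mileage": None},
]

if __name__ == "__main__":
    lower, upper = approximations(TABLE, ATTRS, target={0, 1})
    print("lower:", sorted(lower))
    print("upper:", sorted(upper))
```

Because missing values match anything under this relation, an object with many unknowns ends up tolerant with most of the table, which widens the gap between the lower and upper approximations. Restricted variants such as the limited tolerance relation named in the key words aim to narrow that gap.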