Granular Computing Based Knowledge Discovery and Its Applications (基于粒度计算的知识发现研究及其应用)
【Author】 Liu Yong (刘勇)
【Supervisor】 Pan Yunhe (潘云鹤)
【Author Information】 Zhejiang University, Computer Applications, 2006, Ph.D.
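【Chinese Abstract】 Knowledge discovery is an active research topic in artificial intelligence and has already developed considerably. However, several important problems remain open. Knowledge is represented in overly uniform forms. Effective methods for extracting knowledge from complex data and complex scenarios are lacking, a typical scenario being incrementally arriving data. There are no dimensionality-reduction methods for high-dimensional data that preserve the intrinsic semantic relations among data features. Finally, it is unclear how to validate the knowledge once it has been discovered.
This dissertation introduces the principles and methods of granular computing into several key stages of the knowledge discovery process, namely knowledge representation/description, knowledge extraction, knowledge dimensionality reduction, and validation of the acquired knowledge. It studies the problems in these stages in depth:
(1) It systematically proposes three principles of granular computing: the granular knowledge representation principle, the granular approximate solving principle, and the granular problem mapping principle.
(2) It adopts granular partition as a form of knowledge representation. Following the granular knowledge representation principle, a partition into granules is treated as one form of knowledge representation, and partitions of different coarseness are treated as knowledge. This yields a new knowledge representation in which different knowledge is expressed by partitioning the objects being processed into granules.
(3) It proposes knowledge extraction algorithms that support inconsistent data. Based on rough set theory, it designs and implements a knowledge extraction algorithm that handles inconsistent data and proposes an incremental knowledge extraction algorithm for complex situations. Based on the granular approximate solving principle, it further proposes approximate rule extraction algorithms that can run in parallel or serially.
(4) It gives feature selection and reduction methods for knowledge extraction. Starting from the question of how many data records the features of a tabular dataset can accommodate, it proposes the concept of data saturation. Based on the properties of data saturation, it proposes a dimensionality-reduction method that combines the advantages of attribute reduction and attribute selection.
(5) It applies the knowledge discovery methods to a complex problem as an example of validating the knowledge obtained. The three principles of granular computing are used together to solve the knowledge identification problem in an ancient architecture modeling system.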
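Items (2) and (3) rest on two standard rough-set ideas: a granular partition induced by the indiscernibility relation, and lower/upper approximations that separate consistent from inconsistent granules. The sketch below shows only these textbook Pawlak constructions, not the dissertation's own extraction algorithm. The function names `partition`, `approximations` and `rules`, and the toy table `TABLE`, are hypothetical illustrations.

```python
from collections import defaultdict

def partition(table, attrs):
    """Group object indices into granules (equivalence classes) of the
    indiscernibility relation IND(attrs): objects with equal values on attrs."""
    granules = defaultdict(set)
    for i, row in enumerate(table):
        granules[tuple(row[a] for a in attrs)].add(i)
    return list(granules.values())

def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for g in partition(table, attrs):
        if g <= target:
            lower |= g  # granule lies entirely inside target: certain membership
        if g & target:
            upper |= g  # granule overlaps target: possible membership
    return lower, upper

def rules(table, cond, dec):
    """Yield (condition values, decision, kind) for each granule of IND(cond).
    A granule with one decision gives a 'certain' rule; an inconsistent
    granule gives one 'possible' rule per decision it contains."""
    for g in partition(table, cond):
        key = tuple(table[next(iter(g))][a] for a in cond)
        decisions = {table[i][dec] for i in g}
        kind = "certain" if len(decisions) == 1 else "possible"
        for d in sorted(decisions):
            yield dict(zip(cond, key)), d, kind

# Hypothetical toy decision table. Objects 2 and 3 are inconsistent:
# they have the same condition values but different decisions.
TABLE = [
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
]

yes = {i for i, r in enumerate(TABLE) if r["d"] == "yes"}
lower, upper = approximations(TABLE, ["a", "b"], yes)
print(sorted(lower), sorted(upper))  # prints "[0] [0, 2, 3]"
```

The boundary region `upper - lower` (objects 2 and 3 here) is exactly where inconsistent data shows up. This is why an extractor that tolerates inconsistency can emit certain rules from the lower approximation and only possible rules from the boundary.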
【Abstract】 Knowledge discovery is an active research field in artificial intelligence. However, many problems in this field remain unresolved. These include the limited forms of knowledge representation, knowledge extraction from complex data and data environments (for example, incremental data), reduction of the features of high-dimensional data, and evaluation of the knowledge obtained through knowledge discovery. This dissertation addresses several steps: knowledge representation/description, knowledge extraction, dimensionality reduction of the raw data, and evaluation of the knowledge. It introduces the theory of granular computing to remedy the shortcomings of these steps, as follows:
(1) We summarize three basic principles of granular computing: granular knowledge representation, granular approximate problem solving, and granular problem mapping.
(2) Classification at different granularities is treated as a form of knowledge representation. Different granularity scales are reflected by different classifications of the raw data, and such classifications are themselves knowledge about the real world.
(3) We present a knowledge extraction algorithm that supports inconsistent data. The algorithm is based on rough set theory within granular computing, and it can be extended into an algorithm for incremental data sets. By applying the principle of granular approximate problem solving, it can further be extended into a parallel/serial approximate knowledge extraction algorithm.
(4) We present a hybrid feature reduction and feature selection algorithm. It defines a measure named the data inconsistency ratio. This measure is motivated by the number of equivalence classes that a table with a given set of features can stably hold.
(5) We also use a practical problem to evaluate the knowledge discovery methods. Specifically, we address the real problem of identifying the domain knowledge of an ancient Chinese architecture modeling system.
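The abstract does not define data saturation (the data inconsistency ratio) precisely, so the sketch below does not implement it. Instead, it illustrates the attribute-reduction side that item (4) says the proposed method combines with attribute selection. It uses the classical rough-set dependency degree γ_B(D) = |POS_B(D)| / |U| and the well-known greedy QuickReduct heuristic, a swapped-in standard technique rather than the author's algorithm. The names `granules`, `dependency`, `quick_reduct` and the toy table are hypothetical.

```python
from collections import defaultdict

def granules(table, attrs):
    """Equivalence classes of IND(attrs) as lists of object indices."""
    groups = defaultdict(list)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].append(i)
    return groups.values()

def dependency(table, attrs, dec):
    """gamma_B(D) = |POS_B(D)| / |U|: the fraction of objects whose
    attrs-granule carries a single decision value (the positive region)."""
    pos = sum(len(g) for g in granules(table, attrs)
              if len({table[i][dec] for i in g}) == 1)
    return pos / len(table)

def quick_reduct(table, cond, dec):
    """Greedily add the attribute that most increases gamma until the
    dependency of the full condition set is reached."""
    full = dependency(table, cond, dec)
    reduct = []
    while dependency(table, reduct, dec) < full:
        best = max((a for a in cond if a not in reduct),
                   key=lambda a: dependency(table, reduct + [a], dec))
        reduct.append(best)
    return reduct

# Hypothetical toy table in which the decision d simply equals attribute c.
TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]

print(quick_reduct(TABLE, ["a", "b", "c"], "d"))  # prints "['c']"
```

Because QuickReduct is greedy, it always reaches the full dependency degree but may return a superset of a minimal reduct on other data. It also ignores feature-relevance criteria used in attribute selection, which is the gap a hybrid method is meant to close.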
【Key words】 Knowledge discovery; granular computing; rough set; feature selection; saturation; architecture modeling;