可视化数据挖掘的研究与应用 (Research and Application of Visual Data Mining)

Research and Implementation of Visual Data Mining

【Author】 王宝杰 (Wang Baojie)

【Advisor】 董立岩 (Dong Liyan)

【Author information】 Jilin University, Computer Software and Theory, 2007, Master's degree

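【Abstract (translated from the Chinese)】 Visual data mining presents abstract information in a concise form. Its aim is to exploit people's ability to perceive models and structures in visual form, so as to guide the mining work and help interpret the mining results. This thesis designs and develops the visualization module of the data mining system DBIN Miner, covering visualization of the data, the process, and the results. The module is easy to use and extensible: users can readily observe data distributions and statistics, effectively control the data mining process, and view mining results in an intuitive way. The thesis also designs STAR, a star coordinates visualization system that can visualize large data sets and data sets of relatively high dimensionality. STAR supports rotating and scaling the axes, zooming the image, and inspecting, selecting, and coloring data points, so that users can carry out interactive data mining through simple interactive operations. The thesis proposes three uses of star coordinates visualization: guiding the user's choice of mining algorithm, performing interactive data mining together with the DBSCAN clustering algorithm, and helping users understand the results of data mining algorithms.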
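The axis rotation and scaling operations mentioned above can be illustrated with a minimal sketch of the usual star coordinates mapping, in which each dimension gets a 2D axis vector and a record is drawn at the sum of its normalized values times those vectors. This is not the STAR system's code: the function names `star_axes` and `project`, the min-max normalization, and the even initial spacing of the axes are assumptions made for illustration.

```python
import math


def star_axes(n_dims, scales=None, rotations=None):
    """Return one 2D axis vector per dimension.

    Axes start evenly spaced around the origin; `scales` and `rotations`
    (hypothetical parameters) stand in for the interactive axis scaling
    and axis rotation operations.
    """
    scales = scales or [1.0] * n_dims
    rotations = rotations or [0.0] * n_dims
    axes = []
    for i in range(n_dims):
        angle = 2.0 * math.pi * i / n_dims + rotations[i]
        axes.append((scales[i] * math.cos(angle), scales[i] * math.sin(angle)))
    return axes


def project(rows, axes):
    """Map each n-dimensional row to a 2D point.

    Each value is min-max normalized per dimension (an assumption), then
    the point is the sum of normalized values times the axis vectors.
    """
    n_dims = len(axes)
    lows = [min(r[j] for r in rows) for j in range(n_dims)]
    highs = [max(r[j] for r in rows) for j in range(n_dims)]
    points = []
    for r in rows:
        x = y = 0.0
        for j in range(n_dims):
            span = highs[j] - lows[j]
            t = (r[j] - lows[j]) / span if span else 0.0  # constant column maps to 0
            x += t * axes[j][0]
            y += t * axes[j][1]
        points.append((x, y))
    return points
```

With four evenly spaced axes, a record holding the maximum value in every dimension lands on the origin together with the all-minimum record, because opposite axes cancel. Overlaps of this kind are one reason interactive rotation and scaling of individual axes is useful: changing `scales` or `rotations` moves the two records apart again.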

【Abstract】 Data mining is the process of extracting potentially valuable knowledge from large volumes of historical data. Visualization is the process of transforming data, information, and knowledge into visual form. Visual data mining has developed alongside data mining and information visualization. It presents the structure and function of data in a way that exploits human perception of patterns, exceptions, and trends, uses visualization to enhance data mining, and provides an interface between the human and the computer as information processing systems.

With effective use of visualization, large amounts of data can be processed quickly to find hidden features, patterns, and trends, which can support new and more efficient decision-making. Visualization lets decision-makers understand and use data mining techniques and algorithms, makes mining results easier to understand, and allows those results to be examined further. It can also be used to guide data mining algorithms, so that users take part in the decision analysis process. Visual data mining is highly valuable for exploring large databases, particularly when little is known about the data and the goals of the exploration are vague.

This thesis introduces the current state and development of visual data mining and explains its content and significance. It then introduces the basic theory and the basic interactive operations of star coordinates. It also briefly summarizes existing clustering algorithms and analyzes K-means, DBSCAN, BIRCH, and CURE in more detail.

The work focuses mainly on the following aspects:

1. Design and implementation of the visualization module of the data mining system DBIN Miner, including the visualization, basic drawing, graphic transformation, data acquisition, and data file parts.

2. Design and implementation of STAR, a star coordinates visualization system. Its functions include axis scaling, axis rotation, graphic zooming, selecting and coloring points, visualizing classified and unclassified data, and inspecting data values and results. Through interactive operations, it helps users choose an appropriate clustering algorithm, supports interactive data mining with the DBSCAN algorithm, and helps users understand data mining results.

In the visualization part, we use Java, Java 2D and 3D rendering technology, and SWT to draw windows, graphics, and images. We adopt internationally recognized standards and process models of the data mining industry, including CRISP-DM and PMML, to ensure compatibility with other mining tools and other vendors' products. The data and models used in visual data mining follow the PMML standard data format.

The data file module is the file storage layer for the system as a whole and holds the data formats the system processes. The data acquisition module obtains data for display from local documents, including text, database, and XML document types. The basic drawing module draws axes, bar charts, curves, and color maps for classification. The graphic transformation module handles rotation, zooming, partial zooming, coloring, and saving of graphics. The data standardization and conversion module converts data into the forms that the visualization graphics require. This module provides the interface and operations shared by the visualization parts, which simplifies the implementation of visualization algorithms, reduces redundancy, and greatly improves the scalability and maintainability of the system.

Through star coordinates, STAR can guide users in choosing a suitable clustering algorithm. Using UCI data sets, we analyze in detail which data sets suit the K-means, BIRCH, CURE, and DBSCAN algorithms. The thesis proposes using visualization and interactive operations to address two problems of DBSCAN: its sensitivity to the global parameter Eps, and its difficulty in finding clusters whose densities differ greatly. Finally, we propose using star coordinates to help users understand data mining results and discover relationships among multiple dimensions.

The visual data mining module is a key component of a data mining system. It is the interface between the user and the system, it determines the system's appearance, operability, interactivity, and user-friendliness, and it is key to the success of a data mining system. Multidimensional data visualization is the central problem of visual data mining. This thesis presents star coordinates as an effective tool for multidimensional data visualization, proposes three uses of star coordinates in visual data mining, and thereby expands the application areas of star coordinates.
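To make the Eps sensitivity of DBSCAN concrete, the following is a minimal sketch of the textbook algorithm, not the thesis's implementation. The names `dbscan`, `region_query`, and `DEMO_POINTS`, the brute-force neighbor search, and the toy data (two dense clusters close together plus one sparse cluster) are all assumptions made for illustration.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Made-up data: two dense clusters (spacing 0.1) one unit apart,
# and one sparse cluster (spacing 1.0) far away.
DEMO_POINTS = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (1.1, 0.0), (1.2, 0.0), (1.1, 0.1), (1.2, 0.1),
    (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0),
]

small_eps = dbscan(DEMO_POINTS, eps=0.15, min_pts=3)
large_eps = dbscan(DEMO_POINTS, eps=1.5, min_pts=3)
```

On this toy data, `eps=0.15` separates the two dense clusters but labels the whole sparse cluster as noise, while `eps=1.5` recovers the sparse cluster but merges the two dense ones. No single global Eps recovers all three groups, which is the situation in which letting a user pick regions and parameters interactively from a visualization can help.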

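  • 【Online publication contributor】 Jilin University
  • 【Online publication year and issue】 2007, Issue 03
  • 【Classification number】 TP311.13
  • 【Times cited】 1
  • 【Downloads】 478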