CloudBN: A Cloud-Based System for Learning Probabilistic Graphical Models
【Author】 Wang Yuan, Yue Kun, Fang Qiyu, and Liu Weiyi (Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650091)
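【Institution】 Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University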
【Abstract】 As an important probabilistic graphical model, the Bayesian network (BN) is the basic framework for representing and reasoning about statistical uncertain knowledge. Learning BNs from massive data is the foundation of, and the key to, uncertain-knowledge reasoning and related applications over massive data in cloud computing environments. Using the Hadoop platform, we designed and implemented CloudBN, a cloud-computing-based system for learning probabilistic graphical models. Centered on BN structure learning, CloudBN first stores the massive data in HBase and then extends the traditional scoring & searching BN learning method to a parallel environment, implementing parallel BN learning from massive data on MapReduce. This paper introduces the architecture of CloudBN and its relevant technologies, and demonstrates the functionality and performance of the system. CloudBN takes full advantage of the massive-data processing capability of cloud computing support technologies such as MapReduce and HBase. It achieves efficient BN learning in massive-data environments and addresses the problem that existing BN learning methods cannot be applied to massive data.
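The abstract does not specify the scoring function or the MapReduce job layout, so the following is only a minimal single-machine sketch of how a scoring step for one candidate parent set might be decomposed into map and reduce phases. The log-likelihood score, the function names (`map_counts`, `reduce_counts`, `score_family`), and the toy records are all assumptions for illustration and are not taken from CloudBN.

```python
from collections import Counter
from functools import reduce
from math import log

def map_counts(record, child, parents):
    """Map step (hypothetical): emit one ((parent_config, child_value), 1) pair."""
    key = (tuple(record[p] for p in parents), record[child])
    return Counter({key: 1})

def reduce_counts(a, b):
    """Reduce step (hypothetical): merge partial counts by summing per key."""
    return a + b

def family_log_likelihood(counts):
    """Maximum-likelihood log-likelihood of a child given its parents.

    This is an assumed score; the source does not name the one CloudBN uses.
    """
    parent_totals = Counter()
    for (pa, _), n in counts.items():
        parent_totals[pa] += n
    return sum(n * log(n / parent_totals[pa]) for (pa, _), n in counts.items())

def score_family(records, child, parents):
    """Score one candidate parent set by counting, then evaluating the score."""
    partials = (map_counts(r, child, parents) for r in records)
    counts = reduce(reduce_counts, partials, Counter())
    return family_log_likelihood(counts)

# Toy records (made up for illustration only).
data = [
    {"rain": 1, "wet": 1},
    {"rain": 1, "wet": 1},
    {"rain": 0, "wet": 0},
    {"rain": 0, "wet": 0},
]

with_parent = score_family(data, "wet", ["rain"])
without_parent = score_family(data, "wet", [])
```

In a score-and-search procedure, a search loop would compare such scores across candidate structures and keep the edge changes that improve the total. Because the counts are additive across records, the counting step is the part that would naturally be distributed.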
【Key words】 massive data; cloud computing; probabilistic graphical model; Bayesian network (BN) learning; MapReduce
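- 【Proceedings】 Proceedings of the 29th National Database Conference of China (Volume B) (NDBC2012)
- 【Conference】 The 29th National Database Conference of China (NDBC2012)
- 【Conference date】 2012-10-12
- 【Conference location】 Hefei, Anhui, China
- 【Classification code】 TP18
- 【Organizer】 China Computer Federation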