云环境下基于并行支持向量机的高光谱影像分类研究
Research on Hyperspectral Imagery Classification Based on Parallel Support Vector Machine in Cloud Environment
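【Author】 Huang Fenghua (黄风华);
【Advisor】 Yan Luming (晏路明);
【Author Information】 Fujian Normal University, Cartography and Geographic Information Systems, 2014, Ph.D.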
【摘要】 高光谱遥感影像具有波段多、数据量大、数据不确定性和监督分类时易受Hughes现象影响等特点,由此对现有的图像信息分析处理技术提出了更高的要求。支持向量机(SVM)是一种基于统计学习理论且已被众多实验所证实的有效学习机制,能较好地解决小样本、非线性、高维数等问题,并已被成功地应用于高光谱分类领域;但对于大规模高光谱影像的分类问题,SVM传统算法(串行)的训练和预测效率低下,而单机和传统分布式环境也难以提供处理海量数据所需的强大并行运算能力和足够的内存空间。有鉴于此,本文引入并行支持向量机(PSVM)和云计算技术,设计出一种基于云计算的并行支持向量机(Cloud-PSVM)分类模型,提出云环境下Cloud-PSVM的增量学习算法和参数的全局优化策略,并将Cloud-PSVM应用于土地利用分类领域,构建基于Hadoop平台的高光谱影像分类云服务。整个研究从计算模式、分类方法和服务模式这三方面入手,旨在保证分类精度的前提下提高高光谱影像分类的效率,推动大规模高光谱影像地物信息提取与机器解译的规模化和智能化。主要研究内容与成果如下:(1)为有效地提高Hyperion高光谱影像的空间分辨率,设计出一种改进型的Gram-Schmidt高光谱影像融合方法,实现了Hyperion高光谱影像与同一遥感平台及同一时相的ALI高空间分辨率影像的高效融合;提出一种基于光谱-地形,以及纹理特征的组合径向基核函数(MRBF),并构建出一种基于MRBF的二叉决策树多类SMO (BDT-SMO)分类器,可有效地提高高光谱融合影像的分类精度。(2)构建Hadoop云储存平台,采用Hadoop分布式文件系统(HDFS)和Hbase数据库实现大规模高光谱融合影像数据和样本数据的分布式存储,通过合理选择分割策略、存取机制和数据组织形式,可有效地提高大规模融合影像和样本数据的存取效率。(3)为有效地提高大规模训练样本的并行学习效率,提出一种基于交叉样本的改进型混合并行支持向量机(YBJCF-PSVM)模型,并与GPU技术相结合,以提高单节点的并行学习能力。此外,设计出一种基于MapReduce和YBJCF-PSVM模式的Cloud-PSVM分类器。(4)将Cloud-PSVM应用于土地利用分类领域。采用MapReduce模式对实验区高光谱融合影像进行并行特征提取,并通过Cloud-PSVM分类器对大规模样本进行并行训练与预测。实验结果表明,Cloud-PSVM分类器能在保证分类精度的前提下较大程度地提高高光谱融合影像的分类效率。此外,为能有效地提高土地利用分类结果的发布效率,还设计并实现了一种基于Hadoop的高光谱融合影像分类的云服务。(5)在云计算环境下设计出一种基于MapReduce和壳向量的SVM增量学习算法(MapReduce-HASVM),可有效地提高Cloud-PSVM分类器的泛化能力和扩展性。此外,还提出一种基于云计算和并行遗传算法(PGA)的Cloud-PSVM参数分布式全局优化策略,可有效地提高Cloud-PSVM分类器的分类精度和核参数的优化效率。
【Abstract】 Hyperspectral remote sensing images are characterized by many spectral bands, large data volumes, data uncertainty, and susceptibility to the Hughes phenomenon in supervised classification, all of which place higher demands on existing image analysis and information processing techniques. The support vector machine (SVM) is an effective learning method that is grounded in statistical learning theory and has been validated by numerous experiments. Because it copes well with small sample sizes, nonlinearity, and high dimensionality, SVM has been applied successfully to hyperspectral classification. For large-scale hyperspectral images, however, the traditional (serial) SVM algorithm trains and predicts inefficiently, and neither a single machine nor a traditional distributed environment can readily provide the parallel computing power and memory that massive data processing requires. This thesis therefore introduces the parallel support vector machine (PSVM) and cloud computing, designs a cloud-based parallel support vector machine (Cloud-PSVM) classification model, and proposes an incremental learning algorithm and a global parameter optimization strategy for Cloud-PSVM in the cloud environment. Cloud-PSVM is applied to land use classification, and a hyperspectral image classification cloud service is built on the Hadoop platform. The research addresses three aspects, namely the computation mode, the classification method, and the service mode, with the aim of improving the efficiency of hyperspectral image classification while maintaining classification accuracy, and of promoting large-scale, intelligent extraction and machine interpretation of ground-object information from large-scale hyperspectral images.
The main work and contributions are summarized as follows:
(1) To raise the spatial resolution of Hyperion hyperspectral images, an improved Gram-Schmidt fusion method is designed that efficiently fuses Hyperion hyperspectral images with ALI high-spatial-resolution images acquired from the same remote sensing platform at the same time. A combined radial basis kernel function (MRBF) based on spectral, terrain, and texture features is proposed, and a binary decision tree multi-class SMO (BDT-SMO) classifier is built on MRBF, which improves the classification accuracy of fused hyperspectral images.
(2) A Hadoop cloud storage platform is built in which the Hadoop Distributed File System (HDFS) and the HBase database provide distributed storage for large-scale fused hyperspectral images and sample data. Appropriate choices of partitioning strategy, access mechanism, and data organization improve the access efficiency of the large-scale fused images and sample data.
(3) To improve the parallel learning efficiency on large-scale training samples, an improved hybrid parallel support vector machine model based on cross-samples (YBJCF-PSVM) is proposed and combined with GPU technology to strengthen the parallel learning capability of a single node. In addition, a Cloud-PSVM classifier is designed on the basis of MapReduce and the YBJCF-PSVM model.
(4) Cloud-PSVM is applied to land use classification. MapReduce is used to extract features from the fused hyperspectral images of the experimental area in parallel, and the Cloud-PSVM classifier trains on and predicts large-scale samples in parallel. Experimental results show that the Cloud-PSVM classifier considerably improves the classification efficiency of fused hyperspectral images while maintaining classification accuracy. In addition, to publish land use classification results more efficiently, a Hadoop-based cloud service for fused hyperspectral image classification is designed and implemented.
(5) An incremental SVM learning algorithm based on MapReduce and hull vectors (MapReduce-HASVM) is designed for the cloud computing environment, which improves the generalization ability and scalability of the Cloud-PSVM classifier. In addition, a distributed global parameter optimization strategy for Cloud-PSVM based on cloud computing and a parallel genetic algorithm (PGA) is proposed, which improves both the classification accuracy of the Cloud-PSVM classifier and the efficiency of kernel parameter optimization.
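The abstract does not give the form of the combined radial basis kernel (MRBF) named in contribution (1). As an illustration only, the sketch below assumes one common way to build such a composite kernel: a non-negative weighted sum of separate RBF kernels, one per feature group (spectral, terrain, texture). The function names, weights, gamma values, and pixel values are all hypothetical and are not taken from the thesis.

```python
import math


def rbf(x, y, gamma):
    """Standard RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def composite_rbf(p, q, weights, gammas):
    """Weighted sum of per-group RBF kernels (an assumed form, not the thesis's MRBF).

    p, q    : dicts mapping a feature-group name to that pixel's feature vector
    weights : non-negative weight per feature group (hypothetical values)
    gammas  : RBF width parameter per feature group (hypothetical values)
    """
    return sum(weights[g] * rbf(p[g], q[g], gammas[g]) for g in weights)


# Two hypothetical pixels described by three feature groups.
pixel_a = {"spectral": [0.12, 0.30, 0.45], "terrain": [120.0, 5.0], "texture": [0.8]}
pixel_b = {"spectral": [0.10, 0.28, 0.50], "terrain": [180.0, 9.0], "texture": [0.6]}

# Hypothetical group weights (summing to 1) and per-group kernel widths.
weights = {"spectral": 0.6, "terrain": 0.2, "texture": 0.2}
gammas = {"spectral": 10.0, "terrain": 1e-4, "texture": 1.0}

k_ab = composite_rbf(pixel_a, pixel_b, weights, gammas)  # similarity of two different pixels
k_aa = composite_rbf(pixel_a, pixel_a, weights, gammas)  # self-similarity equals the weight sum
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, a function of this shape can be handed to any SVM solver that accepts a custom or precomputed kernel. Giving each group its own gamma lets features on very different scales, such as reflectance and elevation, contribute comparably.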