节点文献
基于主题模型的高空间分辨率遥感影像分类研究
Topic Modle on Image Classification and Their Applications in High Spatial Resolution Remote Sensing Image
【作者】 徐盛;
【作者基本信息】 上海交通大学 , 模式识别与智能系统, 2012, 博士
【摘要】 随着传感器技术的飞速发展,获取的遥感影像的空间分辨率逐步提高,遥感影像的数量日益增多,如何对大量的高空间分辨率遥感影像进行特征提取和分类研究成为当前遥感影像研究的基本和核心问题。对于高空间分辨率遥感影像,面向对象的研究已经逐步替代基于像素的技术来提取更多的高空间分辨率影像细节信息。当前常用的分类方法是直接从影像对象中提取低层视觉特征进行分类。但是,这需要从根本上解决一个问题,即低层视觉特征与高层语义特征间的“语义鸿沟”。由于语义表达在图像理解与处理中的重要性,本论文沿着视觉词包模型,主题建模到遥感影像对象分类的路线开展相关研究,将遥感影像对象的低层视觉特征进行词包表示,并进一步映射为主题这一中层语义表示。本文分两步来实现遥感影像处理与分析中的影像对象表达和分类。第一,论文通过词包模型来组织影像对象中的局部特征,以此生成具有一定分辨能力的视觉单词,实现对象的词包表示。第二,论文通过主题模型分析不同影像对象中单词的分布(如:视觉单词的共生性和相关性),由此发现对象中隐含的主题(即中层语义),进一步以主题来描述对象,实现对象分类。论文的主要研究内容和创新点包括以下几点:1.提出了一种基于多尺度的视觉单词。本文在高空间遥感影像中引入词包模型来计算影像对象的局部特征块表示。为了减少视觉单词的歧义性,引入“虚拟”视觉单词的概念,并结合多尺度图像金字塔,构造多尺度的视觉单词。一方面,“虚拟”单词的应用避免了模型将图像块量化为不合适的视觉单词,而是强制映射为“虚拟”单词,避免图像块量化的歧义性。另一方面,针对视觉单词存在的多义性现象,本文根据不同尺度下图像内容的差异,利用基于多尺度图像金字塔的视觉单词,不仅扩展了单尺度视觉单词的表示能力,还论证了小尺度中提取的视觉单词能够隐含大尺度图像中邻近图像块之间的相互关系,从而减少了视觉单词出现多义性的可能性。实验表明,本文采用多尺度词包表示作为遥感影像的特征,其分类性能优于基于低层视觉特征和经典词包表示的分类性能。2.研究并提出了基于多尺度词包表示的层次主题模型,用无监督方法来组织遥感影像对象的层次关系。在经典的层次主题模型中引入视觉单词的尺度信息作为模型构造时的约束条件,将小尺度提取的视觉单词作为对象的概述信息,而大尺度提取的视觉单词作为对象的精细信息,模拟了图像理解过程中由抽象到具体的规律,更好地构造影像对象的层次结构。通过对两组不同遥感传感器采集的影像对象集构造层次聚类,实验表明:基于多尺度词包表示的层次主题模型不仅能提高聚类的效果,而且构造的层次结构与人的认知结果相一致。3.提出了基于主题模型的半监督学习算法。本文统一了监督与无监督主题模型的生成过程,并利用多条件学习的思想加权连接这两个模型,使得模型能从少量的标签样本和大量的无标签样本中估计参数,实现分类。该模型将遥感影像对象的主题构造与半监督学习的思想相结合,提高了遥感影像对象的分类性能,同时为主题模型的应用提供了一种新的思路。
【Abstract】 The development of sensor technology makes us obtain more and more high spatial resolution remote sensing images. But one of the most important problems that face us is how to extract low-level features and classify huge amount of remote sensing images. Recently, object-based method has been widely used in remote image processing and analysis instead of pixel-based method. Then the low-level visual feature is directly used for object classification. However, one challenge is how to bridge the semantic gap between low-level visual features and high-level semantic features when using object-based method. In order to apply the semantic content for better classification performance, the dissertation follows the routine of“bag-of-word (BOW), topic model and object classification”, where we research the image objects from the low-level visual feature to BOW representation to middle-level semantic topics.Two main steps are used to achieve visual semantic representation and image classification. Firstly, the work organizes the local feature to obtain visual words and construct the BOW representation. Secondly, the work uses topic model to reveal underlying topics, which are used to classify images as middle-level semantic information. The concrete contributions are listed as follows:1. The multi-scale visual vocabulary is proposed. The bag-of-word method is introduced to high resolution remote sensing image. To reduce visual word ambiguity, the“virtual word”and multi-scale words are proposed. The“virtual word”is generated to expand visual vocabulary, where all uncertain image features would be mapped to virtual word. And the multi-scale visual words based on image scale pyramid are used not only to enrich representative ability of visual words, but also to associate image regions within a local region. Experimental results show that the classification performance with multi-scale BOW representation is better than the classification performance with low-level visual feature and traditional BOW representation.2. The hierarchical topic model based on multi-scale BOW representation is proposed to organize the object hierarchy. The scale information in multi-scale BOW is added into hierarchical Latent Dirichlet Allocation model as constraints in model initialization. Then, the proposed model assumes that visual words at coarse scale outline the image contents, and visual words at fine scale describe the detailed contents, which simulates the coarse-to-fine image understanding process. Comparison to the traditional topic models and SVM classifier, our model obtains the higher classification performance and even constructs the hierarchy consistent to the results by human recognition.3. The semi-supervised topic model is proposed to learn both the labeled and unlabeled training data. The proposed model unifies the supervised and unsupervised LDA models into the same generative process, and weights the contribution of these two topic models to create a semi-supervised LDA model. Thus, the semi-supervised model not only improves the classification performance with few labeled samples and many unlabeled samples, but also extends a new application.