Research on a Fast MODIS Sea Surface Temperature Retrieval Method Based on Spark

MODIS SST Fast Retrieval Method Based on Apache Spark

【Author】 Liu Huan (刘欢)

【Advisor】 Chen Nengcheng (陈能成)

【Author information】 Wuhan University, Cartography and Geographic Information Systems, 2019, Master's thesis

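【Abstract, translated from the Chinese】 As Earth-observation remote sensing technology matures and the analysis and application of remote sensing products spread, remote sensing image processing is moving rapidly toward broader application scenarios, more diverse image products, and higher spectral, spatial, and temporal resolution. Massive remote sensing image data has become a typical form of big data: beyond the basic characteristics of big data, it has distinctive internal and external characteristics of its own. To study processing methods for remote sensing big data, this thesis takes the MODIS sea surface temperature (SST) retrieval algorithm as an example, discusses the algorithm improvement and the design of the in-memory data model and the workflow processing model, and proposes a remote sensing big data processing method based on the general-purpose parallel computing framework Apache Spark. The work covers three aspects: 1. An SST retrieval algorithm suited to fast computation: by comparing the characteristics of different algorithms and considering the needs of fast computation, the algorithm is built by fitting the algorithm parameters and simplifying the algorithm expression, which gives it strong parameter synchronization and completeness, few input interruptions, and few external dependencies. 2. A remote sensing big data processing flow based on in-memory model optimization: by comparing the similarities and differences between remote sensing big data and text big data, an in-memory model for remote sensing imagery based on resilient distributed datasets (RDDs) is proposed, and the parallel computation is optimized according to the characteristics of cluster computing. 3. An IO-optimized data processing workflow: to improve the execution efficiency of the algorithm from end to end, different workflow patterns are combined to chain the discrete remote sensing image processing steps, and data access efficiency is optimized both between and within steps. Following the improved algorithm, data model, and workflow patterns above, this thesis designs and implements a Spark-based fast MODIS SST retrieval method. The method substantially improves the execution-time efficiency of every stage of the MODIS SST retrieval algorithm and effectively reduces memory and disk consumption during execution. Across the run modes tested, the method is 13.43 times as time-efficient as the single-machine program. Under different data loads, it sustains high processing performance fairly stably. The results show that the Spark-based fast MODIS SST retrieval method effectively improves the execution efficiency of the MODIS SST retrieval algorithm by improving the existing algorithm, building an in-memory data model, and turning discrete processing steps into a continuous flow, and that it offers a reference for remote sensing big data processing.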

【Abstract】 As remote sensing technology for Earth observation continues to develop and mature, and as its products and algorithms are applied more widely, remote sensing image processing is progressing rapidly toward extensive application scenarios, diverse image products, and high spectral, spatial, and temporal resolution. Remote sensing image data has become a typical form of big data, which not only has the basic characteristics of big data but also has special internal and external features. To study processing methods for remote sensing big data, this paper takes the MODIS sea surface temperature (SST) retrieval algorithm as an example, discusses the improvement of the algorithm, the in-memory data model, and the workflow processing model, and explores a remote sensing big data processing method based on a general-purpose parallel computing framework, Apache Spark. The research covers three aspects: an SST retrieval algorithm suitable for fast calculation, a remote sensing big data process based on an in-memory data model, and a data processing workflow oriented to IO optimization. First, by comparing different algorithms and taking the needs of fast calculation into account, the SST retrieval algorithm is constructed by fitting the algorithm parameters and simplifying the algorithm expression, so that it has strong parameter synchronization and integrity, fewer input interruptions, and fewer external dependencies. Second, by comparing the differences between remote sensing big data and text big data, an in-memory model for remote sensing images based on the resilient distributed dataset (RDD) is proposed, and the parallel computation is optimized based on the characteristics of cluster computing. Third, the IO-optimization-oriented data processing workflow aims to improve the execution efficiency of the whole algorithm. By combining different workflow patterns, discrete remote sensing image processing steps are connected, and data access efficiency is optimized between and within steps. In accordance with the above algorithm expression, data model, and workflow patterns, this paper designs and implements the MODIS SST fast retrieval method based on Apache Spark. This method greatly improves the time efficiency of the MODIS SST retrieval algorithm and effectively saves memory and disk space at runtime. Across the execution modes tested, the time efficiency of this method is 13.43 times that of the single-machine program. Under different data loads, the method remains stable as the load increases. The results show that the Spark-based workflow method effectively improves the execution efficiency of the MODIS SST retrieval algorithm by improving the algorithm, constructing an in-memory data model, and integrating discrete processing steps, and that it provides a reference for remote sensing big data processing.
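
The abstract does not spell out the RDD-based in-memory image model, so the following is only a minimal sketch of one plausible reading: a scene is cut into tiles keyed by position, each tile is processed independently, and the results are reassembled. All names here (`split_into_tiles`, `map_tiles`, `merge_tiles`, the tile size) are hypothetical and are not taken from the thesis. The sketch uses plain Python so that it runs without a cluster; in Spark, each `(key, tile)` pair could become one record of a key-value RDD, for example through `SparkContext.parallelize`.

```python
from typing import Callable, Dict, List, Tuple

Tile = List[List[float]]      # one rectangular block of pixel values
TileKey = Tuple[int, int]     # (tile_row, tile_col) position within the scene


def split_into_tiles(image: List[List[float]], size: int) -> Dict[TileKey, Tile]:
    """Cut a 2-D image into blocks of at most size x size pixels, keyed by position."""
    tiles: Dict[TileKey, Tile] = {}
    for r0 in range(0, len(image), size):
        for c0 in range(0, len(image[0]), size):
            # Edge tiles are smaller when the image size is not a multiple of `size`.
            block = [row[c0:c0 + size] for row in image[r0:r0 + size]]
            tiles[(r0 // size, c0 // size)] = block
    return tiles


def map_tiles(tiles: Dict[TileKey, Tile],
              fn: Callable[[float], float]) -> Dict[TileKey, Tile]:
    """Apply a per-pixel function to every tile; tiles do not depend on each other."""
    return {key: [[fn(v) for v in row] for row in tile]
            for key, tile in tiles.items()}


def merge_tiles(tiles: Dict[TileKey, Tile]) -> List[List[float]]:
    """Reassemble keyed tiles into a single 2-D image."""
    n_tile_rows = max(key[0] for key in tiles) + 1
    n_tile_cols = max(key[1] for key in tiles) + 1
    image: List[List[float]] = []
    for tr in range(n_tile_rows):
        height = len(tiles[(tr, 0)])
        for r in range(height):
            row: List[float] = []
            for tc in range(n_tile_cols):
                row.extend(tiles[(tr, tc)][r])
            image.append(row)
    return image


if __name__ == "__main__":
    # Hypothetical 5 x 5 scene; the values stand in for per-pixel measurements.
    scene = [[float(r * 10 + c) for c in range(5)] for r in range(5)]
    tiles = split_into_tiles(scene, size=2)
    # Placeholder per-pixel step; a real retrieval would combine several bands.
    processed = merge_tiles(map_tiles(tiles, lambda v: v + 1.0))
    print(len(tiles), len(processed), len(processed[0]))
```

Because every tile carries its own key, the per-tile step needs no shared state, which is the property that lets such records be distributed across a cluster and merged afterwards.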

【Keywords】 Apache Spark; workflow; sea surface temperature; MODIS
【Key words】 Apache Spark; Workflow; SST; MODIS
  • 【Online publication contributor】 Wuhan University
  • 【Online publication year and issue】 2020, Issue 06