Research on a Large-Scale LTE Signaling Data Processing System
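【Author】 李晓东 (Li Xiaodong);
【Advisor】 申彦明 (Shen Yanming);
【Author information】 大连理工大学 (Dalian University of Technology), Computer Application Technology, 2017, Master's thesis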
【Abstract】 In recent years, mobile data services and smart devices have multiplied rapidly and mobile data traffic has grown steadily, placing higher demands on the speed and stability of data transmission. LTE networks offer users higher data transmission capacity and have therefore expanded quickly, and the volume of LTE signaling data has grown with them. This poses a major challenge for storing, processing, and analyzing signaling data. Traditional scale-up architectures struggle to process and analyze massive signaling data quickly, whereas big data technology makes it possible to build distributed computing systems from inexpensive PCs. Such systems increase processing capacity by scaling out, which offers a new approach to processing massive LTE signaling data.

First, considering the large volume and real-time nature of LTE signaling data, this thesis designs a system for collecting, aggregating, processing, and storing it. The system collects LTE signaling data in multiple formats, applies a distribution and aggregation strategy suited to the structure of the signaling data, and uses Kafka to aggregate the data. Spark serves as the computing module and processes the data in real time. A storage model is designed around the need for fast storage and querying of signaling data, and HBase is used to store the data.

Second, this thesis analyzes a shortcoming of the system's real-time processing module. During real-time processing, data read from Kafka must be repartitioned so that the number of partitions matches the parallelism required by the computing cluster. Repartitioning shuffles all of the data over the network, which consumes a large amount of network bandwidth and disk I/O. To avoid this cost, the thesis proposes a pre-partition strategy applied when data is read. During aggregation, data is distributed evenly across multiple Kafka partitions according to the ID of the device that generated it. During reading, the strategy computes, for each record in the Kafka source, the ID of the partition it maps to in the computing cluster, and pre-partitions the data by that ID. The data therefore arrives in the computing cluster already divided into the specified number of partitions, and the cost of repartitioning is avoided.

Third, to address the mismatch between the volume of input data and the computing capacity of the cluster during real-time processing of LTE signaling data, the thesis proposes an improved algorithm for controlling the upper limit of the data processing rate. The algorithm collects statistics from each stage of real-time processing, computes the current rate limit in real time, and submits tasks according to that limit. As a result, the amount of data the cluster reads matches its real-time processing capacity, and data is processed in time.

Finally, the thesis verifies the feasibility of the signaling data processing system and runs extensive comparative experiments on it for both the pre-partition strategy and the rate limit control algorithm. The results show that the pre-partition strategy significantly reduces the time spent in the real-time reading and partitioning stages for LTE signaling data. For both constant-rate and variable-rate data streams, the rate control algorithm reduces task scheduling delay and improves system stability.
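The pre-partition idea in the abstract can be illustrated with a small, self-contained Python sketch. It is not the thesis's implementation: the abstract does not state the hash function or the exact mapping from a Kafka record to a cluster partition ID, so the CRC32 hash, the offset-based split rule, and the names `kafka_partition`, `cluster_partition`, and `pre_partition` are all assumptions made for illustration. The sketch only shows how a record's cluster partition can be computed at read time from information already attached to it, so that no global shuffle is needed.

```python
import zlib
from collections import defaultdict


def kafka_partition(device_id: str, num_kafka_partitions: int) -> int:
    """Assign a record to a Kafka partition by a stable hash of its device ID.

    CRC32 is an assumed choice; any deterministic hash would serve.
    """
    return zlib.crc32(device_id.encode("utf-8")) % num_kafka_partitions


def cluster_partition(kafka_part: int, offset: int, splits: int) -> int:
    """Compute the cluster partition ID for one record at read time.

    Assumed rule: each Kafka partition is split into `splits` cluster
    partitions, and a record's offset within its Kafka partition picks
    the split. The ID depends only on (kafka_part, offset), so it can be
    computed locally by whichever worker reads the record.
    """
    return kafka_part * splits + offset % splits


def pre_partition(device_ids, num_kafka_partitions: int, splits: int) -> dict:
    """Simulate aggregation into Kafka followed by a pre-partitioned read."""
    next_offset = defaultdict(int)   # next offset per Kafka partition
    partitions = defaultdict(list)   # cluster partition ID -> records
    for device_id in device_ids:
        kp = kafka_partition(device_id, num_kafka_partitions)
        offset = next_offset[kp]
        next_offset[kp] += 1
        partitions[cluster_partition(kp, offset, splits)].append(device_id)
    return dict(partitions)


if __name__ == "__main__":
    ids = [f"enb-{i % 50}" for i in range(1000)]   # hypothetical device IDs
    result = pre_partition(ids, num_kafka_partitions=4, splits=3)
    for pid in sorted(result):
        print(pid, len(result[pid]))
```

Under these assumptions the cluster ends up with `num_kafka_partitions * splits` partitions, and every record stays with the reader of its own Kafka partition, which is the property that lets the repartition step be skipped.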
【Key words】 LTE Signaling Data; Big Data Processing; Data Partitioning; Rate Control;
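The rate limit control summarized in the abstract can be pictured with the following Python sketch. The abstract gives no formula, so everything here is an assumption chosen for illustration rather than the thesis's algorithm: the `BatchStats` fields, the backlog discount, the smoothing factor, and the function name `next_rate_limit` are hypothetical. The sketch only demonstrates the general pattern of deriving an ingest cap from statistics of the previous batch so that the volume read tracks what the cluster can process.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    """Statistics of one completed batch (hypothetical fields)."""
    records: int                     # records processed in the batch
    processing_seconds: float        # time spent processing the batch
    scheduling_delay_seconds: float  # time the batch waited before starting


def next_rate_limit(stats: BatchStats,
                    batch_interval_seconds: float,
                    previous_limit: float,
                    smoothing: float = 0.5) -> float:
    """Return an ingest cap in records per second for the next batch.

    Assumed rule: measure the throughput of the last batch, discount it
    when batches are queuing (scheduling delay > 0) so the backlog can
    drain, then blend with the previous cap to damp oscillation.
    """
    if stats.records <= 0 or stats.processing_seconds <= 0:
        return previous_limit  # nothing measured; keep the old cap

    observed = stats.records / stats.processing_seconds
    backlog = min(stats.scheduling_delay_seconds / batch_interval_seconds, 0.9)
    target = observed * (1.0 - backlog)
    return smoothing * previous_limit + (1.0 - smoothing) * target


if __name__ == "__main__":
    healthy = BatchStats(records=5000, processing_seconds=5.0,
                         scheduling_delay_seconds=0.0)
    lagging = BatchStats(records=5000, processing_seconds=5.0,
                         scheduling_delay_seconds=2.0)
    print(next_rate_limit(healthy, 4.0, previous_limit=2000.0))
    print(next_rate_limit(lagging, 4.0, previous_limit=2000.0))
```

In this sketch a batch that had to wait before starting lowers the cap more than one that started on time, which mirrors the abstract's stated goal of reducing scheduling delay.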