
The Design and Implementation of Log Analysis System for Distributed Application Software

【Author】 李娟

【Supervisors】 王世杰; 王祥熙

【Author Information】 Southeast University, Master of Engineering (Professional Degree), 2018, Master's Degree

【Abstract】 With the rapid expansion of Internet data and the growing diversity of service types, enterprise operations have gradually migrated from stand-alone systems to distributed application software. At the same time, the complexity of operations and maintenance has risen exponentially: large numbers of scattered logs are hard to manage, manual retrieval of key abnormal information is inefficient and error-prone, and much valuable log information therefore goes unused. To address these pain points in distributed application scenarios, this thesis designs and implements a log analysis system for distributed application software, covering the unified collection and storage of scattered logs, log analysis, and visualization of the results. The system is divided into two modules: log aggregation and log analysis. The log aggregation module consists of three parts: log collection, a message queue, and distributed storage. Log collection uses the open-source component Flume, extended for the actual deployment scenario with a custom channel, DoubChannel, that switches freely between the memory channel and the file channel. Distributed log storage uses Elasticsearch, a distributed search engine built on Lucene, to serve data to the analysis module. A Kafka message queue buffers data between collection and storage, avoiding performance problems caused by data bursts and mismatched processing speeds at the two ends. The log analysis module consists of four parts: online task management, log clustering analysis, correlation analysis, and anomaly scenario analysis. Online task management handles starting and stopping the whole analysis module. Log clustering analysis reads real-time logs from Kafka and combines the IPLoM and DBSCAN algorithms to extract log templates, building a template library that is updated in real time. Correlation analysis first matches logs of a specified type from Elasticsearch against the template library to generate a log distribution baseline; it then applies a binning algorithm, a quantile algorithm, and a K-Sigma model to the real-time log stream, based on the template library and the baseline, to produce real-time window data. Anomaly scenario analysis applies the quantile algorithm and the LCS (longest common subsequence) algorithm to consecutive anomalous windows to mark anomalies and discriminate faults. The visualization component presents the log analysis statistics and their details clearly. Functional and performance testing show that the system can quickly detect anomalies, discriminate faults, and report root causes and contingency plans, helping enterprises make operations and maintenance more convenient.
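The abstract names IPLoM and DBSCAN as the basis for log template extraction but does not show the implementation. As a minimal sketch of the underlying idea only, assuming whitespace-tokenized log lines, the snippet below performs IPLoM's first partitioning step (grouping by token count) and then masks positions whose tokens vary within a group; the function name and the `<*>` wildcard are illustrative choices, not the thesis's code.

```python
from collections import defaultdict

def extract_templates(logs):
    """Group log lines by token count (IPLoM's first partitioning step),
    then mask positions whose tokens vary within a group with '<*>'."""
    groups = defaultdict(list)
    for line in logs:
        tokens = line.split()
        groups[len(tokens)].append(tokens)

    templates = []
    for token_lists in groups.values():
        # A position is kept as a constant only if every line in the
        # group agrees on its token; otherwise it becomes a wildcard.
        template = [
            cols[0] if len(set(cols)) == 1 else "<*>"
            for cols in zip(*token_lists)
        ]
        templates.append(" ".join(template))
    return templates

logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Disk usage at 91 percent",
]
print(extract_templates(logs))  # one template per token-count group
```

In the thesis's pipeline, a density-based pass such as DBSCAN would further refine groups of similar templates before they enter the template library; that refinement is omitted here.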
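The correlation analysis step compares real-time window counts against a distribution baseline using a quantile algorithm and a K-Sigma model. A hedged sketch of both checks, assuming the windowed data are simple per-window event counts (the function names and thresholds are illustrative, not taken from the thesis):

```python
import statistics

def ksigma_bounds(history, k=3):
    """K-Sigma: a window count is anomalous if it falls outside
    mean +/- k * standard deviation of the historical baseline."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return mean - k * std, mean + k * std

def quantile_bounds(history, low=0.05, high=0.95):
    """Quantile baseline: bounds read directly off the empirical
    distribution via linear interpolation between order statistics."""
    s = sorted(history)
    def q(p):
        i = p * (len(s) - 1)
        lo, hi = int(i), min(int(i) + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (i - lo)
    return q(low), q(high)

baseline = [100, 104, 98, 102, 99, 101, 103, 97]
lo, hi = ksigma_bounds(baseline)
print(250 > hi)  # a sudden burst of 250 events is flagged: True
```

Using two independent detectors, as the abstract describes, lets a distribution-free check (quantiles) back up the parametric one (K-Sigma) when the baseline is not normally distributed.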
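Fault discrimination via the LCS algorithm, as named in the abstract, can be sketched as matching the sequence of template IDs seen in consecutive anomalous windows against known fault signatures. The dynamic-programming LCS below is the standard algorithm; the similarity score, the `match_fault` helper, and the example fault library are assumptions for illustration.

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def match_fault(observed, known_faults):
    """Pick the known fault whose template-ID sequence best overlaps the
    observed sequence, scored as normalized LCS length (hypothetical metric)."""
    def score(seq):
        return lcs_length(observed, seq) / max(len(observed), len(seq))
    name = max(known_faults, key=lambda k: score(known_faults[k]))
    return name, score(known_faults[name])

faults = {
    "disk-full": ["T7", "T7", "T12", "T3"],  # illustrative signatures
    "net-split": ["T1", "T9", "T9"],
}
print(match_fault(["T7", "T12", "T3"], faults))  # → ('disk-full', 0.75)
```

Because LCS tolerates insertions and gaps, a fault is still recognized when extra unrelated templates appear between the signature's steps, which suits noisy anomalous windows.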

  • 【Online Publisher】 Southeast University
  • 【Online Publication Year/Issue】 2020, Issue 03
  • 【CLC Number】 TP311.52
  • 【Citation Count】 4
  • 【Download Count】 88