
Fault Management for Internet Services (Internet服务故障管理)

【Author】 Huang Xiaohui (黄晓慧)

【Advisor】 Cheng Shiduan (程时端)

【Author Information】 Beijing University of Posts and Telecommunications, Computer Application Technology, 2006, Ph.D.

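【Abstract (translated from the Chinese)】 As the Internet gradually evolves toward a service-oriented architecture (SOA), service providers (SPs) have begun to see that Internet services can bring potentially high profits. As a result, a wide variety of Internet services have emerged in recent years, such as IP telephony, IPTV, video on demand, online games, and VPN. To retain existing customers and attract new users, SPs need to guarantee the quality of service (QoS) of the services they offer.

Fault management is very important for guaranteeing service QoS: service unavailability or performance degradation leads to SLA (Service Level Agreement) violations, damages the SP's reputation, and causes financial loss. SPs therefore need service-oriented fault management mechanisms that can quickly localize a fault when a service fails and take appropriate measures, so as to shorten service downtime and periods of degraded performance. This dissertation studies the fault management architecture for Internet services and its related algorithms. It focuses on graph-theory-based service fault diagnosis, construction of service fault propagation models, analysis and improvement of fault localization algorithms, a service fault management framework spanning multiple autonomous domains, and resource-planning-based handling of service performance degradation faults. The main work of the dissertation covers the following aspects:

(1) The complex causes of service faults are analyzed and existing fault diagnosis techniques are compared. Graph theory is chosen as the theoretical tool for this work, and a bipartite graph is adopted as the fault propagation model for services. A layered fault propagation model is proposed, together with a modeling method. By dividing the fault management task among several independent layers, the layered model simplifies the fault diagnosis process.

(2) The fault diagnosis problem in the bipartite fault propagation model is transformed into a set-covering problem, and, drawing on the heuristic greedy algorithm, the maximum covering algorithms MCA and MCA+ are designed. Both are time-window-based; MCA+ is an extended version of MCA that also accounts for the effect of lost and spurious symptoms on fault diagnosis. Simulation results show that MCA and MCA+ achieve a higher fault detection rate and a lower false positive rate than existing algorithms, and that the algorithms are stable and have relatively low computational complexity.

(3) To improve the performance of time-window-based algorithms when the time window is set inappropriately, the multi-window fault diagnosis algorithm MFD is proposed. Time-window-based algorithms have an inherent weakness: their accuracy depends on the value chosen for the time window. MFD extends MCA+ with an analysis of the correlation between adjacent time windows, which to some extent overcomes the effect of an inaccurate time window setting. Simulation results show that MFD performs similarly to MCA+ when the time window is set accurately, and that when the time window is set inaccurately (whether too large or too small) MFD achieves a higher fault detection rate and a lower false positive rate than MCA+, with the same algorithmic complexity as MCA+.

(4) The problems of service fault management in multi-domain heterogeneous network environments are analyzed, and a PDB-based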

【Abstract】 As the Internet gradually migrates toward SOA (Service Oriented Architecture), Service Providers (SPs) have found that Internet services have the potential to bring great profits. Various Internet services have therefore appeared in recent years, such as VoIP, IPTV, VoD, Internet games, and VPN. To retain existing customers and attract new users, SPs need to guarantee the QoS (Quality of Service) of their services.

Fault management is crucial for guaranteeing QoS, since service unavailability or performance degradation may cause Service Level Agreement (SLA) violations, which damage the SP's reputation and cause financial loss. Therefore, to shorten the periods of unavailability and performance degradation when a service failure happens, SPs need service-oriented fault management that localizes the fault and takes countermeasures. This dissertation focuses on the fault management architecture for Internet services and the related algorithms. It examines graph-theory-based fault diagnosis, FPM (fault propagation model) modeling, the analysis and improvement of fault localization algorithms, and a multi-domain fault management framework for Internet services. The main contributions are as follows:

(1) The complicated causes of Internet service failures are analyzed. Based on a comparison of current fault diagnosis techniques, graph theory is chosen as the theoretical tool and a bipartite graph is employed as the FPM for Internet services. A layered FPM, together with a modeling approach, is proposed. By dividing the fault management task into separate layers, the layered model simplifies the fault diagnosis process.

(2) By transforming fault diagnosis in the bipartite FPM into a set-covering problem, two window-based fault localization algorithms, MCA (Max-Covering Algorithm) and MCA+, are proposed, based on the heuristic greedy algorithm. MCA+ is an extension of MCA that takes lost and spurious symptoms into account.
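To make the set-covering view concrete, the following is a minimal sketch of the textbook greedy set-cover heuristic applied to a bipartite fault propagation model. It is not the dissertation's MCA or MCA+, whose details are not given in this abstract; it ignores time windows and lost or spurious symptoms, and the function name, fault names, and symptom names are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_fault_cover(fpm: Dict[str, Set[str]], observed: Set[str]) -> List[str]:
    """Greedily pick faults until every explainable observed symptom is covered.

    fpm maps each candidate fault to the set of symptoms it can cause,
    i.e. the edges of the bipartite graph. All names are illustrative.
    """
    uncovered = set(observed)
    hypothesis: List[str] = []
    while uncovered and fpm:
        # Choose the fault that explains the most still-unexplained symptoms.
        # Iterating in sorted order makes ties resolve to the smallest name.
        best = max(sorted(fpm), key=lambda fault: len(fpm[fault] & uncovered))
        gained = fpm[best] & uncovered
        if not gained:
            break  # the remaining symptoms have no candidate cause in the model
        hypothesis.append(best)
        uncovered -= gained
    return hypothesis


# Hypothetical model: three candidate faults and the symptoms each can cause.
example_fpm = {
    "link_a": {"s1", "s2"},
    "router_b": {"s2", "s3", "s4"},
    "server_c": {"s4"},
}
print(greedy_fault_cover(example_fpm, {"s1", "s2", "s3", "s4"}))
```

In this example the heuristic first selects `router_b`, which explains three symptoms, and then `link_a` for the remaining one. A greedy choice like this is fast but is not guaranteed to return a minimum set of faults.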
Simulation results show that MCA and MCA+ achieve a higher fault detection rate and a lower false positive rate than current algorithms. The algorithms are also stable and have relatively low computational complexity.

(3) The MFD (Multi-window Fault Diagnosis) algorithm is proposed to improve the performance of window-based algorithms when the time window is set improperly. The intrinsic shortcoming of window-based algorithms is that their accuracy depends on whether the time window size is set correctly. By considering the correlation of adjacent time windows, MFD can alleviate the
