Internet级联故障分析与检测控制技术研究

Cascading Failure Analysis and Research on Detecting and Controlling Technologies in Internet

【Author】 Wang Jian

【Advisor】 Liu Yanheng

【Author information】 Jilin University, Computer Architecture, 2007, Master's thesis

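【Abstract (translated from Chinese)】 With the rapid development of the Internet, and especially the sharp increase in the number of users and the variety of applications in recent years, some Internet nodes operate under high load. An attack on these nodes, such as a virus attack, can overload them or stop them from working normally, forcing packets to be rerouted; this in turn overloads other routers one after another and produces a cascading failure. Once a large-scale cascading failure occurs, it tends to be highly destructive and far-reaching. It is therefore necessary to study the mechanisms and control technologies of Internet cascading failures in depth. Building on research results on cascading failures in complex networks from China and abroad, this thesis analyzes the characteristics of Internet topology and data communication, examines several factors that affect the occurrence, development, prevention, and control of Internet cascading failures, and describes and analyzes dynamic models of cascading failures in detail. In view of the highly uneven distribution of traffic, it proposes a new cascading failure model suited to the Internet environment; dynamic load changes, an overload function, and a delay time are introduced so that the model matches the behavior of the real Internet reasonably well. Drawing on techniques from early worm detection and forest-fire control, it proposes a two-stage method for Internet cascading failures built on two main ideas: "detect the trend, not the threshold" and "ignition-help" (点火自救, literally "ignition self-rescue"). Using effective algorithms, the method detects a cascading failure early, during its slow-start phase, and then applies a fast response mechanism and takes timely measures to keep the failure from entering the rapid-propagation phase, preventing a large-scale outbreak of Internet failures and the resulting paralysis of the whole network. Experiments show that the proposed two-stage method suppresses the large-scale propagation of cascading failures in a timely and effective manner without substantially reducing the efficiency of the whole network, so that most users can continue to work normally.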
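
The idea of detecting a trend rather than a threshold can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, which this record does not give: the moving-average filter, the function names `detect_trend` and `detect_threshold`, and the parameters `window` and `min_rises` are all assumptions chosen for illustration. The input is the number of overloaded nodes and links observed in each unit of time.

```python
from typing import Optional, Sequence


def detect_threshold(counts: Sequence[int], threshold: int) -> Optional[int]:
    """Traditional rule: alarm at the first interval whose count exceeds a threshold."""
    for t, count in enumerate(counts):
        if count > threshold:
            return t
    return None


def detect_trend(counts: Sequence[int], window: int = 3,
                 min_rises: int = 3) -> Optional[int]:
    """Alarm once the smoothed count has risen `min_rises` times in a row.

    counts[t] is the number of overloaded nodes and links in unit time t.
    The trailing moving average over `window` samples is a hypothetical
    stand-in for the filtering rules mentioned in the abstract.
    """
    previous = None
    rises = 0
    for t in range(len(counts)):
        lo = max(0, t - window + 1)
        smoothed = sum(counts[lo:t + 1]) / (t + 1 - lo)
        if previous is not None and smoothed > previous:
            rises += 1
        else:
            rises = 0  # any flat or falling step resets the streak
        previous = smoothed
        if rises >= min_rises:
            return t
    return None


# Isolated traffic bursts (noise) versus a steadily growing overload count.
bursty = [0, 6, 0, 0, 7, 0, 0, 5, 0, 0]
growing = [0, 1, 1, 2, 3, 4, 6, 9]
```

With these sample series, `detect_threshold(bursty, 4)` raises a false alarm at index 1 while `detect_trend(bursty)` returns `None`; on the growing series, `detect_trend(growing)` fires at index 3, before `detect_threshold(growing, 4)` does at index 6.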

【Abstract】 With the rapid development of the Internet, and especially the tremendous growth in the number of users and applications in recent years, network topology is becoming larger and more complicated, and traffic is growing exponentially. Although the throughput of routing equipment is constantly increasing, it lags behind the growth in users' bandwidth demand, so certain nodes operate under high load. If such nodes are attacked, for example by a virus, they may become overloaded and unable to work, forcing data packets to be rerouted, which causes other routers to overload in succession and results in cascading failures. Once large-scale cascading failures occur, they are often highly destructive and far-reaching. It is therefore necessary to study systematically the mechanisms by which cascading failures occur in the Internet and the technologies for controlling them.

The study of the quantitative and qualitative features of complex networks has become an extremely important and challenging research topic in the network era. Recent studies have revealed that the failure of one or a few nodes or edges (whether a random failure or the result of a deliberate attack) can cause other nodes to fail through the coupling between nodes, producing a domino effect that leads to the collapse of a considerable number of nodes or even the entire network. The Internet is the largest artificial complex network, and traditionally most attention has focused on preventing cascading failures: for example, routers in the Internet backbone often have larger-than-normal capacity, and there are usually redundant links between core nodes.
However, large-scale cascading failures still occur at times, and effective response mechanisms are lacking, mainly because of insufficient research on control technology and an insufficient understanding of cascading failures in the Internet.

Based on a summary of a series of important research works on cascading failures in complex networks and an analysis of the characteristics of Internet topology and traffic, and in view of the heterogeneous distribution of loads, this paper draws on early worm detection and forest-fire control technology to present a "two-stage approach" that restrains cascading failures in the Internet by proactively closing some nodes or links. By analyzing the propagation characteristics of cascading failures, the method divides propagation into two phases: slow start and rapid propagation. With effective control technologies, cascading failures can be detected in the slow-start phase; a fast response mechanism is then applied and timely measures are taken to keep the failures from entering the rapid-propagation phase, which would cause a massive outbreak of Internet failures and the collapse of the entire network. The approach therefore has two key points: early detection of cascading failures, and an effective and fast response mechanism.

The goal of the early detection phase is to detect the occurrence of cascading failures as early as possible. Traditional methods generally define a threshold on a "target" quantity, such as the number of computers infected by a worm or the number of nodes and links whose load exceeds their capacity; the shortcoming of this methodology is a high false alarm rate. Borrowing the idea of rate detection from early worm detection, this paper proposes "detecting the trend, not the threshold" and filters the number of "targets" per unit time with some rules.
One characteristic of Internet communication is traffic burstiness, which occasionally makes the loads of some nodes exceed their capacity; however, this does not necessarily lead to cascading failures. Generally speaking, a communication protocol with flow regulation restores nodes carrying a minor overload to their baseline, so that they do not overload a large number of other nodes and trigger cascading failures. Such overloads are treated as noise, and cascading failures are considered to occur only when the number of overloaded nodes and links shows a stable upward trend.

The aim of the effective and fast response mechanism is to slow and prevent the large-scale propagation of cascading failures. Traditional methods can only tolerate a small degree of traffic burst and cannot completely avoid cascading failures; furthermore, they require additional investment, which wastes some resources. This paper adopts the ignition-help method, which is widely used to control forest fires. When cascading failures are detected in the slow-start phase, the method proactively closes certain high-load "flow generator" nodes or links according to the loads of the nodes and links in the Internet, on the premise of not significantly reducing network efficiency. The rationale is that, from the perspective of traffic engineering, each node in the Internet has two functions: it acts as a transmitter, which distributes load, and as a generator, which creates load. A node whose generating role outweighs its transmitting role is called a "flow generator" node; it is a main traffic producer and has little impact on the connectivity of the entire network. Otherwise it is called a "flow transmitter" node; it is a main flow distributor and forms part of the core of the entire network. By closing certain "flow generator" nodes, the method reduces the load of the entire network and achieves a more balanced load distribution.
Because heavily loaded links often connect "flow generator" nodes with "flow transmitter" nodes, closing such links effectively reduces the load of the "flow transmitter" nodes and keeps them from failing through overload, which would break up the network and cause it to collapse.

The experiments show that the two-stage approach proposed in this paper not only restrains the large-scale propagation of cascading failures effectively and in time, but also does so without significantly reducing the efficiency of the entire network, ensuring that the majority of users can work normally.
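
The response stage described above, which separates "flow generator" nodes from "flow transmitter" nodes and closes some high-load generators, might be sketched as follows. This is an illustration under stated assumptions, not the thesis's procedure: the `NodeLoad` record, the rule "generated load greater than forwarded load", the greedy selection order, and the `shed_target` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeLoad:
    generated: float  # traffic the node originates (generator role)
    forwarded: float  # traffic the node relays for others (transmitter role)

    @property
    def total(self) -> float:
        return self.generated + self.forwarded


def is_flow_generator(node: NodeLoad) -> bool:
    """Assumed rule: a node is a 'flow generator' if it creates more load than it relays."""
    return node.generated > node.forwarded


def choose_nodes_to_close(loads: Dict[str, NodeLoad], shed_target: float) -> List[str]:
    """Greedily pick the most heavily loaded flow generators until enough load is shed.

    Flow transmitter nodes are never selected, since they form the core
    that keeps the network connected.
    """
    generators = [name for name, node in loads.items() if is_flow_generator(node)]
    generators.sort(key=lambda name: loads[name].total, reverse=True)
    closed: List[str] = []
    shed = 0.0
    for name in generators:
        if shed >= shed_target:
            break
        closed.append(name)
        shed += loads[name].generated  # closing a generator removes the load it creates
    return closed


# Hypothetical load snapshot taken when the slow-start phase is detected.
loads = {
    "core": NodeLoad(generated=5, forwarded=95),
    "srv1": NodeLoad(generated=40, forwarded=5),
    "srv2": NodeLoad(generated=25, forwarded=10),
    "edge": NodeLoad(generated=10, forwarded=2),
}
```

For this snapshot, `choose_nodes_to_close(loads, 50)` returns `["srv1", "srv2"]`: the core transmitter is left running, and the selection stops as soon as the generated load removed reaches the target, which reflects the stated premise of not significantly reducing network efficiency.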

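  • 【Online publication contributor】 Jilin University
  • 【Online publication year and issue】 2007, Issue 03
  • 【Classification number】 TP393.06
  • 【Citation count】 3
  • 【Download count】 213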