Bound Rollback Recovery of Large Distributed System in WAN Environment
【Author】 Jin-Min Yang, Da-Fang Zhang, Xue-Dong Yang
【Affiliation】 College of Computer & Communication, Hunan University, Changsha 410082, China; Department of Computer Science, University of Regina, Regina, SK S4S 0A2, Canada
【Abstract】 Large distributed systems in a WAN environment differ in character from conventional, cluster-based distributed systems. Existing optimistic message logging protocols do not meet their fault-tolerance needs: overhead is high and recovery efficiency is low. This paper introduces a three-layer model for large distributed systems in a WAN environment and, under this model, presents a proxy-based protocol for tracking message dependencies. Proxy-based dependency tracking incurs low overhead during failure-free operation and also achieves fast output commit. During recovery, failure notification and rollback are confined to a single block rather than spreading across the whole system, which improves recovery efficiency and reduces recovery overhead.
【Key words】 Checkpoint; Rollback recovery; Message logging; Large distributed system; Three-layer model; Proxy; WAN; Grid;
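- 【Proceedings】 Proceedings of the 10th National Conference on Fault-Tolerant Computing
- 【Conference】 The 10th National Conference on Fault-Tolerant Computing
- 【Date】 2003-09
- 【Location】 Beijing, China
- 【Classification Code】 TP393.2
- 【Organizer】 Technical Committee on Fault-Tolerant Computing, China Computer Federation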