一种新的MPI Allgather算法及其在万亿次机群系统上的实现与性能分析

Implementation and Performance Analysis of a New MPI Allgather Algorithm on Terascale Linux Clusters

【Authors】 CHEN Jing (陈靖) 1),2); ZHANG Yun-Quan (张云泉) 2),3); ZHANG Lin-Bo (张林波) 4),5); YUAN Wei (袁伟) 2),3)

【Affiliations】
1) Department of Computer Science, University of Science and Technology of China, Hefei 230026
2) Laboratory of Parallel Computing, Institute of Software, Chinese Academy of Sciences, Beijing 100080
3) State Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100080
4) Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing 100080
5) State Key Laboratory of Scientific and Engineering Computing, Chinese Academy of Sciences, Beijing 100080

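【Abstract (translated from the Chinese)】 This paper presents a new MPI Allgather algorithm, the neighbor exchange algorithm. It proposes the concept of average logical communication distance, together with a formula for computing it, as an effective measure of communication locality. Analysis shows that, among the four MPI Allgather algorithms considered, neighbor exchange and ring both have the best communication locality. Performance tests and analysis of the four algorithms on the terascale clusters DeepComp 6800 (深腾6800) and DAWNING 4000A (曙光4000A) show that neighbor exchange gives the best communication performance for long messages, unstable performance for medium-length messages, and short-message performance below that of the recursive doubling and Bruck algorithms.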

【Abstract】 The Message Passing Interface (MPI) is one of the most important parallel programming environments. The MPI library provides point-to-point and collective communication functions, among which MPI Allgather is one of the most frequently used. The latest versions of MPICH implement three algorithms for MPI Allgather: ring, recursive doubling, and Bruck. To minimize TCP traffic and congestion over Fast Ethernet, the authors propose a new MPI Allgather algorithm, neighbor exchange. The algorithm uses pairwise communication, and each process always exchanges data with its logical neighbor processes. A new concept, the Average Logical Communication Distance (ALCD), is proposed to measure the communication locality of an algorithm. Analysis of the ALCD for the four algorithms shows that neighbor exchange and ring have the best communication locality. Numerical experiments on the terascale Linux clusters DeepComp 6800 and DAWNING 4000A show that neighbor exchange performs best for long messages but is suboptimal for short and medium-sized ones: for medium-sized messages the ring algorithm performs best, and for short messages the recursive doubling algorithm performs best.
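
The abstract describes neighbor exchange only in outline (pairwise exchanges with logical neighbors), so the following is a minimal simulation sketch rather than the authors' implementation. It assumes the commonly described schedule for this algorithm: an even number of processes p, p/2 steps, a first step that swaps single blocks, and later steps that forward the two blocks received in the previous step. The names `neighbor_exchange_allgather`, `partner_of`, and `ring_distance` are hypothetical, and `ring_distance` is a simple stand-in for logical distance, not the paper's ALCD formula, which this record does not give.

```python
def ring_distance(a, b, p):
    """Shortest distance between ranks a and b on a logical ring of p ranks."""
    d = abs(a - b) % p
    return min(d, p - d)


def partner_of(rank, step, p):
    """Partner of `rank` at `step` (assumed schedule, see lead-in).

    Even ranks alternate right, left, right, ...; odd ranks do the opposite,
    so both sides of every exchange always pick each other.
    """
    toward_right = (rank % 2 == 0) == (step % 2 == 0)
    return (rank + 1) % p if toward_right else (rank - 1) % p


def neighbor_exchange_allgather(p):
    """Simulate Allgather among p ranks (p even) with neighbor exchange.

    Returns (blocks, partners): blocks[r] is the set of block ids held by
    rank r at the end; partners[r] lists the peer of rank r at each step.
    """
    if p < 2 or p % 2 != 0:
        raise ValueError("this sketch assumes an even number of processes")

    blocks = [{r} for r in range(p)]   # each rank starts with its own block
    to_send = [{r} for r in range(p)]  # payload each rank sends next
    partners = [[] for _ in range(p)]

    for step in range(p // 2):
        # All exchanges in a step happen simultaneously, so compute every
        # received payload before updating any rank's state.
        received = [to_send[partner_of(r, step, p)] for r in range(p)]
        for r in range(p):
            partners[r].append(partner_of(r, step, p))
            blocks[r] = blocks[r] | received[r]
        if step == 0:
            # After the first swap each rank forwards its pair of blocks.
            to_send = [set(b) for b in blocks]
        else:
            # Later steps forward exactly what was just received.
            to_send = received

    return blocks, partners


if __name__ == "__main__":
    p = 6
    blocks, partners = neighbor_exchange_allgather(p)
    for r in range(p):
        print(r, sorted(blocks[r]), partners[r])
```

Under these assumptions every rank only ever talks to rank ± 1 (mod p), so every exchange has ring distance 1, which is consistent with the abstract's claim of good communication locality. The simulation completes in p/2 steps, for example 3 steps for p = 6.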

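【Funding】 Supported by the Key Program of the National Natural Science Foundation of China (60533020); the National Natural Science Foundation of China (60303020); the National Basic Research Program of China ("973" Program) (G1999032805, 2005CB321702); the National High Technology Research and Development Program of China ("863" Program), major special project "High-Performance Computers and Their Core Software", topic "Research on Performance Testing Techniques and Methods for High-Performance Computers" (2004AA104020); the Incubation Project Fund of the Institute of Software, Chinese Academy of Sciences (CXK25628); and the Open Project of the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications (200505).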
  • 【Source】 Chinese Journal of Computers (计算机学报), 2006, Issue 05
  • 【Classification code】 TP338.6
  • 【Times cited】 16
  • 【Downloads】 270