Vectorization Study for FFT Algorithm Based on Godson 3B (龙芯3B处理器上FFT算法向量化研究)
【Abstract】 Godson-3B is the second product in the Godson-3 chip multi-processor (CMP) series and targets high-performance computing and high-end embedded applications. The fast Fourier transform (FFT) is a basic tool in digital signal processing, image processing, and other fields, so an efficient implementation on Godson-3B is essential. However, existing FFT implementations perform poorly on this processor because they do not fully exploit its hardware features. To address this, the paper analyzes the radix-2 FFT algorithm and, drawing on the architectural characteristics of Godson-3B, proposes a vectorized FFT algorithm based on radix-32 iteration. Experimental results show that on Godson-3B the radix-32 vectorized FFT algorithm reaches an average performance of 765.15 Mflops, 2.12 times that of the FFTW (Fastest Fourier Transform in the West) package in the same environment, and a peak performance of 1341.12 Mflops, 3.51 times that of FFTW.
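For orientation, the radix-2 scheme that the abstract names as its starting point can be sketched as below. This is the textbook iterative decimation-in-time algorithm, not the paper's radix-32 vectorized kernel, and the function name `fft_radix2` is a hypothetical choice for illustration. Since 32 = 2^5, one radix-32 pass covers the work of five of the radix-2 stages shown here; how the paper maps that onto Godson-3B vector registers is not stated in the abstract.

```python
import cmath

def fft_radix2(x):
    """Textbook iterative radix-2 decimation-in-time FFT (illustrative only).

    The input length must be a power of two. Returns a new list of complex values.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages of radix-2 butterflies.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # primitive twiddle factor
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```

Each pass of the outer `while` loop sweeps the whole array once, so a radix-2 transform of length n touches memory log2(n) times. Higher radices reduce the number of sweeps, which is one common motivation for designs like the one the abstract describes.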
- 【Source】 Journal of Chinese Computer Systems (小型微型计算机系统), 2015, Issue 07
- 【Classification codes】 TP332; TP301.6
- 【Times cited】 9
- 【Downloads】 111