Research and Implementation of Memory Access Optimization Based on the BOOM Processor (基于BOOM处理器的访存优化研究与实现)
【Author】 Liu Peng (刘鹏);
【Author information】 Xidian University, Master of Engineering (professional degree), 2020, master's thesis
【Abstract】 Modern high-performance processor design faces two central problems: the computing performance of the processor core, and the efficiency of data exchange between the core and the memory system. Over decades of development in computer architecture, advances in architecture design and semiconductor manufacturing have allowed processor performance to improve at the pace of Moore's Law. Once the core's computing power reaches a certain level, the efficiency of data transfer between processor and memory has a decisive effect on processor performance and becomes a bottleneck for the whole system, commonly called the memory wall. To address the memory wall, memory access optimization and cache system design have become a core research direction in modern processor design.

BOOM is an out-of-order superscalar pipelined general-purpose processor built on RISC-V, the latest generation of open-source reduced instruction set architecture, and it adopts many classic microarchitecture designs. This thesis uses the BOOM processor as its platform for experiments and optimization. First, it analyzes in depth the architectural characteristics and design approach of BOOM as a superscalar out-of-order pipelined processor, together with the characteristics of, and room for optimization in, its memory access unit, cache system, and memory access behavior. Write combining is a commonly used buffer design technique that can effectively reduce the cost of memory accesses. Based on the write combining idea, the cache miss handling mechanism, and the memory access behavior of the cache system, this thesis adds a write combining optimization structure to the cache miss handling mechanism. The circuit is designed in Chisel, a new hardware design language based on Scala, and developed with a set of RISC-V toolchain software.

Second, to verify the memory access performance gain after the write combining structure is added to the cache system, this thesis follows an agile hardware development flow and uses Diplomacy, a recent design method that negotiates bus parameters and generates the interconnect automatically, to integrate a set of IP blocks and hardware resources and to design and implement a complete SoC platform bus. Because it draws on the high-level language features of Chisel, this approach gives the bus design excellent reusability, extensibility, and flexibility. TileLink, the chip-level bus interconnect protocol used in this thesis, inherits the strengths of earlier bus protocols, has features suited to complex systems, and provides good support for communication between the cache system and the bus system.

Finally, a prototype verification system for the BOOM processor platform was built on the Xilinx VC709 FPGA development board with the Vivado software. The boot program was written, and hardware and software were brought up together to boot Linux on the processor. The SPEC CPU 2006 processor benchmark suite was then run. The results show that the write combining optimization designed and implemented in this thesis improves the performance of the BOOM processor platform by up to 2.8% in the best case. Results from the DC synthesis tool show that the optimization has only a small impact on the cache system in terms of additional hardware overhead, power consumption, and area.
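As a rough illustration of the write combining idea mentioned in the abstract, the following Python sketch models a small buffer that merges stores to the same cache line and sends each line to the next memory level only once. It is a toy behavioral model and not the thesis's Chisel design: the 64-byte line size, the number of entries, the oldest-first eviction policy, and all names (`WriteCombiningBuffer`, `store`, `flush`, `demo`) are assumptions made for this example.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size, not taken from the thesis


class WriteCombiningBuffer:
    """Toy model: merges stores that fall in the same cache line."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries  # assumed buffer depth
        # line address -> {byte offset: byte value}, oldest entry first
        self.entries = OrderedDict()
        # (line address, merged bytes) pairs sent to the next memory level
        self.writebacks = []

    def store(self, addr, data):
        """Buffer a store of `data` (bytes) at `addr` within one line."""
        offset = addr % LINE_BYTES
        line_addr = addr - offset
        if offset + len(data) > LINE_BYTES:
            raise ValueError("store crosses a cache-line boundary")
        if line_addr not in self.entries:
            if len(self.entries) == self.num_entries:
                self._evict_oldest()  # make room for the new line
            self.entries[line_addr] = {}
        line = self.entries[line_addr]
        for i, byte in enumerate(data):
            line[offset + i] = byte  # later stores overwrite earlier bytes

    def _evict_oldest(self):
        line_addr, line = self.entries.popitem(last=False)
        self.writebacks.append((line_addr, line))

    def flush(self):
        """Drain every buffered line, oldest first."""
        while self.entries:
            self._evict_oldest()


def demo():
    wcb = WriteCombiningBuffer(num_entries=2)
    # Eight 8-byte stores fill one 64-byte line; one more store goes elsewhere.
    for i in range(8):
        wcb.store(0x1000 + 8 * i, bytes([i] * 8))
    wcb.store(0x2000, b"\xff" * 4)
    wcb.flush()
    return wcb.writebacks
```

In `demo`, nine stores leave the buffer as two line-sized write transactions, which is the kind of reduction in memory traffic that write combining aims for. A hardware version would also have to handle loads that hit buffered data and cache coherence requests, which this sketch ignores.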
【Key words】 BOOM; Write Combining; Cache; Diplomacy mechanism; Cache Coherence;
- 【Online publication contributor】 Xidian University 【Online publication issue】 2021, No. 05
- 【Classification code】 TP332
- 【Downloads】 118