节点文献

利用程序分析和优化提高Cache性能

Improving Program Cache Performance by Program Analysis and Optimization

【作者】 付雄

【导师】 陈意云;

【作者基本信息】 中国科学技术大学 , 计算机软件与理论, 2007, 博士

【摘要】 在近40年来,处理器运行速度的增长和存储器访问速度的增长之间存在着巨大的差距,这使得两者之间的速度差距越来越大,导致的“Memory Wall”问题变得越来越严重,已经成为整个系统性能最主要的瓶颈之一。现代计算机体系结构中广泛采用Cache来缓解这两者之间的速度差距,使得Cache已经成为影响处理器性能、能耗、价值的重要因素之一。Cache的充分利用很大程度上取决于程序自身的局部性,特别是访问数据的局部性,包括时间局部性和空间局部性。优化程序的数据局部性成为提高Cache性能的重要方法之一。由于无需硬件的支持,具有良好的跨平台性.利用程序分析和优化来改善程序的局部性,提高Cache性能成为当前研究的热点。过去在这方面的工作有两个重要的思路:一是针对程序运行时的访问数据,利用相关的Cache行为模型,建立一些程序分析工具,从源代码级给出程序Cache性能的瓶颈,指导程序员通过程序变换来优化程序的局部性,从而提高Cache性能;二是在编译器上开发编译优化过程,或者开发专门的程序优化工具,通过对程序进行分析,在此基础上自动进行程序变换,包括代码变换和数据变换两种,优化程序的局部性,从而提高Cache性能。本文根据上面的思路进行了下面的一些工作:(1)程序Cache行为模型的研究虽然复用距离已经成为程序Cache行为的一种重要度量标准,但是高复杂度和可能存在的内存溢出问题使得其难以被应用。在通过引入最大Cache大小,限制复用距离分析范围的基础上,本文提出了一种受限的复用距离模型。受限的复用距离舍弃少量访问的复用距离分析精度,有效地避免了进行复用距离分析时可能导致的内存溢出问题,同时使得复用距离的分析达到访问数据次数的线性时间复杂度。文章还通过实验说明了基于受限的复用距离进行Cache失效率分析的可行性和正确性。(2)程序Cache行为分析工具的设计与实现程序Cache行为的分析和理解能够帮助程序员寻找程序中Cache性能的瓶颈,指导程序员通过程序变换改善程序局部性,提高程序Cache性能。本文介绍一种基于复用距离的程序Cache行为分析工具的设计与实现,该工具利用一种基于中间信息栈的源代码级插桩收集程序的访存信息,然后基于受限的复用距离模型分析程序的Cache行为。同时还通过示例表明该分析工具不但能在源代码中给出Cache性能瓶颈的位置,指导进行代码变换;而且还能分析变量的Cache行为关系,指导进行数据变换。(3)基于复用距离分布的数据布局优化在利用程序Cache行为分析工具进行分析时,总结出程序中经常在一起访问的变量的复用距离在分布上具有一定的相似性。如果将这些变量进行重组,改变其布局,则能提高Cache性能。根据这个原理,本文设计并实现了一个数据布局优化框架,在框架中采用了一种基于复用距离分布的变量关系模型,用于寻找程序中经常在一起访问的变量。目前在框架中实现了一种动态数组的重组方法,用来重组利用变量关系模型发现的动态数组,以此来提高程序的数据局部性。针对SPEC CPU2000中部分测试程序的实验表明了该优化框架在优化数据局部性和提高数据Cache性能上具有一定的可行性和有效性。(4)利用结构拆分提高Cache性能结构拆分作为一种常用的通过数据变换提高Cache性能的方法,由于只需要分析结构属性域之间的相对关系,具有一定的特殊性。为了克服复用距离分析的复杂性,本文在引入一种较复用距离更为简单的距离的基础上,提出了一种属性域关系模型来度量结构中属性域之间的关系,然后寻找程序中的结构的优化布局,通过结构拆分来优化数据布局,从而提高数据Cache性能。相关的实验表明了这种基于属性域关系模型的结构拆分在提高Cache性能上有一定的有效性。

【Abstract】 In the past 40 years, the ever-increasing speed gap between processor and memory forms the "Memory Wall" and has resulted in one of the primary bottlenecks in computer system. Now modern computer architecture widely employs cache to decrease the impact of the gap and cache performance has an increasing influence on system speed, cost and energy usage. The utility of cache mostly depends on program locality, especially program data locality, including the temporal locality and the spatial locality. Thus, optimizing program data locality has become an important method to improve cache performance.Due to its independence of hardware and platform support, optimizing program locality by program analysis and optimization has become an active area to improve cache performance. There used to be two primary methods to optimize program data locality: the first is to analyze cache behavior of program data, then basing on cache behavior model, build a program analysis tool which shows the bottlenecks of data cache performance and hence directs programmer to tune performance of data cache; the second is program transformation by an optimizing compiler or an optimizing tool, and data cache performance is optimized in the transformation.This thesis introduces the following works basing on the above two methods:(1) Research on Program Cache Behavior ModelThough reuse distance has become an important metric of program cache behavior, problems such as high complexity and possible memory overflow have largely restricted its application. By limiting the max cache size and the area of reuse distance analysis, this paper introduces a limited reuse distance analysis method. This method efficiently avoids possible memory overflow problem in normal reuse distance analysis, and at the same time reduces complexity of reuse distance analysis to linear. Experiments on some integer and floating-point programs have demonstrated the feasibility and correctness of cache miss rate analyzed based on this reuse distance analysis.(2) Design and Implementation of Cache Behavior Analysis ToolProgram cache behavior analysis and comprehension help programmer to find the bottlenecks of program cache performance, optimize locality by program trans- formation and improve program cache performance. This thesis introduces a reuse distance based program cache behavior analysis tool, employing a source level instrumentation method based on intermediate information stack to collect program data access information, and using a limited reuse distance model to analyze program cache behavior. An example shows that this cache behavior analysis tool is capable of not only locating the cache performance bottlenecks in source codes to guide the code transformation, but also analyzing the cache behavior relationship among variables, thus to direct the data reorganization.(3) Data Layout Optimization Using Reuse Distance DistributionDuring cache behavior analysis, much similarity among reuse distance signatures of variables which are often accessed together can be easily observed. If layout of this kind of variables is optimized by reorganization, the cache performance can be improved. Motivated by this consideration, this thesis outlines a data layout optimization framework. A variable relation model based on reuse distance signature is used to find variables that are often accessed together in the framework. The framework implements a dynamic array regrouping method to reorganize the dynamic arrays. Experiments on some benchmarks from SPEC CPU2000 show this framework is feasible and effective to optimize data locality and improve cache performance.(4) Improving Cache Performance by Structure SplittingStructure splitting is a common method to improve cache performance by data reorganization, and it is of certain specialities since structure splitting only needs to analyze the relative relation of fields. To overcome the complexity of reuse distance analysis, this thesis proposes a field relation model based on a simple distance to quantitate the relation of structure fields, then searches an optimized layout for structure, optimizes program data layout and improves program data cache performance by structure splitting. Experiments show that this method is effective in optimizing program data layout, improving program data cache performance and thus improving the performance of the whole program.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络