大规模多规格图片缓存系统优化研究

Research on Optimization Methods for Large-scale Multi-resolution Photo Caching System

【Author】 孙思

【Advisor】 周可

【Author information】 Huazhong University of Science and Technology, Computer Science and Technology, 2022, PhD

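【Abstract (Chinese version)】 Social network service providers face severe challenges in delivering efficient photo services: they must handle the storage demands of massive numbers of photos while guaranteeing the user experience across the entire network. Distributed photo caching architectures are widely deployed to meet high performance requirements, and efficient caching policies play a very important role in improving photo service quality and reducing service cost. To satisfy the demands of different user terminals for different resolutions of the same photo, the photo service transcodes each uploaded photo into several resolutions, referred to as multi-resolution photos. Multi-resolution photos make cache optimization more complex. This research analyzes in detail a large-scale multi-resolution photo workload from a commercial production service, summarizes the access patterns and characteristics of multi-resolution photos, and improves cache performance from three angles: cache replacement, cache prefetching, and the selection of which resolution to cache.

For the "short-lived" nature of photo access in social networks, the thesis proposes RPFD (Replacement Policy based on Frequency Decay), a cache replacement policy based on access frequency decay. Workload analysis shows that photo access is short-lived: most accesses to a photo are concentrated in a short period after upload, and once the photo's popularity has peaked, accesses cool rapidly and almost none follow. To exploit this, RPFD tracks the access frequency of photos in the cache queue. When a photo is evicted from the cache queue, RPFD uses its frequency to judge whether the photo has cooled completely, and accordingly either swaps the photo out or reinserts it into the cache queue; if the photo is reinserted, its frequency is decayed. This eviction process matches the short-lived pattern in which access frequency falls as time passes, so RPFD judges the moment to evict a photo more precisely than traditional replacement policies and achieves higher cache performance. Experiments show that RPFD reduces access latency by 3.31% and back-to-origin traffic by 3.76% compared with the method used in the production system, which is on par with the current best replacement policy but with far lower algorithmic overhead.

For the "immediacy" of photo access in social networks, the thesis proposes PPRP (Photo Prefetching Policy based on Resolution Popularity), a photo prefetching policy based on resolution priority. Immediacy means that a photo uploaded by a user has a very high probability of being accessed by other users within a short time. To exploit this, PPRP proactively prefetches uploaded photos into the cache, eliminating the compulsory misses of photo access and raising the cache hit ratio. Because different resolutions have different popularity, PPRP prefetches the more popular resolutions first. To mitigate the cache pollution that useless prefetches may cause, PPRP further includes a smart eviction method that preferentially evicts prefetched photos with low access frequency. Experiments show that, compared with a caching policy without prefetching, PPRP raises the cache hit ratio by up to 7.4% and reduces photo access latency by 6.9%.

For the resolution selection problem in multi-resolution photo caching, the thesis proposes RSRP (Resolution Selection Policy based on Global Resolution Priority). In a multi-resolution photo service, a high-resolution photo can be transcoded into a lower-resolution one, but not the reverse. When a higher-resolution photo is already in the cache, obtaining the lower resolution by real-time transcoding rather than by caching it saves cache space, which improves cache space efficiency and cache performance. RSRP exploits the observation that the access distribution over resolutions is stable across time periods: it builds a caching priority for each resolution from the global caching efficiency of that resolution and admits the resolutions with high priority into the cache. Experiments show that, compared with the policy that selects the largest required resolution, RSRP reduces access latency by 13.7% and transcoding overhead by 48.5%.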

【Abstract】 Social network providers face critical challenges in storing huge numbers of photos, typically on the order of billions, while ensuring a satisfactory user experience nationwide or worldwide. Distributed photo caching architectures are widely deployed to meet high performance expectations, and efficient caching strategies play an essential role in improving quality of service while reducing service costs. To meet the requirements of different terminals for different photo resolutions, photo providers adopt a resizing mechanism that transcodes each photo uploaded by users into a variety of resolutions, referred to as multi-resolution photos. Multi-resolution photos complicate caching optimization. This thesis provides a detailed analysis of large-scale photo workloads from a real commercial service, summarizes the access patterns and intrinsic characteristics of multi-resolution photos, and designs photo caching strategies that improve performance in three respects: cache replacement, cache prefetching, and resolution selection.

By leveraging the short-lived nature of photo access in social networks, this thesis proposes RPFD (Replacement Policy based on Frequency Decay), a photo cache replacement policy. Photo workload analysis shows that the access pattern is short-lived: most requests for a photo are concentrated in a short period after it is uploaded. The popularity of a photo reaches its peak within a short time and then fades away quickly. RPFD tracks the frequency of each photo in the cache queue. When a photo is evicted from the cache queue, RPFD determines from its frequency whether its popularity has turned thoroughly cold, and accordingly either swaps the photo out of the cache store or reinserts it into the cache queue. If the photo is reinserted, its frequency is decreased. The eviction process of RPFD is in line with the decay of photo popularity over time, so it determines the timing of eviction more accurately than traditional replacement algorithms and achieves higher caching performance. Experiments show that RPFD reduces access latency and backend traffic by up to 3.31% and 3.76% respectively compared to the policy used in production, which is comparable to the state-of-the-art cache policy but with much lower algorithmic overhead.

By leveraging the immediacy of photo access in social networks, this thesis proposes PPRP (Photo Prefetching Policy based on Resolution Popularity). Immediacy means that photos shared by users are very likely to be requested by other users within a short period after they are uploaded. Based on this feature, PPRP proactively prefetches photos into the cache before they are requested by other users, thereby eliminating compulsory misses and improving the cache hit ratio. Considering that different resolutions exhibit different popularity, PPRP prioritizes prefetching the resolutions with higher popularity. To alleviate the cache pollution caused by useless prefetching, PPRP includes a smart eviction method that proactively evicts prefetched photos with low access frequency. Experiments show that PPRP improves the cache hit ratio by up to 7.4% and reduces access latency by 6.9% compared to the original policy without prefetching.

To solve the resolution selection problem in multi-resolution photo caches, this thesis proposes RSRP (Resolution Selection Policy based on Global Resolution Priority). In a multi-resolution photo service, high-resolution photos can be transcoded to lower resolutions by the resizing mechanism, but not vice versa. When a higher-resolution photo is cached, a lower resolution can be obtained through real-time transcoding instead of being stored in the cache, which saves cache space and improves cache utilization compared to caching all resolutions. By leveraging the stable distribution of requests over resolutions at different times, RSRP builds caching priorities for resolutions based on global caching efficiency, and always selects the resolution with the highest priority for caching. Experiments show that RSRP reduces access latency by 13.7% and the overhead of resizing by 48.5% compared to the basic strategy that always selects the largest required resolution.
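
The RPFD eviction step described in the abstract (track a per-photo frequency, and on eviction either drop a cold photo or decay its frequency and reinsert it) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the class name `FrequencyDecayCache`, the LRU-style queue ordering, the capacity counted in photos rather than bytes, the fixed `cold_threshold`, and the multiplicative `decay` factor are all assumptions made only for illustration.

```python
from collections import OrderedDict


class FrequencyDecayCache:
    """Toy cache: a queue plus per-photo frequency, decayed on reinsertion.

    Hypothetical sketch; every parameter below is an assumption.
    """

    def __init__(self, capacity, cold_threshold=1.0, decay=0.5):
        if capacity < 1 or cold_threshold <= 0 or not 0 < decay < 1:
            raise ValueError("need capacity >= 1, cold_threshold > 0, 0 < decay < 1")
        self.capacity = capacity              # assumed unit: number of photos
        self.cold_threshold = cold_threshold  # assumed: freq <= threshold means cold
        self.decay = decay                    # assumed: multiplicative decay factor
        self.queue = OrderedDict()            # key -> frequency; first item is evicted first

    def get(self, key):
        """Return True on a hit, False on a miss (a miss admits the key)."""
        if key in self.queue:
            self.queue[key] += 1
            self.queue.move_to_end(key)       # assumed: a hit refreshes queue position
            return True
        self._make_room()
        self.queue[key] = 1
        return False

    def _make_room(self):
        # Pop eviction candidates until there is a free slot.
        while len(self.queue) >= self.capacity:
            victim, freq = self.queue.popitem(last=False)
            if freq > self.cold_threshold:
                # Still warm: decay its frequency and reinsert at the far end.
                self.queue[victim] = freq * self.decay
            # Otherwise the photo is judged cold and is dropped for good.


cache = FrequencyDecayCache(capacity=2)
for key in ["a", "a", "a", "b", "c"]:
    cache.get(key)
print(list(cache.queue.items()))  # [('a', 1.5), ('c', 1)]
```

In this run the frequently requested photo "a" reaches the eviction end first, but it is reinserted with its frequency halved, and the once-requested photo "b" is dropped instead. Because the frequency shrinks on every reinsertion and the threshold is positive, the loop in `_make_room` always terminates.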
