Design and Implementation of Elastic Distributed Key-Value Stores for RDMA and NVM
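【Author】 Zhu Xiaoguang (朱晓光)
【Supervisor】 Wang Fang (王芳)
【Author Information】 Huazhong University of Science and Technology, Computer System Architecture, 2023, Master's thesis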
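【Abstract (Chinese)】 Remote Direct Memory Access (RDMA), combined with Non-Volatile Memory (NVM) devices, provides a Remote Direct Persistent Memory Access (RDPMA) data path in which data is not forwarded through the Central Processing Unit (CPU), greatly reducing data operation latency in distributed persistent storage systems. However, most existing distributed storage systems use a server-centric implementation, so to fully exploit the performance advantage of an RDPMA data path that bypasses the server CPU, the data placement strategy and data access protocol of distributed storage systems must be redesigned. Because the RDPMA data path requires memory regions to be statically registered, elastic scaling is difficult to implement in distributed storage systems; to address this problem, a decentralized elastic data placement strategy is proposed. First, the strategy introduces a coarse-grained space management layer that decouples allocated space from the space provided by the cluster, allowing the system to perform efficient online load balancing. Second, it places data through pseudorandom computation, which minimizes the system's critical metadata, and stores the remaining metadata in a decentralized manner, eliminating the bottleneck caused by centrally managing cluster space allocation. To address the high context-switching overhead of the server-side software stack, a client-centric data access protocol is proposed. The protocol bypasses the server CPU using one-sided RDMA verbs and moves concurrency control entirely to the client, so the critical path of data access does not involve the server-side software stack. This reduces data access latency and raises throughput under high concurrency. Based on these designs, the distributed key-value store Gestalt was implemented. Gestalt supports online elastic scaling and keeps data migration time on the order of seconds. Evaluation results show that under skewed workloads, compared with existing distributed storage systems based on RDMA and NVM, Gestalt reduces read and write latency by 65% and 63%, respectively, and increases read and write throughput by 171% and 122%, respectively. In multi-tenant, high-concurrency scenarios, Gestalt's write throughput grows with the number of users, reaching 10.2× that of the compared systems in the tests. With primary-backup replication enabled, Gestalt's write performance drops, but its throughput still exceeds that of the compared systems by 45%.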
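The abstract states only that data placement is computed pseudorandomly above a coarse-grained space management layer, so that no central allocation table is needed. A minimal sketch of that idea follows, assuming rendezvous (highest-random-weight) hashing as the pseudorandom function and a fixed number of logical partitions as the coarse-grained layer. `NUM_PARTITIONS`, `partition_of`, `owner_of` and `placement` are hypothetical names; the thesis abstract does not describe Gestalt's actual placement function.

```python
import hashlib

NUM_PARTITIONS = 256  # hypothetical count of coarse-grained logical partitions


def _h(*parts: str) -> int:
    """Deterministic 64-bit hash (built-in hash() is salted per process)."""
    digest = hashlib.blake2b("/".join(parts).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def partition_of(key: str) -> int:
    """Map a key to a coarse-grained logical partition."""
    return _h("key", key) % NUM_PARTITIONS


def owner_of(partition: int, nodes: list[str]) -> str:
    """Rendezvous hashing: every client derives the same owner from the
    node list alone, so the node list is the only critical metadata."""
    return max(nodes, key=lambda n: _h("part", str(partition), n))


def placement(nodes: list[str]) -> dict[int, str]:
    """Compute the owner of every partition for a given membership."""
    return {p: owner_of(p, nodes) for p in range(NUM_PARTITIONS)}


if __name__ == "__main__":
    before = placement(["node-a", "node-b", "node-c"])
    after = placement(["node-a", "node-b", "node-c", "node-d"])
    moved = [p for p in before if before[p] != after[p]]
    # When node-d joins, every partition that moves goes to node-d.
    assert all(after[p] == "node-d" for p in moved)
    print(f"{len(moved)} of {NUM_PARTITIONS} partitions moved")
```

Only partitions that now rank the new node highest change owner, and removing a node relocates only that node's partitions, which is the property that keeps online expansion and shrinking incremental in this sketch.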
【Abstract】 Remote Direct Memory Access (RDMA) technology, coupled with non-volatile memory (NVM) devices, offers a Remote Direct Persistent Memory Access (RDPMA) data path that bypasses the remote central processing unit (CPU). Such a data path significantly reduces data access latency in distributed storage systems. However, most existing distributed storage systems employ a server-centric implementation. Therefore, to fully leverage the performance benefit of the server-bypassing RDPMA data path, the data placement scheme and data access protocol of distributed storage systems must be redesigned. The RDPMA data path requires memory regions to be statically registered, which makes elasticity difficult to implement in distributed storage systems. To address this issue, a decentralized elastic data placement scheme is introduced. A coarse-grained space management layer decouples allocated space from cluster storage space, so the storage system can perform efficient online load rebalancing. Moreover, to eliminate the bottleneck caused by centralized management of storage space allocation, pseudorandom calculation is adopted to minimize critical system metadata, while other metadata is stored in a decentralized fashion. In addition, context switching in the server-side software stack introduces unnecessary overhead. To address this issue, a client-centric data access protocol is designed. By using one-sided RDMA verbs to bypass the remote server CPU and moving concurrency control entirely to the client side, the critical path of data access no longer involves the server-side software stack. This design reduces latency and increases throughput under high concurrency. Based on these designs, a distributed key-value store, Gestalt, is implemented. Gestalt supports online elastic expansion and shrinking, and the resulting data migration completes within seconds. Experimental results show that under skewed workloads, compared with existing distributed stores designed for RDMA and NVM, Gestalt reduces read and write latency by 65% and 63% and increases read and write throughput by 171% and 122%, respectively. In multi-tenant scenarios, Gestalt's write throughput grows with the number of users, reaching 10.2× that of the compared systems in benchmarks. With primary-backup redundancy enabled, Gestalt's write throughput decreases, but it still exceeds that of the compared systems by 45%.
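To illustrate what moving concurrency control entirely to the client can look like, the sketch below emulates one-sided verbs inside a single process: the "server" region only stores words, and each client updates a key by writing the new value out of place and then publishing it with a compare-and-swap on an index slot. This scheme is an assumption chosen for illustration, since the abstract does not specify Gestalt's protocol; `RemoteRegion`, `client_update`, `worker`, and the in-process lock standing in for NIC-side atomicity are all hypothetical.

```python
import threading


class RemoteRegion:
    """Hypothetical stand-in for a registered NVM region on one server.

    No request-handling code runs on the 'server': clients use only read(),
    append_value() and cas(), which play the roles of one-sided RDMA READ,
    WRITE and atomic compare-and-swap.
    """

    def __init__(self, slots: int) -> None:
        self._slots = [0] * slots        # index slots; 0 means "no value yet"
        self._values: list[int] = []     # out-of-place value storage
        self._atomic = threading.Lock()  # emulates NIC atomicity, not a server lock

    def read(self, slot: int) -> int:
        return self._slots[slot]

    def value_at(self, ptr: int) -> int:
        return self._values[ptr - 1]

    def append_value(self, value: int) -> int:
        """Write a value to a fresh location and return its pointer (>= 1)."""
        with self._atomic:
            self._values.append(value)
            return len(self._values)

    def cas(self, slot: int, expected: int, new: int) -> int:
        """Atomic compare-and-swap; returns the previous word, as RDMA CAS does."""
        with self._atomic:
            old = self._slots[slot]
            if old == expected:
                self._slots[slot] = new
            return old


def client_update(region: RemoteRegion, slot: int, fn) -> int:
    """Client-driven read-modify-write with no server CPU on the critical path."""
    while True:
        old_ptr = region.read(slot)                           # READ index slot
        old_val = region.value_at(old_ptr) if old_ptr else 0  # READ current value
        new_val = fn(old_val)
        new_ptr = region.append_value(new_val)                # WRITE out of place
        if region.cas(slot, old_ptr, new_ptr) == old_ptr:     # CAS publishes it
            return new_val
        # Another client won the race; retry against the fresh pointer.


def worker(region: RemoteRegion, increments: int) -> None:
    for _ in range(increments):
        client_update(region, 0, lambda v: v + 1)


if __name__ == "__main__":
    region = RemoteRegion(slots=1)
    clients = [threading.Thread(target=worker, args=(region, 1000)) for _ in range(8)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    print(region.value_at(region.read(0)))  # 8000: no increment is lost
```

Because every published pointer is fresh, the compare-and-swap in this sketch cannot suffer from the ABA problem; superseded values are never reclaimed here, so a real system would also need garbage collection of stale versions.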
【Keywords】 Distributed Key-Value Store; Elasticity; Remote Direct Memory Access; Non-Volatile Memory
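- 【Online Publication Submitter】 Huazhong University of Science and Technology 【Online Publication Year and Issue】 2024, Issue 12
- 【Classification Number】 TP333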