Dynamic data distribution for stream processing systems
【Abstract】 In stream processing systems, data skew and similar factors often cause load imbalance among computing nodes, which reduces the system's processing capacity and increases the response time of data processing. Traditional load balancing methods such as operator distribution, operator migration, and load shedding have not been widely adopted in stream processing systems because of their relatively high performance cost. Considering the characteristics of stream processing systems, a new load balancing method is proposed. In this method, the data of each computing unit is divided into a number of partitions, and partitions can be dynamically allocated and migrated among computing units. By dynamically adjusting the partitions held by each computing unit, with little disturbance to the running system, the input streams and utilization of the computing units are balanced, which achieves load balancing. On this basis, a load balancing algorithm and an online data migration technique for stream processing systems are designed and implemented. Experimental results show that the method significantly reduces the average latency of data processing and improves system throughput.
【Keywords】 data stream; stream processing; load balancing; data distribution; data migration
- 【Source】 Computer Engineering & Science (计算机工程与科学), 2014, Issue 10
- 【Classification No.】 TP338.8
- 【Citation count】 6
- 【Download count】 214
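
The abstract describes balancing load by moving data partitions between computing units, but it does not give the algorithm itself. The following is only a minimal sketch of one way such a rebalancing step could look, not the authors' method. It assumes each partition's load can be summarized as a single number (for example, its input rate) and that there is at least one computing unit; the names `rebalance`, `assignment`, `load`, and `tolerance` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple


def rebalance(
    assignment: Dict[str, List[str]],
    load: Dict[str, float],
    tolerance: float = 0.1,
) -> List[Tuple[str, str, str]]:
    """Greedily move partitions from the most loaded unit to the least loaded one.

    assignment: computing unit -> list of partition ids (modified in place).
    load:       partition id -> measured load (assumed: one number per partition).
    tolerance:  stop once the max-min load gap is within tolerance * mean load.
    Returns the planned migrations as (partition, source unit, target unit).
    """
    def unit_load(unit: str) -> float:
        return sum(load[p] for p in assignment[unit])

    total = sum(unit_load(u) for u in assignment)
    mean = total / len(assignment)
    moves: List[Tuple[str, str, str]] = []

    while True:
        hot = max(assignment, key=unit_load)
        cold = min(assignment, key=unit_load)
        gap = unit_load(hot) - unit_load(cold)
        if gap <= tolerance * mean:
            break
        # Only partitions lighter than the gap strictly reduce the imbalance,
        # which also guarantees that the loop terminates.
        candidates = [p for p in assignment[hot] if 0 < load[p] < gap]
        if not candidates:
            break
        # Prefer the partition whose load is closest to half the gap.
        part = min(candidates, key=lambda p: abs(load[p] - gap / 2))
        assignment[hot].remove(part)
        assignment[cold].append(part)
        moves.append((part, hot, cold))
    return moves


# Example: one unit holds every partition, the other is idle.
units = {"u1": ["p1", "p2", "p3", "p4"], "u2": []}
rates = {"p1": 4.0, "p2": 3.0, "p3": 2.0, "p4": 1.0}
print(rebalance(units, rates))
```

In this sketch the returned list is only a migration plan. Carrying out each move while the stream keeps flowing is the job of the online data migration technique mentioned in the abstract, which is not modeled here.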