Research on a Concurrent Data Stream Transmission Mechanism within Clusters Based on DPDK
Abstract: To meet the demand for large-scale AI data stream transmission driven by advances in artificial intelligence and the Internet of Things, this paper designs a high-performance concurrent data stream transmission mechanism based on the DPDK framework to improve the efficiency of data communication between nodes within a cluster. A high-performance transmission strategy is proposed that combines a port allocation strategy with a dual-link aggregation scheduling strategy to transmit large data streams over multiple ports in parallel. Using multi-NIC bonding, multi-core allocation, and a reverse flow control mechanism, the system dynamically adjusts transmission paths, improves bandwidth utilization, and significantly reduces transmission latency and packet loss. Experiments show that the proposed scheme has significant advantages over traditional methods in throughput, latency, and packet loss rate, and that it scales well in high-concurrency scenarios. The work offers a low-cost, efficient solution for transmitting large-scale AI data streams and can serve as a reference for the design of distributed computing systems in big data and artificial intelligence.
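The abstract names three cooperating ideas: allocating chunks of a large stream to ports, spreading them over two aggregated links, and throttling the sender through reverse flow control. The paper's actual algorithms are not given here, so the following is only a minimal illustrative sketch under stated assumptions. It is plain Python rather than DPDK code, and `Port`, `pick_port`, `schedule`, `apply_feedback`, and the credit-based feedback model are all hypothetical names and choices, not the authors' design.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """A hypothetical NIC port (assumption: the receiver grants a byte credit)."""
    name: str
    credit: int      # bytes the receiver is currently willing to accept
    queued: int = 0  # bytes already scheduled on this port


def pick_port(ports, size):
    """Return the port with the most spare credit that can take `size` bytes."""
    candidates = [p for p in ports if p.credit - p.queued >= size]
    if not candidates:
        return None  # every port is throttled; wait for receiver feedback
    return max(candidates, key=lambda p: p.credit - p.queued)


def schedule(chunk_sizes, ports):
    """Assign chunks to ports; return (assignment, indices of deferred chunks)."""
    assignment, deferred = {}, []
    for index, size in enumerate(chunk_sizes):
        port = pick_port(ports, size)
        if port is None:
            deferred.append(index)
            continue
        port.queued += size
        assignment[index] = port.name
    return assignment, deferred


def apply_feedback(port, new_credit):
    """Reverse flow control (assumed model): receiver reports a fresh credit."""
    port.credit = new_credit
    port.queued = 0
```

In this sketch, a port with more spare credit receives more chunks. A chunk that no port can accept is deferred until the receiver's feedback restores credit, which is one simple way a sender could shift load between two links.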