A Federated Deep Learning Method for Adaptive System Scale
DOI: 10.12677/mos.2024.134407. Supported by the National Natural Science Foundation of China.
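Authors: Wu Binbin, Yang Guisong*: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai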
Keywords: Federated Learning (FL), System Adaptivity, Primal-Dual Optimization, FDLADMM
Abstract: In collaborative settings for complex tasks, limited communication bandwidth and computing resources, together with the need for privacy protection, are the main challenges. Federated Learning (FL) was proposed to address them: it lets multiple devices train a model collaboratively without directly exchanging raw data, which reduces communication requirements and protects data privacy. However, some FL methods require full client participation, meaning that every client updates its local model in every round. This increases the number of communications and, as the number of clients grows, degrades system performance and increases response latency. This paper therefore introduces a new FL protocol based on primal-dual optimization, the Federated Deep Learning Alternating Direction Method of Multipliers (FDLADMM). FDLADMM uses dual variables to guide each client's local training, which reduces the number of communications between devices and speeds up model training. Experiments show the method's advantages in communication efficiency and training speed, and show that it adapts to changes in system scale without hyperparameter retuning. The approach offers a feasible and efficient solution to the challenges of complex task collaboration and is expected to see wide use in future research and practice.
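To make the primal-dual idea concrete, the following is a minimal sketch of consensus ADMM with partial client participation. It is not the FDLADMM algorithm itself, whose update rules are not given in this abstract. Everything specific here is an assumption made for illustration: the toy scalar objective f_i(x) = 0.5 (x - a_i)^2 on each client, the penalty parameter `rho`, uniform random client sampling, and a server that caches each client's last message. In a deep-learning setting the closed-form local step would presumably be replaced by a few SGD steps on the same augmented objective.

```python
import random


def admm_federated_mean(targets, rho=2.0, clients_per_round=3, rounds=1000, seed=0):
    """Consensus ADMM with partial participation on a toy problem (illustrative only).

    Client i holds f_i(x) = 0.5 * (x - targets[i])**2, so the minimiser of
    sum_i f_i is the mean of the targets.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n       # local primal variables (the local "models")
    y = [0.0] * n       # local dual variables
    cached = [0.0] * n  # last message x_i + y_i / rho received from each client
    z = 0.0             # global model held by the server

    for _ in range(rounds):
        # Only a random subset of clients trains in this round.
        for i in rng.sample(range(n), clients_per_round):
            # Local step: exact minimiser of
            # f_i(x) + y_i * (x - z) + (rho / 2) * (x - z)**2.
            x[i] = (targets[i] - y[i] + rho * z) / (1.0 + rho)
            # Dual step: accumulate the remaining disagreement with z.
            y[i] += rho * (x[i] - z)
            # Upload a single message to the server.
            cached[i] = x[i] + y[i] / rho
        # Server step: average the latest message from every client,
        # reusing cached messages for clients that sat out this round.
        z = sum(cached) / n
    return z


targets = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 9.0, 6.0, 0.0]
z = admm_federated_mean(targets)
print(z, sum(targets) / len(targets))
```

On this toy problem the server variable approaches the mean of the targets (4.5 here), because that value minimises the sum of the local objectives. The dual variable of each client records how far its local solution was pulled from its own optimum, which is what allows a client to skip rounds without its contribution being lost.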
Cite this article: Wu Binbin, Yang Guisong. A Federated Deep Learning Method for Adaptive System Scale [J]. Modeling and Simulation, 2024, 13(4): 4507-4514. https://doi.org/10.12677/mos.2024.134407
