Feasibility Analysis of the Application of Deep Reinforcement Learning Methods in Aviation Equipment Maintenance and Support
Abstract: Aviation equipment maintenance and support strategies currently rely mainly on manual decision-making, which is limited and short-sighted for tasks such as equipment lifecycle scheduling, training mission planning, and maintenance work planning. This paper analyzes typical deep reinforcement learning methods, conducts a feasibility analysis of intelligent maintenance and support strategies for a flight training unit, and establishes a multi-agent deep reinforcement learning model for aviation equipment maintenance and support. The model provides theoretical support for subsequent work such as modeling the maintenance and support environment and training multi-agent reinforcement learning policies. The study is of practical significance for aviation forces seeking to establish scientific, efficient, and intelligent maintenance and support strategies.
Article citation: Meng, Q.X., Wang, Y.S. and Ge, M.L. (2026) Feasibility Analysis of the Application of Deep Reinforcement Learning Methods in Aviation Equipment Maintenance and Support. 国际航空航天科学, 14(1), 21-29. https://doi.org/10.12677/jast.2026.141004
