The Multiagent Distributed Optimal Tracking Control of Partially Unknown Linear Discrete Systems Based on Zero-Sum Games

DOI: 10.12677/AAM.2022.111022

Supported by the National Natural Science Foundation of China.
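Authors: Xiong Tianjiao (熊天娇): College of Science, University of Shanghai for Science and Technology, Shanghai; Wang Chaoli (王朝立)^*: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai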
Keywords: Zero-Sum Game; L_2-Gain; Nash Equilibrium; Linear Discrete Systems

Abstract: This paper studies the distributed optimal tracking control problem for uncertain linear discrete systems subject to external disturbances. Existing studies require the system dynamics to be known and do not prove that the optimal solution is a Nash equilibrium. Because the control policies and the disturbances compete with each other, the problem is first transformed into a multiagent zero-sum game. Based on a newly proposed performance index, an inner-outer loop algorithm is used to solve the Hamilton-Jacobi-Isaacs (HJI) equations iteratively, and its convergence is proven. The optimal solution obtained by the algorithm is further shown to be the Nash equilibrium of the zero-sum game. When the system is not fully known, a single-layer neural network can be used to approximate the real-valued function, which reduces the computational complexity compared with the existing three-layer networks. Finally, simulations demonstrate the effectiveness of the method.
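
The inner-outer loop algorithm itself is not spelled out in the abstract. As a rough illustration of what solving an HJI-type equation iteratively can look like in the linear discrete setting, the sketch below runs plain value iteration on the game Riccati equation of a hypothetical scalar, single-agent system. The model, the parameter values, and the function names `saddle_point_gains` and `solve_zero_sum_riccati` are assumptions made for illustration only. It is not the authors' multiagent algorithm, and unlike the paper it assumes the dynamics are fully known.

```python
def saddle_point_gains(p, a, b, d, r, gamma):
    """Return (k, l) with u = -k*x and w = -l*x for V(x) = p*x^2.

    Assumed scalar model (illustrative, not from the paper):
        x[k+1] = a*x[k] + b*u[k] + d*w[k]
    Stage cost: q*x^2 + r*u^2 - gamma^2*w^2, where the control u
    minimizes and the disturbance w maximizes.
    """
    # Stationarity in (u, w) gives the 2x2 linear system
    #   [[m11, m12], [m12, m22]] @ [k, l] = [p*a*b, p*a*d]
    m11 = r + p * b * b
    m12 = p * b * d
    m22 = p * d * d - gamma ** 2
    if m11 <= 0 or m22 >= 0:
        # A saddle point needs convexity in u and concavity in w.
        raise ValueError("no saddle point: need r + p*b^2 > 0 and gamma^2 > p*d^2")
    det = m11 * m22 - m12 * m12  # strictly negative under the check above
    rhs_u, rhs_w = p * a * b, p * a * d
    k = (m22 * rhs_u - m12 * rhs_w) / det
    l = (m11 * rhs_w - m12 * rhs_u) / det
    return k, l


def solve_zero_sum_riccati(a, b, d, q, r, gamma, tol=1e-12, max_iter=10_000):
    """Value iteration p <- q + a^2*p - p*a*(b*k + d*l), starting from p = 0."""
    p = 0.0
    for _ in range(max_iter):
        k, l = saddle_point_gains(p, a, b, d, r, gamma)
        p_next = q + a * a * p - p * a * (b * k + d * l)
        if abs(p_next - p) < tol:
            # Report gains that correspond to the converged value.
            k, l = saddle_point_gains(p_next, a, b, d, r, gamma)
            return p_next, k, l
        p = p_next
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    # Illustrative parameter values only.
    p, k, l = solve_zero_sum_riccati(a=0.9, b=1.0, d=0.5, q=1.0, r=1.0, gamma=5.0)
    print(f"p = {p:.6f}, control gain k = {k:.6f}, disturbance gain l = {l:.6f}")
    print(f"closed-loop pole = {0.9 - 1.0 * k - 0.5 * l:.6f}")
```

The check `gamma^2 > p*d^2` is the scalar form of the usual requirement that the attenuation level be large enough for the game to have a saddle point. If it fails, the sketch stops instead of returning a meaningless gain.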

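Cite this article: Xiong, T.J. and Wang, C.L. (2022) The Multiagent Distributed Optimal Tracking Control of Partially Unknown Linear Discrete Systems Based on Zero-Sum Games. Advances in Applied Mathematics, 11(1), 158-179. https://doi.org/10.12677/AAM.2022.111022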

