Linear Quadratic Optimal Control Problem with Entropy

Abstract: This paper studies the stochastic linear quadratic optimal control problem. For the infinite-horizon, discrete-time linear quadratic problem with random parameters, we do not solve for the optimal control process itself; instead, we solve for the probability distribution of the control process and use entropy to measure the exploration level of this distribution. The optimal distribution of the control process turns out to be Gaussian. From this distribution, iterative formulas for the coefficient matrices of the value function of the linear quadratic problem are obtained. Building on value iteration, a Q-learning algorithm is used to compute the stationary solution for the coefficients. Finally, two numerical examples illustrate the effectiveness of the Q-learning algorithm and compare its performance with and without the entropy term; the results show that using entropy makes the algorithm converge faster and more stably.
Citation: Shu, X. (2022) Linear Quadratic Optimal Control Problem with Entropy. Advances in Applied Mathematics, 11(12), 8836-8845. https://doi.org/10.12677/AAM.2022.1112931
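To make the abstract's claims concrete, here is a minimal scalar sketch; the symbols $a$, $b$, $q$, $r$, the discount $\gamma$, and the entropy weight $\lambda$ are illustrative assumptions, not the paper's notation. Under the quadratic Q-function ansatz

$$Q(x,u) = \big(q + \gamma P\,\mathbb{E}[a^2]\big)x^2 + 2Lxu + Gu^2 + \text{const}, \qquad G = r + \gamma P\,\mathbb{E}[b^2], \quad L = \gamma P\,\mathbb{E}[ab],$$

minimizing $\mathbb{E}_{u\sim\pi}\,Q(x,u) - \lambda H(\pi)$ over distributions $\pi(\cdot\,|\,x)$ yields the Gibbs form $\pi^*(u\,|\,x) \propto \exp\{-Q(x,u)/\lambda\}$; since the exponent is quadratic in $u$, this is the Gaussian

$$\pi^*(\cdot\,|\,x) = \mathcal{N}\!\Big(-\tfrac{L}{G}\,x,\ \tfrac{\lambda}{2G}\Big),$$

which matches the abstract's statement that the optimal control distribution is Gaussian: the entropy term does not change the feedback gain, but injects an exploration variance proportional to $\lambda$. Averaging $Q$ under this policy reduces the value-function coefficient to a Riccati-type fixed point. The following Python sketch (a sampled value iteration under the assumptions above, not the paper's full Q-learning scheme; all numerical values are made up for illustration) shows the update and the resulting stationary policy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar discounted LQ with random parameters (all values illustrative):
#   x_{t+1} = a_t x_t + b_t u_t,  stage cost q x^2 + r u^2,
#   discount gamma, entropy weight lam.
q, r, gamma, lam = 1.0, 1.0, 0.9, 0.1
batch = 2000  # (a, b) samples per iteration, standing in for the unknown law

P = 0.0
for _ in range(200):
    a = 1.0 + 0.2 * rng.standard_normal(batch)
    b = 1.0 + 0.2 * rng.standard_normal(batch)
    G = r + gamma * P * np.mean(b * b)   # u^2 coefficient of the Q-function
    L = gamma * P * np.mean(a * b)       # x*u cross coefficient
    # Integrating out the Gaussian policy N(-(L/G) x, lam/(2G)) leaves a
    # Riccati-type update for the x^2 coefficient of the value function:
    P = q + gamma * P * np.mean(a * a) - L * L / G

K = L / G                  # stationary feedback gain (policy mean is -K x)
sigma2 = lam / (2.0 * G)   # exploration variance induced by the entropy term
print(f"P = {P:.4f},  gain K = {K:.4f},  policy variance = {sigma2:.4f}")
```

In this scalar setting the coefficient update coincides with classical Riccati value iteration; the entropy weight $\lambda$ enters only through the policy variance (and the constant term of the value function), which is where the exploration effect described in the abstract comes from.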
