The Proof of the Convergence of Stochastic Linear Quadratic Optimal Control Problem with Entropy
Abstract: Using a matrix transformation, this paper rewrites the solution of the stochastic linear quadratic optimal control problem with entropy in an equivalent form. It then proves that the quadratic-term coefficient of the linear quadratic equation has a unique solution and that the iterative formula for it converges. The first-order coefficient is 0, the constant term depends only on the quadratic term, and the optimal probability distribution of the control process likewise depends only on the quadratic term. Expected values are then estimated by the mean of Monte Carlo random samples, which yields Algorithm 1. The iterative formula in Algorithm 1 is shown to fluctuate, and the size of the fluctuation depends on the variance of the random parameters and on the number of Monte Carlo samples: the more samples, the smaller the variance of the fluctuation. Finally, two numerical examples compare the Q-learning algorithm with stochastic approximation against the Q-learning algorithm with Monte Carlo. For the same number of iterations, the stochastic-approximation algorithm takes less computation time but has a larger error, whereas the Monte Carlo algorithm converges faster and more stably, and its error can be reduced further by increasing the number of randomly drawn samples.
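The statement that a larger Monte Carlo sample gives a smaller fluctuation variance can be illustrated with a minimal sketch. This is not the paper's Algorithm 1. The scalar random parameter `A`, its Gaussian distribution, and the function names `mc_estimate` and `estimator_variance` are all assumptions made for illustration. The sketch only checks that the variance of a sample-mean estimate of E[A^2] shrinks roughly in proportion to 1/N.

```python
import random
import statistics


def mc_estimate(n_samples, rng, mean=1.0, std=0.5):
    """Sample-mean estimate of E[A^2] for a hypothetical scalar A ~ N(mean, std^2)."""
    return statistics.fmean(rng.gauss(mean, std) ** 2 for _ in range(n_samples))


def estimator_variance(n_samples, n_repeats=500, seed=0):
    """Empirical variance of the estimate over independent repetitions."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    estimates = [mc_estimate(n_samples, rng) for _ in range(n_repeats)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    # Each tenfold increase in sample size should cut the variance about tenfold.
    for n in (10, 100, 1000):
        print(n, estimator_variance(n))
```

Under these assumptions, the printed variances should fall by about a factor of ten for each tenfold increase in the sample size, which is the qualitative behaviour the abstract describes for the Monte Carlo iteration.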
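Cite this article: Shu, X. (舒心) (2023) The Proof of the Convergence of Stochastic Linear Quadratic Optimal Control Problem with Entropy. Pure Mathematics (理论数学), 13(3), 659-668. https://doi.org/10.12677/PM.2023.133071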
