|
[1]
|
Kalman, R.E. (1960) Contributions to the Theory of Optimal Control. Boletín de la Sociedad Matemática Mexicana, 5, 102-119.
|
|
[2]
|
Wonham, W.M. (1968) On a Matrix Riccati Equation of Stochastic Control. SIAM Journal on Control, 6, 681-697.
|
|
[3]
|
Wonham, W.M. (1967) Optimal Stationary Control of a Linear System with State-Dependent Noise. SIAM Journal on Control, 5, 486-500.
|
|
[4]
|
Bismut, J.-M. (1976) Linear Quadratic Optimal Stochastic Control with Random Coefficients. SIAM Journal on Control and Optimization, 14, 419-444.
|
|
[5]
|
Pronzato, L., Kulcsár, C. and Walter, E. (1996) An Actively Adaptive Control Policy for Linear Models. IEEE Transactions on Automatic Control, 41, 855-858.
|
|
[6]
|
Chen, S., Li, X. and Zhou, X.Y. (1998) Stochastic Linear Quadratic Regulators with Indefinite Control Weight Costs. SIAM Journal on Control and Optimization, 36, 1685-1702.
|
|
[7]
|
Chen, S. and Zhou, X.Y. (2000) Stochastic Linear Quadratic Regulators with Indefinite Control Weight Costs. II. SIAM Journal on Control and Optimization, 39, 1065-1081.
|
|
[8]
|
Rami, M.A., Moore, J.B. and Zhou, X.Y. (2002) Indefinite Stochastic Linear Quadratic Control and Generalized Differential Riccati Equation. SIAM Journal on Control and Optimization, 40, 1296-1311.
|
|
[9]
|
Wang, T., Zhang, H. and Luo, Y. (2016) Infinite-Time Stochastic Linear Quadratic Optimal Control for Unknown Discrete-Time Systems Using Adaptive Dynamic Programming Approach. Neurocomputing, 171, 379-386.
|
|
[10]
|
Du, K., Meng, Q. and Zhang, F. (2022) A Q-Learning Algorithm for Discrete-Time Linear-Quadratic Control with Random Parameters of Unknown Distribution: Convergence and Stabilization. SIAM Journal on Control and Optimization, 60, 1991-2015.
|
|
[11]
|
Ziebart, B.D., Maas, A.L., Bagnell, J.A., et al. (2008) Maximum Entropy Inverse Reinforcement Learning. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 8, 1433-1438.
|
|
[12]
|
Boularias, A., Kober, J. and Peters, J. (2011) Relative Entropy Inverse Reinforcement Learning. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, 11-13 April 2011, 182-189.
|
|
[13]
|
Haarnoja, T., Zhou, A., Abbeel, P., et al. (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. The 35th International Conference on Machine Learning, Stockholm, 10-15 July 2018, 1861-1870.
|
|
[14]
|
Haarnoja, T., Tang, H., Abbeel, P., et al. (2017) Reinforcement Learning with Deep Energy-Based Policies. The 34th International Conference on Machine Learning, Sydney, 6-11 August 2017, 1352-1361.
|
|
[15]
|
Zhao, R., Sun, X. and Tresp, V. (2019) Maximum Entropy-Regularized Multi-Goal Reinforcement Learning. The 36th International Conference on Machine Learning, Long Beach, 10-15 June 2019, 7553-7562.
|
|
[16]
|
Wang, H., Zariphopoulou, T. and Zhou, X.Y. (2020) Reinforcement Learning in Continuous Time and Space: A Stochastic Control Approach. Journal of Machine Learning Research, 21, 1-34.
|
|
[17]
|
Wang, H. and Zhou, X.Y. (2020) Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework. Mathematical Finance, 30, 1273-1308.
|
|
[18]
|
Bertsekas, D. (2019) Reinforcement Learning and Optimal Control. Athena Scientific, Nashua.
|
|
[19]
|
Watkins, C.J.C.H. (1989) Learning from Delayed Rewards. Ph.D. Thesis, King’s College, University of Cambridge, Cambridge.
|