|
[1]
|
Birge, J. and Louveaux, F. (2011) Introduction to Stochastic Programming. Springer Science & Business Media, Heidelberg.
|
|
[2]
|
Kučera, V. (1992) Optimal Control: Linear Quadratic Methods: Brian D. O. Anderson and John B. Moore. Automatica, 28, 1068-1069.
|
|
[3]
|
Sutton, R.S. and Barto, A.G. (2018) Reinforcement Learning: An Introduction. 2nd Edition, The MIT Press, Cambridge.
|
|
[4]
|
Basei, M., Guo, X., Hu, A. and Zhang, Y. (2020) Logarithmic Regret for Episodic Continuous-Time Linear-Quadratic Reinforcement Learning over a Finite-Time Horizon. Computation Theory eJournal.
|
|
[5]
|
Dean, S., Mania, H., Matni, N., Recht, B. and Tu, S. (2017) On the Sample Complexity of the Linear Quadratic Regulator. Foundations of Computational Mathematics, 20, 633-679.
|
|
[6]
|
Ren, Z., Zhong, A. and Li, N. (2021) LQR with Tracking: A Zeroth-Order Approach and Its Global Convergence. 2021 American Control Conference (ACC), New Orleans, LA, 25-28 May 2021, 2562-2568.
|
|
[7]
|
Bertsekas, D.P. (2011) Approximate Policy Iteration: A Survey and Some New Methods. Journal of Control Theory and Applications, 9, 310-335.
|
|
[8]
|
Mania, H., Guy, A. and Recht, B. (2018) Simple Random Search Provides a Competitive Approach to Reinforcement Learning. arXiv: 1803.07055.
|
|
[9]
|
Abbasi-Yadkori, Y., Lazic, N. and Szepesvari, C. (2019) Model-Free Linear Quadratic Control via Reduction to Expert Prediction. The 22nd International Conference on Artificial Intelligence and Statistics, Naha, 16-18 April 2019, 3108-3117.
|
|
[10]
|
Imani, M. and Braga-Neto, U.M. (2018) Finite-Horizon LQR Controller for Partially-Observed Boolean Dynamical Systems. Automatica, 95, 172-179.
|
|
[11]
|
Zhang, H. and Li, N. (2022) Data-Driven Policy Iteration Algorithm for Continuous-Time Stochastic Linear-Quadratic Optimal Control Problems. Asian Journal of Control, 26, 481-489.
|
|
[12]
|
Farjadnasab, M. and Babazadeh, M. (2022) Model-Free LQR Design by Q-Function Learning. Automatica, 137, Article ID: 110060.
|
|
[13]
|
Yaghmaie, F.A., Gustafsson, F.K. and Ljung, L. (2023) Linear Quadratic Control Using Model-Free Reinforcement Learning. IEEE Transactions on Automatic Control, 68, 737-752.
|
|
[14]
|
Tu, S. and Recht, B. (2019) The Gap between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint. Conference on Learning Theory, USA, 9 December 2019, 3036-3083.
|
|
[15]
|
Malik, D., Pananjady, A., Bhatia, K., Khamaru, K., Bartlett, P.L. and Wainwright, M.J. (2018) Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems. Journal of Machine Learning Research, 21, 1-21.
|
|
[16]
|
Fazel, M., Ge, R., Kakade, S.M. and Mesbahi, M. (2018) Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator. International Conference on Machine Learning, Stockholm, 10-15 July 2018, 1467-1476.
|
|
[17]
|
Hambly, B.M., Xu, R. and Yang, H. (2021) Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon. SIAM Journal on Control and Optimization, 59, 3359-3391.
|
|
[18]
|
Shamir, O. (2017) An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback. The Journal of Machine Learning Research, 18, 1703-1713.
|
|
[19]
|
Bu, J., Mesbahi, A. and Mesbahi, M. (2020) Policy Gradient-Based Algorithms for Continuous-Time Linear Quadratic Control. arXiv: 2006.09178.
|
|
[20]
|
Bertsekas, D.P. (1995) Dynamic Programming and Optimal Control. 3rd Edition, Athena Scientific, Nashua, NH.
|