Investment and Consumption Problems Based on Actor-Critic Reinforcement Learning
DOI: 10.12677/orf.2025.152079
Authors: 刘峻均, 徐海燕, 卢相刚 (School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong)
Keywords: Consumption, Investment, Regime-Switching, Reinforcement Learning, Gradient Descent
Abstract: This paper studies an optimal investment and consumption problem using Actor-Critic reinforcement learning. To capture the empirically low level of actual consumption after retirement, we assume that both the minimum consumption level and the pension level are lower after retirement. Asset prices in the financial market are modulated by a Markov chain, and we account for inflation and habitual consumption levels to build a regime-switching wealth model. Applying the dynamic programming principle yields the Hamilton-Jacobi-Bellman (HJB) equation. Because of the diffusion process and the regime switching, a closed-form solution is nearly impossible to obtain. We therefore design a numerical algorithm within the Actor-Critic reinforcement learning framework to solve the optimal control problem. The wealth process and the objective function are discretized, and the value function and the control function are parameterized by neural networks. The control function is improved with a policy gradient descent algorithm, while the value function is updated with a temporal-difference (TD) error method. Finally, we present numerical results for the optimization problem.
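The update scheme described in the abstract (discretize the wealth process, update the value function from the TD error, improve the control with a policy gradient) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a two-regime Markov chain, log utility of consumption, a fixed per-step discount factor, and linear features standing in for the neural networks, and it omits inflation, habit formation, pensions and retirement. All parameter values and function names (`features`, `squash`, `step`, `train`) are hypothetical.

```python
import numpy as np

# Illustrative assumptions only, not the paper's calibration.
MU = np.array([0.08, 0.03])      # risky-asset drift in regimes 0 and 1
SIGMA = np.array([0.15, 0.30])   # risky-asset volatility in regimes 0 and 1
P = np.array([[0.95, 0.05],      # one-step regime transition matrix
              [0.10, 0.90]])
R, DT, GAMMA = 0.02, 0.1, 0.98   # risk-free rate, time step, per-step discount


def features(wealth, regime):
    """Regime-specific features [1, log wealth]; the other regime's slots stay 0."""
    phi = np.zeros(4)
    phi[2 * regime: 2 * regime + 2] = [1.0, np.log(wealth)]
    return phi


def squash(z):
    """Map unconstrained numbers into (0, 1); the clip avoids overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -10.0, 10.0)))


def step(wealth, regime, invest_frac, consume_rate, rng):
    """One Euler step of the discretized wealth process, then a regime transition."""
    noise = rng.normal() * np.sqrt(DT)
    growth = (R + invest_frac * (MU[regime] - R) - consume_rate) * DT
    growth += invest_frac * SIGMA[regime] * noise
    next_wealth = max(wealth * (1.0 + growth), 1e-3)   # keep wealth positive
    next_regime = int(rng.choice(2, p=P[regime]))
    reward = np.log(consume_rate * wealth) * DT        # assumed log utility
    return next_wealth, next_regime, reward


def train(episodes=200, horizon=50, lr_critic=0.02, lr_actor=0.005,
          explore=0.3, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(4)            # critic: V(x, i) is approximated by w . phi(x, i)
    theta = np.zeros((2, 4))   # actor: row 0 investment, row 1 consumption
    history = []               # mean |TD error| per episode
    for _ in range(episodes):
        wealth, regime, abs_td = 1.0, 0, 0.0
        for _ in range(horizon):
            phi = features(wealth, regime)
            mean = theta @ phi
            action = mean + explore * rng.normal(size=2)   # Gaussian exploration
            invest_frac, consume_rate = squash(action)
            next_wealth, next_regime, reward = step(
                wealth, regime, invest_frac, consume_rate, rng)
            # TD error: reward + discounted next value - current value.
            delta = reward + GAMMA * w @ features(next_wealth, next_regime) - w @ phi
            delta = float(np.clip(delta, -1.0, 1.0))       # clipped for stability
            w += lr_critic * delta * phi                   # critic: TD(0) update
            # Actor: Gaussian score function weighted by the TD error.
            theta += lr_actor * delta * np.outer((action - mean) / explore**2, phi)
            wealth, regime = next_wealth, next_regime
            abs_td += abs(delta)
        history.append(abs_td / horizon)
    return w, theta, history


if __name__ == "__main__":
    w, theta, history = train()
    print("critic weights:", w)
    print("actor weights:", theta)
```

Replacing the linear maps `w @ phi` and `theta @ phi` with neural networks, and the hand-written updates with automatic differentiation, would bring the sketch closer to the parameterization named in the abstract. The TD-error clipping is a stability convenience added here, not something stated in the source.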
Cite this article: 刘峻均, 徐海燕, 卢相刚. Investment and Consumption Problems Based on Actor-Critic Reinforcement Learning [J]. Operations Research and Fuzziology, 2025, 15(2): 227-236. https://doi.org/10.12677/orf.2025.152079
