Optimal Execution with Regime Switching in the Framework of Reinforcement Learning
DOI: 10.12677/AIRR.2025.145117
Author: Youtao Cai, School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong
Keywords: Double Deep Q-Learning, Optimal Execution, Stochastic Price Impact, Regime Switching
Abstract: A Double Deep Q-Learning (DDQL) framework is proposed to address the optimal execution problem under regime switching, incorporating stochastic price impact and demand pressure from other market participants. The approach begins with the construction of an optimal execution model. The model is then reformulated within the reinforcement learning framework and trained accordingly. Finally, the proposed method is evaluated under three distinct market conditions; the strategies derived through DDQL training consistently outperform the Time-Weighted Average Price (TWAP) benchmark.
Citation: Cai, Y. (2025) Optimal Execution with Regime Switching in the Framework of Reinforcement Learning. Artificial Intelligence and Robotics Research, 14(5), 1247-1259. https://doi.org/10.12677/AIRR.2025.145117

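The DDQL training procedure summarized in the abstract is built around the double Q-learning bootstrap target. The following is a minimal, self-contained sketch of that target computation, not the paper's implementation; the batch values, action set, and discount factor are illustrative assumptions:

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN bootstrap target: the online network selects the next
    action, the target network evaluates it, mitigating the overestimation
    bias of standard Q-learning."""
    a_star = np.argmax(q_online_next, axis=1)               # action selection (online net)
    q_eval = q_target_next[np.arange(len(a_star)), a_star]  # action evaluation (target net)
    return rewards + gamma * (1.0 - dones) * q_eval

# Illustrative batch: 2 transitions, 3 actions (e.g. sell a small / medium / large lot).
q_on = np.array([[1.0, 2.0, 0.5],
                 [0.2, 0.1, 0.9]])
q_tg = np.array([[0.8, 1.5, 0.3],
                 [0.4, 0.2, 0.7]])
r = np.array([1.0, 0.0])
d = np.array([0.0, 1.0])  # second transition is terminal (inventory fully liquidated)
print(double_dqn_targets(q_on, q_tg, r, d))  # targets: [2.485, 0.0]
```

Decoupling action selection from action evaluation across the two networks is what distinguishes DDQL from plain deep Q-learning, and is the reason it tends to produce more stable execution policies.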