A Reinforcement Learning Solution to the Market Maker Problem under Multiple Rebate Mechanisms
Abstract: This paper develops a unified market-making framework based on the Avellaneda-Stoikov model by incorporating constant, time-dependent, and state-dependent rebate mechanisms. After discretization, the problem is formulated as a Markov decision process and solved with DDQN, PPO, and A2C. The numerical results indicate that, in simple no-rebate settings, PPO and A2C achieve profit levels close to the Avellaneda-Stoikov analytical benchmark, while DDQN exhibits relatively higher volatility and inventory risk. Under constant rebate mechanisms, PPO attains the highest net profit on average, although heuristic policies remain competitive. Under state-dependent rebates, A2C shows a clearer adaptive advantage over heuristic baselines. The sensitivity analysis further suggests that rebate mechanisms affect not only the source of profit, but also the dynamic trade-off among quoting behavior, execution, and inventory control.
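For reference, the Avellaneda-Stoikov analytical benchmark mentioned in the abstract admits a well-known closed-form approximation (see [3]): quotes are centered on a reservation price that tilts away from the mid-price with inventory, and the total spread is fixed by risk aversion and order-flow intensity. In the notation below, $s$ is the mid-price, $q$ the inventory, $\gamma$ the risk aversion, $\sigma$ the volatility, $\kappa$ the order-arrival decay, and $T-t$ the time to the horizon.

```latex
% Closed-form Avellaneda-Stoikov quotes (the benchmark in the no-rebate setting)
\begin{align}
  r(s,t) &= s - q\,\gamma\,\sigma^{2}\,(T-t), \\
  \delta^{a} + \delta^{b} &= \gamma\,\sigma^{2}\,(T-t)
      + \frac{2}{\gamma}\,\ln\!\Bigl(1 + \frac{\gamma}{\kappa}\Bigr).
\end{align}
```

The paper's exact MDP reward is not reproduced here; the following minimal Python sketch only illustrates how the three rebate mechanisms could enter the per-step reward after discretization. All names and functional forms (`rebate`, `step_reward`, the linear time decay, the inventory-based taper) are illustrative assumptions, not the paper's implementation.

```python
def rebate(t: float, T: float, q: int, mode: str,
           c: float = 0.0005, q_max: int = 10) -> float:
    """Per-fill rebate under the three mechanisms named in the abstract.

    The constant case pays a flat rebate c per fill; the time- and
    state-dependent forms below are hypothetical examples of how the
    rebate could vary with time and inventory.
    """
    if mode == "constant":   # flat rebate per executed limit order
        return c
    if mode == "time":       # e.g. rebate decaying linearly toward the horizon
        return c * (1.0 - t / T)
    if mode == "state":      # e.g. rebate shrinking as |inventory| grows
        return c * max(0.0, 1.0 - abs(q) / q_max)
    return 0.0


def step_reward(spread_pnl: float, filled_bid: bool, filled_ask: bool,
                t: float, T: float, q: int, mode: str,
                gamma: float = 0.1, sigma: float = 0.02,
                dt: float = 1.0) -> float:
    """One-step reward: spread P&L plus rebates minus an inventory penalty.

    The quadratic penalty gamma * sigma^2 * q^2 * dt mimics the
    inventory-risk term of the Avellaneda-Stoikov objective; the paper's
    actual penalty may differ.
    """
    n_fills = int(filled_bid) + int(filled_ask)
    return (spread_pnl + n_fills * rebate(t, T, q, mode)
            - gamma * sigma ** 2 * q ** 2 * dt)
```

A sketch of this kind makes explicit why the three settings behave differently in the experiments: under a constant rebate the agent is simply paid for fills, whereas time- and state-dependent rebates couple rebate income to the quoting horizon and the inventory, which is the dynamic trade-off the sensitivity analysis refers to.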
References
[1] Demsetz, H. (1968) The Cost of Transacting. The Quarterly Journal of Economics, 82, 33-53.
[2] Garman, M.B. (1976) Market Microstructure. Journal of Financial Economics, 3, 257-275.
[3] Avellaneda, M. and Stoikov, S. (2008) High-Frequency Trading in a Limit Order Book. Quantitative Finance, 8, 217-224.
[4] Guéant, O., Lehalle, C. and Fernandez-Tapia, J. (2013) Dealing with the Inventory Risk: A Solution to the Market Making Problem. Mathematics and Financial Economics, 7, 477-507.
[5] Aït-Sahalia, Y. and Brunetti, C. (2020) High Frequency Traders and the Price Process. Journal of Econometrics, 217, 20-45.
[6] Gasperov, B. and Kostanjcar, Z. (2022) Deep Reinforcement Learning for Market Making under a Hawkes Process-Based Limit Order Book Model. IEEE Control Systems Letters, 6, 2485-2490.
[7] Gong, S.Q., Liu, S.Q. and Sun, D.D. (2023) Optimal Market Making in the Chinese Stock Market: A Stochastic Control and Scenario Analysis.
[8] Spooner, T. and Savani, R. (2020) Robust Market Making via Adversarial Reinforcement Learning. https://arxiv.org/abs/2003.01820