Research on the Automatic Trading Model of Stock Investment Portfolios Based on Deep Reinforcement Learning Agents
Abstract: With the deep penetration of financial technology, quantitative investment has become a core means of balancing risk and return in the stock market. Traditional trading models often suffer from a disconnect between prediction and decision-making. Deep reinforcement learning (DRL), in which an agent interacts with its environment in real time and optimizes its strategy dynamically, offers a new path toward the automated management of stock portfolios. This study takes the constituent stocks of the CSI 300 Index as its research object and integrates three types of data (technical indicators, textual sentiment, and macroeconomic data) to build agents based on four DRL algorithms: A2C (Advantage Actor-Critic), DDPG (Deep Deterministic Policy Gradient), PPO (Proximal Policy Optimization), and TD3 (Twin Delayed DDPG). Their performance is then compared systematically. The reward function is optimized by introducing a dynamic risk aversion coefficient, and real-world transaction costs such as stamp duty and commissions are simulated, so that portfolio management is carried out in an intelligent and automated way. Empirical results show that PPO performs best on the test set, with a risk-return balance significantly better than that of the other DRL algorithms compared and of the traditional baseline models.
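To make the reward design concrete, the following is a minimal sketch of how a risk-adjusted reward with transaction costs might be computed for one trading step. It is not the paper's implementation: the function names, the volatility-scaling rule for the risk aversion coefficient, the variance penalty, and the fee rates are all assumptions chosen for illustration. The only market convention relied on is that stamp duty on A-shares is charged on the sell side, while commission applies to both buys and sells.

```python
import numpy as np


def transaction_cost(buy_value, sell_value,
                     commission_rate=0.0003, stamp_duty_rate=0.0005):
    """Cost of one rebalancing step; both rates are assumed placeholder values."""
    commission = commission_rate * (buy_value + sell_value)  # charged on both sides
    stamp_duty = stamp_duty_rate * sell_value                # charged on sells only
    return commission + stamp_duty


def dynamic_risk_aversion(recent_returns, base_lambda=1.0, target_vol=0.01):
    """Hypothetical rule: raise the coefficient when realized volatility
    exceeds a target level, and lower it when the market is calmer."""
    realized_vol = float(np.std(recent_returns))
    return base_lambda * realized_vol / target_vol


def step_reward(prev_value, new_value, buy_value, sell_value, recent_returns):
    """Net portfolio return minus a risk penalty weighted by the dynamic coefficient."""
    cost = transaction_cost(buy_value, sell_value)
    net_return = (new_value - cost - prev_value) / prev_value
    lam = dynamic_risk_aversion(recent_returns)
    risk_penalty = lam * float(np.var(recent_returns))  # assumed variance penalty
    return net_return - risk_penalty


if __name__ == "__main__":
    window = [0.01, -0.01, 0.01, -0.01]  # illustrative recent daily returns
    print(step_reward(100_000.0, 101_000.0, 1_000.0, 2_000.0, window))
```

In a sketch like this, the coefficient grows with recent volatility, so the same raw return earns a smaller reward in turbulent periods. Any of the four agents (A2C, DDPG, PPO, TD3) could be trained against such a reward, since only the environment's reward signal changes.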