Optimizing Multi-Factor Stock Selection System Using GBDT-SVM Multi-Level Model
DOI: 10.12677/SA.2019.81021
Author: Qingyan Meng* (孟庆晏): School of Mathematics, South China University of Technology, Guangzhou, Guangdong
Keywords: Quantitative Investment, Multi-Factor Model, GBDT, SVM
Abstract: In quantitative investment, multi-factor stock selection models are widely accepted and used by professional investors in China's A-share market because of their high stability and large capital capacity. In recent years, however, these models have become increasingly homogeneous, and investments based on them struggle to earn attractive returns. This paper proposes a GBDT-SVM multi-level stock selection model built on a large set of factors. It uses machine learning to optimize factor selection and the dynamic adjustment of factor weights, with the aim of improving the multi-factor model's ability to capture excess stock returns. We then conduct an empirical study on China's A-share market data from 2013 to 2017 and compare the model with the classical multi-factor model and its improved variants. The results show that the GBDT-SVM multi-level stock selection model achieves higher prediction accuracy and earns a higher return and Sharpe ratio in historical backtests.
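The abstract does not spell out how the GBDT and SVM stages are connected. One plausible reading, assumed here purely for illustration, is that gradient-boosted trees rank the candidate factors by importance and an SVM then classifies stocks using only the retained factors. The sketch below uses scikit-learn on synthetic data. The factor matrix, the label definition, the median importance threshold, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a factor database: 600 stock-period samples, 20 factors.
# Only the first three factors carry signal (an assumption for this demo).
rng = np.random.default_rng(0)
n_samples, n_factors = 600, 20
X = rng.normal(size=(n_samples, n_factors))
signal = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2]
noise = rng.normal(scale=0.5, size=n_samples)
# Label 1 = "outperforms", 0 = "underperforms" (hypothetical definition).
y = (signal + noise > 0).astype(int)

# Chronological-style split: earlier samples for training, later ones for testing.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Stage 1: GBDT ranks factors; keep those with importance at or above the median.
# Stage 2: standardize the retained factors and classify with an RBF-kernel SVM.
model = make_pipeline(
    SelectFromModel(
        GradientBoostingClassifier(n_estimators=100, random_state=0),
        threshold="median",
    ),
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
model.fit(X_train, y_train)

# Boolean mask of the factors that survived the GBDT selection stage.
selected = model.named_steps["selectfrommodel"].get_support()
accuracy = model.score(X_test, y_test)

print("factors kept:", int(selected.sum()), "of", n_factors)
print("out-of-sample accuracy:", round(accuracy, 3))
```

In a real backtest the pipeline would presumably be refit on a rolling window, so that both the selected factors and their effective weights change over time. That would be one way to realize the dynamic adjustment the abstract mentions.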
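Cite this article: Meng, Q.Y. (2019) Optimizing Multi-Factor Stock Selection System Using GBDT-SVM Multi-Level Model. Statistics and Application (统计学与应用), 8(1), 184-192. https://doi.org/10.12677/SA.2019.81021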
