Thesis information

Chinese title:

 基于集成学习算法的量化交易策略研究

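Name:

 Yu Hualong (于化隆)

Confidentiality level:

 Public

Thesis language:

 Chinese

Discipline code:

 071201

Major:

 Statistics

Student type:

 Bachelor's

Degree:

 Bachelor of Science

Degree year:

 2020

University:

 Beijing Normal University

Campus:

 Beijing campus

School:

 School of Statistics ; Institute of National Accounts

First supervisor:

 Tong Xingwei (童行伟)

First supervisor's affiliation:

 School of Statistics, Beijing Normal University

Submission date:

 2020-06-06

Defense date:

 2020-06-06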

Foreign-language title:

 Research On Quantitative Trading Strategy Based On Ensemble Learning Algorithm    

Chinese keywords:

 量化投资 (quantitative investment) ; Bagging ; AdaBoost ; XGBoost

Foreign-language keywords:

 Quantitative investment ; Bagging ; AdaBoost ; XGBoost    

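Chinese abstract (English translation):

Quantitative investment uses computing and mathematics to build investment models that can process large amounts of information systematically, and it has been adopted by many investment institutions in China and abroad. This thesis combines ensemble learning methods with a quantitative multi-factor stock selection model to construct effective portfolios.

This thesis uses monthly data for all A-shares from January 2012 to December 2019 and selects 67 factor indicators in nine categories, including size and momentum factors, as candidate factors. The sample set pairs each period's factor data with the next period's return. Three ensemble learning methods, Bagging, AdaBoost, and XGBoost, are used to build classification models that predict whether a stock will rise or fall. The four years of data from 2012 to 2015 serve as the training set for factor screening and parameter tuning, and the data from January 2016 to December 2019 serve as the backtest set for evaluating stock selection performance. The ensemble-based multi-factor model is dynamic: at the start of each month it is trained on the factor data of the past six months, the 50 stocks with the highest predicted probability of rising form that month's portfolio, and the portfolio is sold after being held for one month, after which the cycle repeats. The backtest shows that Bagging and AdaBoost with logistic regression as the base classifier outperform the CSI 300 index; the AdaBoost-based model achieves a total return of 62.22% and an annualized return of 12.45%, far above the 1.75% gain of the CSI 300 index over the same period. The ensemble methods also select stocks markedly better than a single logistic model, which indicates that the ensemble strategy is effective. The XGBoost strategy, which uses decision trees as base classifiers, yields very poor returns, possibly because decision trees are prone to overfitting. Finally, hedging the AdaBoost stock selection strategy against the CSI 300 index reduces risk without affecting returns, raising the Sharpe ratio from 0.61 to 0.72.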
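The monthly selection step described in the abstract (train on a trailing window, rank stocks by predicted probability of rising, keep the top 50) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's code: the data are synthetic, the helper names (`fit_logistic`, `bagging_proba`, `select_top_k`) are hypothetical, and the Bagging of logistic models is hand-rolled with NumPy rather than taken from whatever library the author used.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=300):
    """Gradient-descent logistic regression; returns weights with the bias last."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Predicted probability of the positive ("up") class."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def bagging_proba(X_train, y_train, X_now, n_models=10, seed=0):
    """Average the predictions of logistic models fit on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    probs = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # draw rows with replacement
        w = fit_logistic(X_train[idx], y_train[idx])
        probs.append(predict_proba(w, X_now))
    return np.mean(probs, axis=0)

def select_top_k(prob_up, k=50):
    """Indices of the k stocks with the highest predicted probability of rising."""
    return np.argsort(prob_up)[::-1][:k]

# Synthetic stand-in (an assumption) for six months of factor data paired with
# next-month up/down labels; real inputs would be the 67 factor indicators.
rng = np.random.default_rng(42)
n_stocks, n_factors, window = 200, 5, 6
true_w = rng.normal(size=n_factors)
X_train = rng.normal(size=(n_stocks * window, n_factors))
noise = rng.normal(size=n_stocks * window)
y_train = (X_train @ true_w + noise > 0).astype(float)
X_now = rng.normal(size=(n_stocks, n_factors))  # this month's factor values

prob_up = bagging_proba(X_train, y_train, X_now)
portfolio = select_top_k(prob_up, k=50)
print(portfolio[:5])
```

In a backtest this block would be repeated at the start of every month with the window shifted forward by one month; swapping `bagging_proba` for a boosted model would give the AdaBoost variant.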

Foreign-language abstract:

Quantitative investment relies on computing and mathematics to build investment models that can process large amounts of information systematically, and it has been adopted by many domestic and foreign investment institutions. This paper combines ensemble learning methods with a quantitative multi-factor stock selection model to build effective portfolios.

This paper obtains monthly data for all A-shares from January 2012 to December 2019 and selects nine categories of factors (67 factor indicators in total), including size and momentum factors, as candidate factors. The sample set is constructed from the current period's factor data and the next period's return. Three ensemble learning methods, Bagging, AdaBoost, and XGBoost, are used to build classification models that predict whether a stock will rise or fall. Data from 2012 to 2015 are used as the training set for factor selection and parameter optimization, and data from January 2016 to December 2019 are used as the back-test set to analyze the performance of the stock selection models. The multi-factor stock selection model based on ensemble learning is dynamic: at the beginning of every month it is trained on the factor data of the past six months, the 50 stocks with the highest predicted probability of rising are selected as that month's portfolio, and they are sold after being held for one month, after which the cycle repeats. The back-test results show that the Bagging and AdaBoost methods with a logistic base classifier outperform the CSI 300 index. Among them, the stock selection model based on AdaBoost achieves a total return of 62.22% and an annualized return of 12.45%, much higher than the 1.75% increase of the CSI 300 index over the same period. In addition, the ensemble algorithms select stocks markedly better than a single logistic model, which shows that the ensemble strategy is effective. However, the XGBoost strategy, which uses decision trees as base classifiers, performs very poorly, which may be because decision trees are prone to overfitting. Finally, hedging the AdaBoost stock selection strategy against the CSI 300 index reduces risk without affecting the return, and the Sharpe ratio rises from 0.61 to 0.72.
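The abstract reports total return, annualized return, and Sharpe ratio but does not state how they were computed. A minimal sketch of one common convention is given below, using only the Python standard library. Everything in it is an assumption for illustration: monthly simple returns, 12 periods per year, a zero risk-free rate by default, a hedge modeled as the plain spread between strategy and index returns, and made-up return series that are not the thesis's data.

```python
import math
import statistics

def total_return(monthly):
    """Compound a list of monthly simple returns into one total return."""
    growth = 1.0
    for r in monthly:
        growth *= 1.0 + r
    return growth - 1.0

def annualized_return(monthly):
    """Geometric annualization, assuming 12 periods per year."""
    return (1.0 + total_return(monthly)) ** (12.0 / len(monthly)) - 1.0

def sharpe_ratio(monthly, rf_monthly=0.0):
    """Annualized Sharpe ratio from monthly returns (sample standard deviation)."""
    excess = [r - rf_monthly for r in monthly]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(12.0)

def hedged(strategy, index):
    """Long the strategy, short the index in equal notional: the monthly spread."""
    return [s - i for s, i in zip(strategy, index)]

# Made-up monthly returns (assumptions, not results from the thesis).
strategy = [0.04, -0.03, 0.05, -0.02, 0.03, -0.04]
index = [0.03, -0.04, 0.03, -0.03, 0.02, -0.05]
spread = hedged(strategy, index)

print(annualized_return(strategy), sharpe_ratio(strategy), sharpe_ratio(spread))
```

When the strategy and the index move together, the spread is far less volatile than the strategy alone, which is how a hedge of this kind can raise the Sharpe ratio.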

Total number of references:

 20

Total number of figures:

 5

Total number of tables:

 8

Call number:

 本071201/20010

Open-access date:

 2021-06-06
