Predicting Stock Returns with Machine Learning (Part 1): The Random Forest Model

Preface

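This article uses Python to compile stock data for all U.S.-listed companies from 1927 to 2020. Using historical returns and trading volume as inputs, it predicts stock returns with machine learning methods including random forests, support vector machines, and neural networks. A portfolio built from the best-performing model's predictions earns an average annual return of more than 20%.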


1. Import libraries and data

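import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics
import matplotlib.pyplot as plt
from pprint import pprint
import statsmodels.api as sm
from stargazer.stargazer import Stargazer  # third-party package: pip install stargazer

# CRSP monthly stock file (MSF) exported to CSV; dates are parsed and used as the index.
# low_memory=False reads the file in one pass, so mixed-type columns such as RET get one consistent dtype.
file2 = "crsp_msf_all.csv"
data = pd.read_csv(file2, parse_dates=["date"], index_col="date", low_memory=False)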


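The data come from the CRSP database. The dataset contains many stock-level fields, but this article uses only the stock identifier (PERMNO), the return (RET), and the trading volume (VOL).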
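Since only PERMNO, RET and VOL are used, `read_csv` can load just those columns through its `usecols` parameter, which reduces memory use for a file covering nearly a century of data. The sketch below reads a tiny made-up CSV in place of crsp_msf_all.csv (its column layout and values are assumptions for illustration only) and shows that letter codes inside RET make pandas read that column as text rather than as numbers.

```python
import io

import pandas as pd

# Tiny stand-in for crsp_msf_all.csv; the layout and values are illustrative only.
# SHRCD is an extra column that the analysis does not need.
csv_text = """date,PERMNO,SHRCD,RET,VOL
1990-01-31,10001,11,C,120
1990-02-28,10001,11,0.0123,150
1990-01-31,10002,11,-0.0400,80
1990-02-28,10002,11,B,95
"""

# Load only the needed columns, with the dates parsed into a DatetimeIndex
data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["date", "PERMNO", "RET", "VOL"],
    parse_dates=["date"],
    index_col="date",
)

# RET is not numeric here because of the letter codes 'C' and 'B'
print(data.dtypes)
```

In the real script, the same `usecols` list would simply be added to the existing `read_csv` call.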

2. Process the data and compute the feature variables

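vol = data["VOL"]
ret = data[["PERMNO", "RET", "VOL"]].copy()
# RET contains letter codes ('B', 'C') for missing or invalid returns; treat them as NaN
ret = ret.replace("C", np.nan).replace("B", np.nan)
ret = ret.dropna()
ret["RET"] = ret["RET"].astype(float)
# Clean RET in the full dataset as well, so the lagged features computed below are numeric
data["RET"] = pd.to_numeric(data["RET"], errors="coerce")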
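The replace/dropna/astype chain above can also be written with `pd.to_numeric(..., errors="coerce")`, which turns any non-numeric entry (not only 'B' and 'C') into NaN. A minimal sketch on made-up values, comparing the two approaches:

```python
import numpy as np
import pandas as pd

# Illustrative RET values: numeric strings mixed with CRSP-style letter codes
raw = pd.DataFrame({
    "PERMNO": [10001, 10001, 10002, 10002],
    "RET": ["C", "0.0123", "-0.0400", "B"],
    "VOL": [120, 150, 80, 95],
})

# Approach used in the article: map the known letter codes to NaN, drop those rows, cast to float
a = raw.replace(["C", "B"], np.nan).dropna().astype({"RET": float})

# Alternative: coerce anything non-numeric to NaN in one step, then drop those rows
b = raw.assign(RET=pd.to_numeric(raw["RET"], errors="coerce")).dropna()

# Both keep the same rows and (up to floating-point parsing) the same return values
print(a.index.equals(b.index), np.allclose(a["RET"], b["RET"]))
```

The `to_numeric` version is more defensive: if the file contains a letter code other than 'B' or 'C', it still becomes NaN instead of causing `astype(float)` to fail.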

# 25 lagged-return features: R0 is the return one period earlier, R1 two periods earlier, ..., R24
predictorsname = [f"R{i}" for i in range(25)]
# Compute historical (lagged) returns within each stock;
# this assumes the rows are in chronological order within each PERMNO
for i in range(25):
    data[predictorsname[i]] = data.groupby("PERMNO")["RET"].shift(i + 1)
# Prediction target: the next period's return
data["R-1"] = data.groupby("PERMNO")["RET"].shift(-1)
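The lag construction relies on `groupby("PERMNO")` so that a shift never carries one stock's return into another stock's rows. A minimal sketch on two made-up stocks, with 2 lags instead of 25, assuming the rows are already in chronological order within each PERMNO (a grouped shift follows the existing row order; it does not sort by date):

```python
import pandas as pd

# Two made-up stocks with four monthly observations each, sorted by date within each stock
data = pd.DataFrame({
    "PERMNO": [1, 1, 1, 1, 2, 2, 2, 2],
    "RET": [0.01, 0.02, 0.03, 0.04, -0.01, -0.02, -0.03, -0.04],
})

# Lagged returns: R0 is the previous month's return, R1 the month before that
n_lags = 2  # the article uses 25; 2 keeps the toy output readable
for i in range(n_lags):
    data[f"R{i}"] = data.groupby("PERMNO")["RET"].shift(i + 1)

# Target: next month's return (a negative shift looks forward)
data["R-1"] = data.groupby("PERMNO")["RET"].shift(-1)

print(data)
```

The first row of stock 2 gets NaN for R0 rather than stock 1's last return, which is exactly what the grouping is for.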

# Feature list: the 25 lagged returns plus trading volume
predictorsname.append("VOL")
# Observation table: identifier, volume, target, current return and the 25 lags.
# Selecting the columns in one step (with .copy()) avoids pandas' SettingWithCopyWarning.
obs = data[["PERMNO", "VOL", "R-1", "RET"] + [f"R{i}" for i in range(25)]].copy()
# Remove any remaining letter codes, then drop rows with a missing lag, return, volume or target
obs = obs.replace("C", np.nan).replace("B", np.nan)
obs = obs.dropna()
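The final `dropna()` does more than remove letter codes: it also discards every row without a full set of lags or a next-period target, so a stock contributes rows only after its first 25 months and never for its last month. A small sketch with 2 lags and two made-up stocks makes the row count explicit; the feature/target split at the end (lags plus VOL as features, `R-1` as target) is an assumption based on `predictorsname`.

```python
import pandas as pd

n_lags = 2      # the article uses 25 lags
n_months = 5    # months of data per made-up stock

# Two synthetic stocks, rows sorted by date within each PERMNO
data = pd.DataFrame({
    "PERMNO": [1] * n_months + [2] * n_months,
    "RET": [0.01 * (k + 1) for k in range(2 * n_months)],
    "VOL": [100] * (2 * n_months),
})
lag_names = [f"R{i}" for i in range(n_lags)]
for i, name in enumerate(lag_names):
    data[name] = data.groupby("PERMNO")["RET"].shift(i + 1)
data["R-1"] = data.groupby("PERMNO")["RET"].shift(-1)

# Same filtering as the article: any row with a missing lag or target is dropped
obs = data[["PERMNO", "VOL", "R-1", "RET"] + lag_names].dropna()

# Each stock keeps n_months - n_lags - 1 rows: n_lags months of history plus one future month are required
print(obs.groupby("PERMNO").size())

# Feature matrix and target for a model such as RandomForestRegressor
X = obs[lag_names + ["VOL"]]
y = obs["R-1"]
```

With 25 lags, a stock needs at least 27 months of valid returns before it contributes a single observation, which also shapes how many stocks enter the sample in the early years.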

## Normalization
def regularit(df):
    newDataFrame = pd.DataFrame
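Normalization rescales each feature to a common range so that volume, which is orders of magnitude larger than returns, does not dominate the raw scale. A minimal sketch, assuming column-wise min-max scaling to [0, 1] (the exact scaling performed by `regularit` is not shown here; `min_max_normalize` is a hypothetical name):

```python
import pandas as pd


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column of df to [0, 1] using that column's min and max."""
    col_min = df.min()
    col_range = df.max() - col_min
    # A constant column would divide by zero; use 1 so it maps to 0 instead
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range


example = pd.DataFrame({"R0": [-0.1, 0.0, 0.1], "VOL": [100, 300, 500], "C": [7, 7, 7]})
print(min_max_normalize(example))
```

One design caveat: taking the min and max over the whole 1927-2020 sample uses information from later years; computing them on the training period only avoids that look-ahead.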