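Trend-fitting quantitative analysis fits a curve to a stock's historical prices and extrapolates that curve to predict the price over the next few days. In the program shown in this article, fndays is the number of historical days used for the fit and pndays is the number of future days to predict. For example, we can use the prices of the past 10 days to predict the prices of the next 3 days.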
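As a small illustration of the fndays/pndays convention, the sketch below cuts one window out of a price series and splits it into the history part and the part to be predicted. The helper name split_window and the price values are hypothetical and not part of the article's program.

```python
import numpy as np

def split_window(prices, start, fndays, pndays):
    """Split one window of a price series into (history, target).

    history: fndays prices used for fitting
    target:  the following pndays prices that the fit should predict
    """
    prices = np.asarray(prices, dtype=float)
    end = start + fndays + pndays
    if start < 0 or end > len(prices):
        raise ValueError("window does not fit inside the series")
    history = prices[start:start + fndays]
    target = prices[start + fndays:end]
    return history, target

# Made-up closing prices, for illustration only
prices = np.arange(100.0, 120.0)
history, target = split_window(prices, start=2, fndays=10, pndays=3)
print(len(history), len(target))
```

With fndays=10 and pndays=3, each window therefore needs 13 consecutive closing prices.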
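1. Data preparation
The program in this article uses two data sets:
(1) 20200206.npy: stores the codes of 4,278 stocks.
(2) A股2010-2020K线数据 (A-share K-line data, 2010-2020): stores the historical K-line data of those 4,278 stocks, one CSV file per stock.
How to obtain the historical stock data is described at the end of the article.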
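2. Curve fitting
Curve fitting fits a polynomial curve to the historical prices and extends that curve over the prediction window.
[Figure: example fit. Blue: historical data used for the fit; green: actual future prices; red: fitted curve extended over the forecast days.]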
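The idea can be sketched in a few lines. Note that this sketch uses np.polyfit rather than the scipy.optimize.leastsq routine used in the full program; both perform a least-squares polynomial fit, but the helper names and the price series here are hypothetical.

```python
import numpy as np

def fit_and_extrapolate(history, pndays, degree=2):
    """Fit a polynomial to the history and evaluate it over history + forecast days."""
    history = np.asarray(history, dtype=float)
    fndays = len(history)
    t_hist = np.arange(fndays)
    coeffs = np.polyfit(t_hist, history, degree)  # highest power first
    t_all = np.arange(fndays + pndays)
    return np.polyval(coeffs, t_all)

def same_direction(history, target, y_fit):
    """True if predicted and actual price changes have the same sign.

    Both changes are measured from the last historical price to the
    last day of the forecast window.
    """
    actual = target[-1] - history[-1]
    predicted = y_fit[-1] - history[-1]
    return bool(actual * predicted > 0)

# Made-up, exactly quadratic price series: 10 history days + 3 future days
t = np.arange(13)
prices = 10.0 + 0.5 * t + 0.02 * t**2
history, target = prices[:10], prices[10:]
y_fit = fit_and_extrapolate(history, pndays=3, degree=2)
print(same_direction(history, target, y_fit))
```

Real prices are of course not exactly polynomial; the clean series is only there to make the behaviour of the fit easy to check.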
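3. Parameter optimization
The tunable parameters of this strategy are the number of historical days fndays, the number of predicted days pndays, and the fitting order order. For each parameter combination, the program takes one randomly positioned window of data from each of the 4,278 stocks, fits and predicts it, and then scores the results over all stocks by counting how many predictions get the direction of the price change right. By iterating over the three parameters, the optimization routine automatically selects the combination with the best score.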
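A scaled-down sketch of this search is shown below. It runs on synthetic random-walk series instead of the real K-line files, uses np.polyfit in place of leastsq, and scores each combination by hit rate (the fraction of windows with the correct predicted direction) rather than by raw count. The function names and the tiny parameter grid are assumptions made for illustration. As in the full program, order is the number of polynomial coefficients, so the polynomial degree is order - 1.

```python
import itertools
import numpy as np

def hit_rate(windows, fndays, pndays, order):
    """Fraction of windows whose predicted direction matches the actual one."""
    hits = 0
    t_hist = np.arange(fndays)
    t_last = fndays + pndays - 1  # last day of the forecast window
    for w in windows:
        history, target = w[:fndays], w[fndays:]
        coeffs = np.polyfit(t_hist, history, order - 1)
        predicted = np.polyval(coeffs, t_last) - history[-1]
        actual = target[-1] - history[-1]
        hits += int(predicted * actual > 0)
    return hits / len(windows)

def grid_search(series_list, fndays_list, pndays_list, order_list, rng):
    """Score every (fndays, pndays, order) combination; return (best, all results)."""
    results = []
    for fndays, pndays in itertools.product(fndays_list, pndays_list):
        span = fndays + pndays
        windows = []
        for s in series_list:
            if len(s) < span:
                continue  # series too short for one full window
            start = rng.integers(0, len(s) - span + 1)  # upper bound is exclusive
            windows.append(s[start:start + span])
        if not windows:
            continue
        for order in order_list:
            if order > fndays:
                continue  # more coefficients than data points
            score = hit_rate(windows, fndays, pndays, order)
            results.append(((fndays, pndays, order), score))
    best = max(results, key=lambda r: r[1])
    return best, results

# Synthetic random walks stand in for real closing prices
rng = np.random.default_rng(0)
series_list = [100.0 + np.cumsum(rng.normal(size=60)) for _ in range(20)]
best, results = grid_search(series_list, [5, 10], [1, 3], [2, 3], rng)
print("best parameters:", best[0], "hit rate:", best[1])
```

Comparing hit rates instead of raw counts keeps combinations comparable when some stocks are skipped for having too little history.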
4. Full code
# -*- coding: utf-8 -*-
"""
Created on Wed Mar 11 22:43:25 2020
@author: yehx
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import leastsq
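# Evaluate a polynomial with coefficients p (highest power first) at points t
def fit_leastq(t, p):
    length = len(p)
    y = 0
    for i in range(length):
        y += p[i] * np.power(t, length-1-i)
    return y

# Residuals between the polynomial and the observed values y (used by leastsq)
def fit_leastq_residuals(p, t, y):
    return fit_leastq(t, p) - y

# Evaluate the fitted polynomial with coefficients coeff at points x
def y_generate(coeff, x):
    length = len(coeff)
    y = 0
    for i in range(length):
        y += coeff[i] * np.power(x, length-1-i)
    return y
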
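# Compute the curve-fitting result
# x_data: 1-D array of historical prices to fit, of length fndays
# y_data: the true prices of the following pndays days
# n: number of polynomial coefficients (the polynomial degree is n-1); a larger n
#    fits the history more closely but increases the risk of overfitting
# Returns y_fit (fitted values over fndays+pndays days) and flag, which is 1 if the
# predicted direction of the price change matches the actual one and 0 otherwise
def fit_compute(x_data, y_data, fndays, pndays, n=3):
    flag = 0
    initials = list(0.01*np.ones(n))
    r = leastsq(fit_leastq_residuals, initials, args = (range(fndays), x_data))
    y_fit = y_generate(r[0], range(fndays+pndays))
    # Price changes are measured from the last historical price
    price_change_target = y_data[-1] - x_data[-1]
    price_change_fit = y_fit[-1] - x_data[-1]
    if price_change_target * price_change_fit > 0:
        flag = 1
    return y_fit, flag
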
# Plot a single fitting result
# x: historical data that was fitted, y_target: true future data, y_fit: all fitted values
def single_plot(x, y_target, y_fit, fndays, pndays):
    plt.plot(range(fndays), x, marker='.', color='blue', linewidth=1.0, label='fitted data')
    plt.plot(range(fndays, fndays+pndays), y_target, marker='.', color='green', linewidth=1.0, label='target data')
    plt.plot(range(fndays+pndays), y_fit, marker='.', color='red', linewidth=1.0, label='predicted data')
    plt.legend()
    plt.show()

# Iterative parameter optimization
def para_opti():
    # Stock codes
    stock_set = list(np.load("../20200206.npy"))
    fndays_list = list(range(5, 205, 5))
    pndays_list = list(range(1, 11))
    order_list = list(range(2, 16))
    correct_number_list = []
    parameter_list = []
    for fndays in fndays_list:
        for pndays in pndays_list:
            x_data = []
            y_data = []
            for i in range(len(stock_set)):
                # Load the stock's K-line data
                df = pd.read_csv("../../Data/A股2010-2020K线数据/"+stock_set[i]+".csv")
                # Closing prices (third column)
                close_price = df.iloc[:,2].tolist()
                total_num = len(close_price)
                # Skip stocks too short to hold one full window
                if total_num < fndays + pndays:
                    continue
                # Random window start; the upper bound of randint is exclusive
                x_start = np.random.randint(0, total_num-fndays-pndays+1, 1)[0]
                x_data.append(close_price[x_start:(x_start+fndays)])
                y_data.append(close_price[(x_start+fndays):(x_start+fndays+pndays)])
            x_data = np.array(x_data)
            y_data = np.array(y_data)
            for order in order_list:
                correct_number = 0
                # leastsq needs at least as many data points as coefficients
                if order > fndays:
                    continue
                for j in range(x_data.shape[0]):
                    y_fit, flag = fit_compute(x_data[j], y_data[j], fndays, pndays, n=order)
                    # single_plot(x_data[j], y_data[j], y_fit, fndays, pndays)
                    correct_number += flag
                correct_number_list.append(correct_number)
                parameter_list.append([fndays, pndays, order])
                # Save intermediate results after every parameter combination
                np.save('correct_number_list.npy', np.array(correct_number_list))
                np.save('parameter_number_list.npy', np.array(parameter_list))
                print('[INFO] fndays: {}/200, pndays: {}/10, order: {}/15, max correct: {}/{}'.format(
                    fndays, pndays, order, max(correct_number_list), len(stock_set)))
    print('[INFO] best parameters: '+ str(
        parameter_list[correct_number_list.index(max(correct_number_list))]))

if __name__ == '__main__':
    para_opti()
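5. How to obtain the data
(1) 20200206.npy: follow the WeChat official account "量化之窗" and send the keyword "gpdm".
(2) A股2010-2020K线数据: follow the WeChat official account "量化之窗" and send the keyword "kxsj".
If you have any questions, please leave a comment below the article.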