Python Quantitative Trading Study Notes (42) — Deep Learning for Short-Term Stock Picking, Part 2

The previous article covered the data preprocessing part of using deep learning to pick short-term stocks; this article covers model training.

Model Training

The training process mainly follows the official Keras example Structured data classification from scratch (https://keras.io/examples/structured_data/structured_data_classification_from_scratch/), which classifies structured data and shows how to handle string, integer, and float features separately. For the stock training features used here, we only need to handle float features.

First, check whether the preprocessed data already exists. If it does, read it in directly; otherwise, run the preprocessing:

for stk_code in stk_list:
    print('processing {}...'.format(stk_code))
    # Skip preprocessing if the preprocessed file already exists
    data_file = './baostock/data_pre/{}.csv'.format(stk_code)
    if os.path.exists(data_file):
        df = pd.read_csv(data_file)
    else:
        df = pd.read_csv('./baostock/data_ext/{}.csv'.format(stk_code))
        df = df[df['date'] <= '2017-12-31']
        df = data_preprocessing(df, stk_code, FEATURE_N)

Next, we read the feature dimensionality, split the data into training and validation sets, convert them to tf.data.Dataset objects, and set the batch size.

    # Feature dimensionality
    ft_num = df.shape[1] - 1
    # Split into training and validation sets
    val_df = df.sample(frac=0.2, random_state=1337)
    train_df = df.drop(val_df.index)
    print(
        "Using %d samples for training and %d for validation"
        % (len(train_df), len(val_df))
    )
    # Build tf.data.Dataset objects
    train_ds = dataframe_to_dataset(train_df)
    val_ds = dataframe_to_dataset(val_df)
    # Batch
    train_ds = train_ds.batch(32)
    val_ds = val_ds.batch(32)
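Each element of the batched dataset is a (features, label) pair, where the features are a dict keyed by column name. A quick sanity check (my own illustration, not from the original article):

for x, y in train_ds.take(1):
    # x maps column names ('0', '1', ...) to tensors of shape (32,)
    print(len(x), y.shape)  # e.g. 220 feature columns, labels of shape (32,)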

After preprocessing, each sample has 220 float features (22 features per day × 10 days). Following the Keras example, we call the encode_numerical_feature function to normalize each feature to zero mean and unit standard deviation.

# Normalize float feature data
def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = Normalization()
    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))
    # Learn the statistics of the data
    normalizer.adapt(feature_ds)
    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature

Back inside the per-stock loop, we create one Keras input per feature column and encode it with this function:

    # Feature encoding
    all_inputs = []
    ft_list = []
    for i in range(ft_num):
        name = '{}'.format(i)
        ki = keras.Input(shape=(1,), name=name)
        all_inputs.append(ki)
        ft_list.append(encode_numerical_feature(ki, name, train_ds))
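To see concretely what the Normalization layer does, here is a small standalone check (my own illustration, not part of the training script): after adapt() learns the mean and variance, the layer shifts and scales its inputs accordingly.

import numpy as np
from tensorflow.keras.layers.experimental.preprocessing import Normalization

data = np.array([[1.0], [2.0], [3.0], [4.0]], dtype='float32')
norm = Normalization()
norm.adapt(data)           # learns mean = 2.5, variance = 1.25
print(norm(data).numpy())  # roughly [[-1.34], [-0.45], [0.45], [1.34]]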

Next, we build and train a model for each stock and save the result locally. We use the simplest fully connected architecture, with few nodes per layer, to save computation time. Even with this simple model, training the 2,600-plus stocks took more than two days, and each trained model is about 5 MB. The code is as follows:

    # Build the model
    all_features = layers.concatenate(ft_list)
    x = layers.Dense(128, activation="relu")(all_features)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    output = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(all_inputs, output)
    model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
    # Train the model
    model.fit(train_ds, epochs=50, validation_data=val_ds)
    model.save('./model/{}'.format(stk_code))
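For reference, the dense layers contain only 220×128+128 + 128×32+32 + 32+1 = 32,449 trainable parameters (about 130 KB of float32 weights), so most of each roughly 5 MB SavedModel is presumably graph structure for the 220 input branches rather than weights. A saved model can later be loaded back for prediction; a minimal sketch, using a hypothetical stock code:

# Load a previously saved model (the stock code here is hypothetical)
model = keras.models.load_model('./model/sh.600000')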

Finally, one step that absolutely must not be skipped: at the end of each loop iteration, release the memory Keras used to train that stock's model. Without the line below, memory usage keeps growing as training proceeds until the program crashes.

    # Release memory
    backend.clear_session()
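clear_session() resets Keras's global state between iterations. A slightly more defensive variant used in some setups (my own suggestion, not from the original article) also drops the Python reference to the model and forces a garbage-collection pass:

import gc

del model                # drop the last reference to the model
backend.clear_session()  # reset Keras's global graph state
gc.collect()             # force garbage collection immediately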

That completes the training process with Keras; the full code is at the end of this article. Follow-up posts will cover using the trained models to make predictions and running quantitative backtests.

import tensorflow as tf
import numpy as np
import pandas as pd
import os
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental.preprocessing import Normalization
from tensorflow.keras.layers.experimental.preprocessing import CategoryEncoding
from tensorflow.keras.layers.experimental.preprocessing import StringLookup
from tensorflow.keras import backend

FEATURE_N = 10

# Normalize float feature data
def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = Normalization()
    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))
    # Learn the statistics of the data
    normalizer.adapt(feature_ds)
    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature

# Preprocessing: combine n consecutive rows into one feature vector
def data_preprocessing(df, stk_code, n):
    df = df.copy()
    # Drop non-feature columns, keeping only the feature data
    ft_df = df.drop(columns=['date', 'buy'])
    # Collect new feature rows in a list (DataFrame.append is
    # deprecated in recent pandas, so concatenate once at the end)
    rows = []
    for i in range(n, df.shape[0]):
        # Take n consecutive rows
        part_df = ft_df.iloc[i - n : i]
        # Flatten the n rows into a single row
        rows.append(pd.DataFrame(part_df.values.reshape(1, -1)))
    out_df = pd.concat(rows, ignore_index=True)
    # Use string column names so they match the names read back from CSV
    out_df.columns = out_df.columns.astype(str)
    # The label for each window is the 'buy' flag of the day after it
    out_df['target'] = df.iloc[n:df.shape[0]]['buy'].values
    out_df.to_csv('./baostock/data_pre/{}.csv'.format(stk_code), index=False)
    return out_df
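# Toy illustration of the windowing above (a made-up example, not part
# of the script): with n=2 and feature columns f1=[1,2,3,4],
# f2=[5,6,7,8] plus buy=[0,1,0,1], the generated feature rows are
#   [1, 5, 2, 6] and [2, 6, 3, 7]
# with targets [0, 1]: rows i-n..i-1 predict row i's 'buy' flag.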

def dataframe_to_dataset(dataframe):
    dataframe = dataframe.copy()
    labels = dataframe.pop("target")
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    ds = ds.shuffle(buffer_size=len(dataframe))
    return ds

stk_code_file = './stk_data/dp_stock_list.csv'
stk_list = pd.read_csv(stk_code_file)['code'].tolist()
for stk_code in stk_list:
    print('processing {}...'.format(stk_code))
    # Skip preprocessing if the preprocessed file already exists
    data_file = './baostock/data_pre/{}.csv'.format(stk_code)
    if os.path.exists(data_file):
        df = pd.read_csv(data_file)
    else:
        df = pd.read_csv('./baostock/data_ext/{}.csv'.format(stk_code))
        df = df[df['date'] <= '2017-12-31']
        df = data_preprocessing(df, stk_code, FEATURE_N)
    # Feature dimensionality
    ft_num = df.shape[1] - 1
    # Split into training and validation sets
    val_df = df.sample(frac=0.2, random_state=1337)
    train_df = df.drop(val_df.index)
    print(
        "Using %d samples for training and %d for validation"
        % (len(train_df), len(val_df))
    )
    # Build tf.data.Dataset objects
    train_ds = dataframe_to_dataset(train_df)
    val_ds = dataframe_to_dataset(val_df)
    # Batch
    train_ds = train_ds.batch(32)
    val_ds = val_ds.batch(32)
    # Feature encoding
    all_inputs = []
    ft_list = []
    for i in range(ft_num):
        name = '{}'.format(i)
        ki = keras.Input(shape=(1,), name=name)
        all_inputs.append(ki)
        ft_list.append(encode_numerical_feature(ki, name, train_ds))
    # Build the model
    all_features = layers.concatenate(ft_list)
    x = layers.Dense(128, activation="relu")(all_features)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    output = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(all_inputs, output)
    model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
    # Train the model
    model.fit(train_ds, epochs=50, validation_data=val_ds)
    model.save('./model/{}'.format(stk_code))
    # Release memory
    backend.clear_session()

This blog is for learning and discussion only; it does not constitute investment advice, and you trade at your own risk!

Personal blog: http://coderx.com.cn/ (updated first)
Latest project code: https://gitee.com/sl/quant_from_scratch
Feel free to share this post and leave comments. There is a WeChat group for learning and discussion; interested readers can scan the QR code to join!
If you find this blog helpful, you can scan the QR code to donate. Thank you!
