Preface
Competition link: Jane Street Market Prediction
This post covers a few approaches we tried during the competition but did not use in the final submission.
Using income as the loss (optimization objective)
Directly optimizing the evaluation metric works well in many competitions, but for an unstable metric the training process can be very erratic and hard to optimize, so it fails to deliver good results.
The approach here is to train with cross-entropy first and then fine-tune on income. Offline, income fine-tuning can reach a higher return ceiling than cross-entropy training alone, but over-fine-tuning eventually drives the utility score back down. The code framework is adapted from here.
Loading the data and defining the optimization objective
To use income as the optimization objective, we define y_train2 as a second set of labels. We did not use the utility score itself as the objective: it has to be computed per day, which means the data cannot be fully shuffled, and it is so unstable that training on small batches can perform poorly. In practice there is a sizable online gap between shuffling and not shuffling. After shuffling, the model can no longer learn intraday relationships, but it may become more robust to noise, and it actually achieved a higher online score.
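For reference, the utility score discussed above can be sketched as follows. This is a minimal reimplementation from the competition's published formula; `utility_score` is our own helper name, not part of the original code.

```python
import numpy as np
import pandas as pd

def utility_score(df, action):
    """Competition utility: daily profits p_i, a Sharpe-like statistic t,
    and u = min(max(t, 0), 6) * sum(p_i).

    df must contain 'date', 'weight' and 'resp'; action is a 0/1 array.
    """
    pnl = df['weight'].values * df['resp'].values * action
    p = pd.Series(pnl).groupby(df['date'].values).sum()  # one profit per day
    t = p.sum() / np.sqrt((p ** 2).sum()) * np.sqrt(250 / len(p))
    return min(max(t, 0), 6) * p.sum()
```

Because p_i is a per-day sum, minibatches drawn from shuffled rows cannot evaluate t, which is why the code below optimizes income (weight * resp) instead.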
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Dropout, Concatenate, Lambda, GaussianNoise, Activation
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers.experimental.preprocessing import Normalization
import tensorflow as tf
import numpy as np
import pandas as pd
from tqdm import tqdm
from random import choices
SEED = 1111
tf.random.set_seed(SEED)
np.random.seed(SEED)
train = pd.read_csv('../input/jane-street-market-prediction/train.csv')
train = train.query('date > 85').reset_index(drop = True)
train = train[train['weight'] != 0]
train.fillna(train.mean(),inplace=True)
train['action'] = ((train['resp'].values) > 0).astype(int)
features = [c for c in train.columns if "feature" in c]
f_mean = np.mean(train[features[1:]].values,axis=0)
resp_cols = ['resp_1', 'resp_2', 'resp_3', 'resp', 'resp_4']
X_train = train.loc[:, train.columns.str.contains('feature')]
y_train = np.stack([(train[c] > 0).astype('int') for c in resp_cols]).T
y_train2 = np.stack([train[c].values*train['weight'].values for c in resp_cols]).T
Model training and fine-tuning
def create_mlp(
    num_columns, num_labels, hidden_units, dropout_rates, label_smoothing, learning_rate
):
    inp = tf.keras.layers.Input(shape=(num_columns,))
    x = tf.keras.layers.BatchNormalization()(inp)
    x = tf.keras.layers.Dropout(dropout_rates[0])(x)
    for i in range(len(hidden_units)):
        x = tf.keras.layers.Dense(hidden_units[i])(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Activation(tf.keras.activations.swish)(x)
        x = tf.keras.layers.Dropout(dropout_rates[i + 1])(x)
    x = tf.keras.layers.Dense(num_labels)(x)
    out = tf.keras.layers.Activation("sigmoid")(x)
    model = tf.keras.models.Model(inputs=inp, outputs=out)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.BinaryCrossentropy(label_smoothing=label_smoothing),
        metrics=[tf.keras.metrics.AUC(name="AUC")],
    )
    return model
batch_size = 5000
hidden_units = [150, 150, 150]
dropout_rates = [0.2, 0.2, 0.2, 0.2]
label_smoothing = 1e-2
learning_rate = 1e-3
clf = create_mlp(
len(features), 5, hidden_units, dropout_rates, label_smoothing, learning_rate
)
# Train normally for 200 epochs
clf.fit(X_train, y_train, epochs=200, batch_size=5000, shuffle=True)
def score_loss(y_true, y_pred):
    # y_true here is y_train2 (weight * resp); index 3 is the main 'resp'
    # column, so minimizing this maximizes income on that target
    score = -tf.reduce_sum(tf.cast(y_true, 'float32')[:, 3] * tf.cast(y_pred, 'float32')[:, 3])
    return score
# Fine-tune for 10 epochs
clf.compile(loss=score_loss,
            optimizer=tf.keras.optimizers.Adam(1e-4),
            metrics='AUC',
            )
history = clf.fit(X_train, y_train2, epochs=10, batch_size=batch_size, shuffle=True)
clf.save_weights('weight.h5')
Submission
This model did not improve our online score in the end, so we dropped it from further consideration.
th = 0.5000
f = np.median  # aggregation over the five heads; defined but unused below
import janestreet
env = janestreet.make_env()
for (test_df, pred_df) in tqdm(env.iter_test()):
    if test_df['weight'].item() > 0:
        x_tt = test_df.loc[:, features].values
        if np.isnan(x_tt[:, 1:].sum()):
            # fill NaNs with the training means computed earlier
            x_tt[:, 1:] = np.nan_to_num(x_tt[:, 1:]) + np.isnan(x_tt[:, 1:]) * f_mean
        pred = clf(x_tt, training=False).numpy()[0][2]  # third head (resp_3)
        pred_df.action = np.where(pred >= th, 1, 0).astype(int)
    else:
        pred_df.action = 0
    env.predict(pred_df)
Multi-task learning
Here we define multiple targets and use a generator that feeds exactly one full day of data per batch, so the model can learn the relationships within a day. resp and weight join the network as extra targets, acting as regularizers. date splits the 500-day range into five segments: 0-99, 100-199, 200-299, 300-399, 400-499. This model ultimately scored only 6606.753 online (currently more than half of the top teams' public-leaderboard scores fall in this range).
The model definition code is below.
TRAINING = False

from adabelief_tf import AdaBeliefOptimizer  # pip install adabelief-tf

def create_model():
    inp = tf.keras.layers.Input(shape=(130,))
    x = tf.keras.layers.Dropout(0.2)(inp)
    x = tf.keras.layers.Dense(160, activation='linear', use_bias=False, kernel_initializer='he_uniform')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('swish')(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    x = tf.keras.layers.Dense(160, activation='linear', use_bias=False, kernel_initializer='he_uniform')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('swish')(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    x = tf.keras.layers.Dense(160, activation='linear', use_bias=False, kernel_initializer='he_uniform')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('swish', name='feature')(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    # four heads: binary resp targets, date segment, weight, and raw resp values
    outs = tf.keras.layers.Dense(5, activation='sigmoid', name='resp')(x)
    outs2 = tf.keras.layers.Dense(5, activation='softmax', name='date')(x)
    outs3 = tf.keras.layers.Dense(1, activation='relu', name='weight')(x)
    outs4 = tf.keras.layers.Dense(5, activation='linear', name='resp_values')(x)
    model_nn = tf.keras.models.Model(inputs=inp, outputs=(outs, outs2, outs3, outs4))
    optimizer = AdaBeliefOptimizer(learning_rate=0.001, epsilon=1e-14, rectify=False)
    model_nn.compile(loss={'resp': tf.keras.losses.BinaryCrossentropy(label_smoothing=0.01),
                           'date': 'categorical_crossentropy',
                           'weight': 'mse',
                           'resp_values': 'mse'},
                     optimizer=optimizer,
                     loss_weights={'resp': 1., 'date': 0.05, 'weight': 0.002, 'resp_values': 100},
                     metrics={'resp': 'AUC', 'date': 'acc'},
                     )
    return model_nn
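The per-day generator mentioned above is not shown in the original code. A minimal sketch, assuming the preprocessed train DataFrame and the features / resp_cols lists from the first section (`day_batches` is a hypothetical helper name, and the exact date binning is my reading of the 0-99 / … / 400-499 scheme):

```python
import numpy as np
import pandas as pd

def day_batches(train, features, resp_cols):
    """Yield one full trading day per batch with targets for the four heads."""
    for date, day in train.groupby('date'):
        X = day[features].values
        resp = (day[resp_cols].values > 0).astype('float32')      # 'resp' head
        # bin the 500-day range into 5 one-hot segments for the 'date' head
        date_bin = np.eye(5)[min(int(date) // 100, 4)]
        date_oh = np.repeat(date_bin[None, :], len(day), axis=0)
        weight = day['weight'].values.astype('float32')           # 'weight' head
        resp_values = day[resp_cols].values.astype('float32')     # 'resp_values' head
        yield X, {'resp': resp, 'date': date_oh,
                  'weight': weight, 'resp_values': resp_values}
```

Feeding a whole day at a time keeps intraday rows together, which is exactly what row-level shuffling destroys.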
Topline roundup
Financial markets have recently been hit by the U.S. Treasury sell-off and entered a fairly turbulent regime, so the next rerun will probably shake up the leaderboard considerably.
AE+MLP (rank10)
The group k-fold model everyone was using at the start; the change here was tuned hyperparameters. With local validation done and a decent online score, it was kept as one of our final models.
https://www.kaggle.com/c/jane-street-market-prediction/discussion/224348
Current 17th solution: Ensembles of deep (49 layer) MLPs
by Martin BB
The author uses a very deep MLP together with a very large batch size. I have tried models of the same nature: such deep models tend to underfit and are slow to train and predict (tflite can solve the inference-efficiency problem), and the very large batches further aggravate the underfitting. The author cross-validated against the public leaderboard and offline data, and finally ensembled 8 seeds.
https://www.kaggle.com/c/jane-street-market-prediction/discussion/224713
mixup augmentation: a way to learn from trades with weight=0 (rank4)
codefluence
The author provides a code link for anyone interested. Although the author argues that mixup is a way to learn from the weight=0 trades, I personally think it mainly supplies the network with an appropriate degree of underfitting.
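For context, mixup in its generic form convex-combines random pairs of training samples. A minimal sketch (`mixup_batch` is an illustrative helper; carrying the sample weights through the blend is my assumption about how weight=0 rows enter the mix, not the author's exact recipe):

```python
import numpy as np

def mixup_batch(X, y, w, alpha=0.2, rng=np.random.default_rng()):
    """One mixup step: blend each sample with a randomly paired sample.

    Rows with w=0 never contribute to the loss on their own, but their
    features and labels still leak into the blended samples here.
    """
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    idx = rng.permutation(len(X))         # random pairing of samples
    X_mix = lam * X + (1 - lam) * X[idx]
    y_mix = lam * y + (1 - lam) * y[idx]
    w_mix = lam * w + (1 - lam) * w[idx]
    return X_mix, y_mix, w_mix
```

Training on X_mix with soft labels y_mix smooths the decision surface, which is consistent with reading mixup as a regularizer that induces mild underfitting.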
https://www.kaggle.com/c/jane-street-market-prediction/discussion/224333
My (perhaps) over-complicated LSTM solution
by kire
In the post the author mentions some rather distinctive ideas…
https://www.kaggle.com/c/jane-street-market-prediction/discussion/224634
Current 62nd place: Couple more tricks, and how I kinda sorta cheated
by Paul Fornia
https://www.kaggle.com/c/jane-street-market-prediction/discussion/224079
Current 41st Place - Solution Overview & Code
by dmitryvyudin
https://www.kaggle.com/c/jane-street-market-prediction/discussion/224029