Mining High-Frequency Factors with Genetic Algorithms (Part 2)

Allen Chi

Article link: https://blog.csdn.net/qq_24099909/article/details/107697320

Custom Functions

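Reference report: 191125, Huatai Securities, Huatai Artificial Intelligence Series No. 26: Application of Genetic Programming to CTA Signal Mining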

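1 Modifying the _Function class

Following the report, this chapter adds custom time-series functions and TA-Lib functions. Unlike the _Function class defined in gplearn, these functions need an extra lookback window, so the class has to be modified: we add a parameter is_ts and an attribute d, which indicate whether the function is a time-series function and how long its lookback window is, respectively.

The new attribute d is strictly constrained: it must be an integer, and it cannot appear as a node in the tree, so the original class cannot be reused unchanged. Our approach is to add a method set_d to _Function that sets the attribute d and updates name to match.

Each time a function is drawn at random, if a time-series function is selected, we draw d at random from a given range, create a new function object, and call set_d on it. When the function is evaluated, d is passed as an extra argument. This solves the problem above and also makes the parameter visible in the printed formula.

In addition, some TA-Lib functions require specific fixed inputs. When the tree is generated, such a function has to be treated as a leaf node, because it takes no other arguments. We therefore add a parameter params_need that lists the fixed inputs it requires.

The modified code is as follows: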

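# Imports used by the function definitions in the sections below
import pandas as pd
from scipy import stats
import talib as ta


class _Function(object):

    def __init__(self, function, name, arity, is_ts=False, params_need=None):
        self.function = function
        self.name = name
        self.arity = arity

        # New parameters
        self.is_ts = is_ts  # bool, whether this is a time-series function (default False)
        self.d = 0  # int, lookback window; must be reset for time-series functions
        self.params_need = params_need  # list, fixed inputs (in order) required by some TA-Lib functions

    def __call__(self, *args):
        if not self.is_ts:
            return self.function(*args)
        else:
            if self.d == 0:
                raise AttributeError("Please reset attribute 'd'")
            else:
                # The lookback window is always passed as the last positional argument
                return self.function(*args, self.d)

    # New method that sets d; call it once on a fresh instance,
    # because every call appends another suffix to name
    def set_d(self, d):
        self.d = d
        self.name += '_%d' % self.d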
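
The drawing step described above can be sketched as follows. This is only an illustration under stated assumptions: draw_function, d_choices, and the toy _lag function are hypothetical names that do not come from gplearn or from the report, and the class is a trimmed copy of the one above so that the snippet runs on its own. The point is that the template stored in the function set is copied before set_d is called, so the shared object keeps its original name and d.

```python
import copy
import random


class _Function(object):
    """Trimmed copy of the modified class so this sketch runs on its own."""

    def __init__(self, function, name, arity, is_ts=False):
        self.function = function
        self.name = name
        self.arity = arity
        self.is_ts = is_ts
        self.d = 0

    def __call__(self, *args):
        if not self.is_ts:
            return self.function(*args)
        if self.d == 0:
            raise AttributeError("Please reset attribute 'd'")
        return self.function(*args, self.d)

    def set_d(self, d):
        self.d = d
        self.name += '_%d' % self.d


def draw_function(function_set, d_choices, rng):
    """Hypothetical helper: pick a function; time-series ones get a fresh copy with its own d."""
    template = rng.choice(function_set)
    if not template.is_ts:
        return template
    func = copy.copy(template)  # never mutate the shared template
    func.set_d(rng.choice(d_choices))
    return func


def _lag(x, d):
    # Toy time-series function: shift a list right by d, padding with None
    return [None] * d + list(x[:max(len(x) - d, 0)])


ts_lag = _Function(_lag, 'ts_lag', arity=1, is_ts=True)
rng = random.Random(0)
node = draw_function([ts_lag], d_choices=[2, 5, 10], rng=rng)
values = node(list(range(1, 13)))
print(node.name, values)
```

Copying the template is a design choice of this sketch; creating a brand-new _Function with the same arguments would work just as well.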

2 Custom time-series functions

delay: the value of x1 d days ago

def _ts_delay(x1, d):
    return pd.Series(x1).shift(d).values
ts_delay1 = _Function(function=_ts_delay, name='ts_delay', arity=1, is_ts=True)

delta: the difference between x1 and its value d days ago

def _ts_delta(x1, d):
    return x1 - _ts_delay(x1, d)
ts_delta1 = _Function(function=_ts_delta, name='ts_delta', arity=1, is_ts=True)
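
A small worked example may help to check the two definitions. The input array is made up for illustration, and the two functions are repeated so that the snippet is self-contained.

```python
import numpy as np
import pandas as pd


def _ts_delay(x1, d):
    return pd.Series(x1).shift(d).values


def _ts_delta(x1, d):
    return x1 - _ts_delay(x1, d)


x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
delayed = _ts_delay(x, 2)  # the first two entries have no value 2 steps back, so they are NaN
delta = _ts_delta(x, 2)    # x minus its value 2 steps back
print(delayed)
print(delta)
```

The leading NaN values propagate through any function built on top of delay, which is worth keeping in mind when a fitness measure is computed on the result.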

ts_min: the minimum of the series formed by x1 over the past d days

def _ts_min(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).min()
ts_min1 = _Function(function=_ts_min, name='ts_min', arity=1, is_ts=True)

ts_max: the maximum of the series formed by x1 over the past d days

def _ts_max(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).max()
ts_max1 = _Function(function=_ts_max, name='ts_max', arity=1, is_ts=True)
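
The rolling functions in this section all pass min_periods=int(d / 2), which means a result is produced as soon as half a window of observations is available instead of waiting for a full window. The sketch below, on made-up data, shows the effect for d = 4.

```python
import numpy as np
import pandas as pd


def _ts_min(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).min()


def _ts_max(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).max()


x = np.array([3.0, 5.0, 4.0, 1.0, 6.0])
low = _ts_min(x, 4).values   # window of 4, so at least 2 observations are required
high = _ts_max(x, 4).values
print(low)
print(high)
```

Only the first entry is NaN here; the second entry is already computed from the two observations seen so far.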

ts_argmin: the position of the minimum within the series formed by x1 over the past d days

def _ts_argmin(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).apply(lambda x: x.argmin())
ts_argmin1 = _Function(function=_ts_argmin, name='ts_argmin', arity=1, is_ts=True)

ts_argmax: the position of the maximum within the series formed by x1 over the past d days

def _ts_argmax(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).apply(lambda x: x.argmax())
ts_argmax1 = _Function(function=_ts_argmax, name='ts_argmax', arity=1, is_ts=True)

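ts_rank: the percentile rank of the current x1 value within the series formed by x1 over the past d days

def _ts_rank(x1, d):
    # raw=True hands each window to the lambda as a NumPy array,
    # so x[-1] is reliably the most recent value in the window
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).apply(
        lambda x: stats.percentileofscore(x, x[-1]) / 100, raw=True
    )
ts_rank1 = _Function(function=_ts_rank, name='ts_rank', arity=1, is_ts=True)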
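
To make the percentile rank concrete, here is a small sketch on made-up data. It passes raw=True to rolling().apply so that each window arrives as a NumPy array and x[-1] is the most recent value; with a pandas Series window, x[-1] would be treated as a label lookup instead. The default kind='rank' of scipy.stats.percentileofscore is used.

```python
import numpy as np
import pandas as pd
from scipy import stats


def _ts_rank(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).apply(
        lambda x: stats.percentileofscore(x, x[-1]) / 100, raw=True
    )


x = np.array([3.0, 1.0, 4.0, 2.0, 5.0])
ranks = _ts_rank(x, 3).values  # window of 3, so min_periods = 1
print(ranks)
```

A value of 1.0 means the latest observation is the largest in its window. The fourth entry is about 0.667 because 2.0 is the middle value of the window (1.0, 4.0, 2.0).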

ts_sum: the sum of the series formed by x1 over the past d days

def _ts_sum(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).sum()
ts_sum1 = _Function(function=_ts_sum, name='ts_sum', arity=1, is_ts=True)

ts_stddev: the standard deviation of the series formed by x1 over the past d days

def _ts_stddev(x1, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).std()
ts_stddev1 = _Function(function=_ts_stddev, name='ts_stddev', arity=1, is_ts=True)

ts_corr: the correlation coefficient between the series formed by x1 and the series formed by x2 over the past d days

def _ts_corr(x1, x2, d):
    return pd.Series(x1).rolling(d, min_periods=int(d / 2)).corr(pd.Series(x2))
ts_corr2 = _Function(function=_ts_corr, name='ts_corr', arity=2, is_ts=True)

ts_mean_return: the mean rate of change of the series formed by x1 over the past d days

def _ts_mean_return(x1, d):
    return pd.Series(x1).pct_change().rolling(d, min_periods=int(d / 2)).mean()
ts_mean_return1 = _Function(function=_ts_mean_return, name='ts_mean_return',
                            arity=1, is_ts=True)

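3 TA-Lib functions

As mentioned earlier, these functions fall into two groups. The first group, like the functions in the previous section, consists of simple time-series functions. The second group requires specific fixed inputs. We define the two groups separately.

3.1 Functions without fixed parameters

DEMA: the double exponential moving average of x1 over the past d days; a trend signal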

ts_dema1 = _Function(function=ta.DEMA, name='DEMA', arity=1, is_ts=True)

KAMA: the Kaufman adaptive moving average of x1 over the past d days; a trend signal

ts_kama1 = _Function(function=ta.KAMA, name='KAMA', arity=1, is_ts=True)

MA: the mean of the series formed by x1 over the past d days; a trend signal

ts_ma1 = _Function(function=ta.MA, name='MA', arity=1, is_ts=True)

MIDPOINT: the average of the maximum and the minimum of the series formed by x1 over the past d days

ts_midpoint1 = _Function(function=ta.MIDPOINT, name='MIDPOINT', arity=1, is_ts=True)

BETA: the beta coefficient, that is, how x1 fluctuates relative to x2 over the past d days; a statistical signal

ts_beta2 = _Function(function=ta.BETA, name='BETA', arity=2, is_ts=True)

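LINEARREG_ANGLE: the angle of the linear regression with the x1 series over the past d days as the dependent variable and the sequence 1,…,d as the independent variable; a statistical signal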

ts_lr_angle1 = _Function(function=ta.LINEARREG_ANGLE, name='LR_ANGLE',
                         arity=1, is_ts=True)

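LINEARREG_INTERCEPT: the intercept of the linear regression with the x1 series over the past d days as the dependent variable and the sequence 1,…,d as the independent variable; a statistical signal

ts_lr_intercept1 = _Function(function=ta.LINEARREG_INTERCEPT, name='LR_INTERCEPT',
                             arity=1, is_ts=True)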

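LINEARREG_SLOPE: the slope of the linear regression with the x1 series over the past d days as the dependent variable and the sequence 1,…,d as the independent variable; a statistical signal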

ts_lr_slope1 = _Function(function=ta.LINEARREG_SLOPE, name='LR_SLOPE',
                         arity=1, is_ts=True)
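
To make the regression description concrete, the following NumPy sketch computes the ordinary least squares slope of each length-d window against the sequence 1,…,d. It illustrates the definition given above and is not TA-Lib's implementation; rolling_lr_slope is a hypothetical name, and the conversion of the slope into an angle (arctangent, in degrees) is an assumption about what LINEARREG_ANGLE reports.

```python
import numpy as np


def rolling_lr_slope(x1, d):
    """OLS slope of each length-d window of x1 on 1..d (NaN until a full window exists)."""
    x1 = np.asarray(x1, dtype=float)
    t = np.arange(1, d + 1, dtype=float)
    t_centered = t - t.mean()
    out = np.full(x1.shape, np.nan)
    for end in range(d, len(x1) + 1):
        window = x1[end - d:end]
        # slope = cov(t, window) / var(t)
        out[end - 1] = np.dot(t_centered, window - window.mean()) / np.dot(t_centered, t_centered)
    return out


x = 2.0 * np.arange(10) + 1.0            # a straight line with slope 2
slopes = rolling_lr_slope(x, 4)
angles = np.degrees(np.arctan(slopes))   # assumed definition of the regression angle
print(slopes)
print(angles)
```

On a perfectly linear input every complete window recovers the slope of the line, which gives a quick sanity check for the definition.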

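HT_DCPHASE: the Hilbert Transform dominant cycle phase of x1; a cycle signal

def _ts_ht_dcphase(x1, d):
    # TA-Lib's HT_DCPHASE takes no timeperiod argument, so d is accepted
    # only to match the time-series call signature and is then ignored
    return ta.HT_DCPHASE(x1)
ts_ht1 = _Function(function=_ts_ht_dcphase, name='HT', arity=1, is_ts=True)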

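3.2 Functions with fixed parameters

The fixed inputs used here are high, low, close and volume. Our data are tick data updated every 500 ms and contain only order-book quotes plus the day's cumulative traded volume and cumulative turnover. We therefore substitute Ask and Bid for high and low respectively, use AvgPrice, the average traded price over the last tick, in place of close, and additionally compute the traded volume of each tick.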
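
A minimal sketch of how the per-tick volume and AvgPrice could be derived from the cumulative totals is shown below. The column names TotalVolume and TotalTurnover and the sample numbers are assumptions made for illustration, as is the choice to carry the last price forward on ticks without trades. The sketch also assumes a single trading day and that turnover divided by volume is already a price (no contract multiplier).

```python
import numpy as np
import pandas as pd

ticks = pd.DataFrame({
    'TotalVolume': [10, 10, 25, 40],                    # cumulative volume for the day
    'TotalTurnover': [1000.0, 1000.0, 2530.0, 4075.0],  # cumulative turnover for the day
})

# Per-tick increments; the first tick keeps its cumulative value
volume = ticks['TotalVolume'].diff().fillna(ticks['TotalVolume'])
turnover = ticks['TotalTurnover'].diff().fillna(ticks['TotalTurnover'])

# Average traded price within each tick; ticks without trades would divide
# by zero, so they are masked to NaN and then filled with the last known price
avg_price = (turnover / volume.where(volume > 0)).ffill()

ticks['volume'] = volume
ticks['AvgPrice'] = avg_price
print(ticks)
```

With multi-day data the differences would need to be taken within each day, for example with a groupby on the trading date.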

MIDPRICE: the average of the maximum of the high series and the minimum of the low series over the past d days

fixed_midprice = _Function(function=ta.MIDPRICE, name='MIDPRICE', arity=0, is_ts=True,
                           params_need=['Ask', 'Bid'])

AROONOSC: the Aroon oscillator over the past d days; a momentum signal

fixed_aroonosc = _Function(function=ta.AROONOSC, name='AROONOSC', arity=0, is_ts=True,
                           params_need=['Ask', 'Bid'])

WILLR: the Williams %R over the past d days, which indicates whether the market is overbought or oversold; a momentum signal

fixed_willr = _Function(function=ta.WILLR, name='WILLR', arity=0, is_ts=True,
                        params_need=['Ask', 'Bid', 'AvgPrice'])

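CCI: the Commodity Channel Index over the past d days, which measures whether the price has moved outside its normal range; a momentum signal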

fixed_cci = _Function(function=ta.CCI, name='CCI', arity=0, is_ts=True,
                      params_need=['Ask', 'Bid', 'AvgPrice'])

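ADX: the Average Directional Index over the past d days, used to tell consolidation, oscillation and one-way trends apart; a momentum signal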

fixed_adx = _Function(function=ta.ADX, name='ADX', arity=0, is_ts=True,
                      params_need=['Ask', 'Bid', 'AvgPrice'])

MFI: the Money Flow Index over the past d days, which reflects the direction in which the market is moving; a momentum signal

fixed_mfi = _Function(function=ta.MFI, name='MFI', arity=0, is_ts=True,
                      params_need=['Ask', 'Bid', 'AvgPrice', 'volume'])

NATR: the normalized average true range over the past d days; a volatility signal

fixed_natr = _Function(function=ta.NATR, name='NATR', arity=0, is_ts=True,
                       params_need=['Ask', 'Bid', 'AvgPrice'])
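
Because these functions have arity 0, they behave like leaves whose inputs are looked up by column name rather than taken from child nodes. The sketch below shows one way such a leaf might be evaluated. FixedLeaf and evaluate_leaf are hypothetical names, the data are made up, and midprice is a pandas stand-in that follows the description of MIDPRICE above so that the snippet runs without TA-Lib.

```python
import numpy as np
import pandas as pd


def midprice(high, low, d):
    """pandas stand-in for the MIDPRICE description: (highest high + lowest low) / 2 over d rows."""
    highest = pd.Series(high).rolling(d).max()
    lowest = pd.Series(low).rolling(d).min()
    return ((highest + lowest) / 2).values


class FixedLeaf:
    """Hypothetical stand-in for a _Function with arity=0 and params_need set."""

    def __init__(self, function, name, params_need, d):
        self.function = function
        self.name = name
        self.params_need = params_need
        self.d = d


def evaluate_leaf(leaf, data):
    # Fetch the required columns in the declared order, then append the window d
    inputs = [data[col].values for col in leaf.params_need]
    return leaf.function(*inputs, leaf.d)


data = pd.DataFrame({
    'Ask': [10.2, 10.4, 10.3, 10.6],
    'Bid': [10.0, 10.2, 10.1, 10.4],
})
leaf = FixedLeaf(midprice, 'MIDPRICE_2', ['Ask', 'Bid'], d=2)
result = evaluate_leaf(leaf, data)
print(leaf.name, result)
```

Keeping the column order inside params_need means the evaluation code does not need to know anything about the individual indicator.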

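4 Defining the function sets

In addition to the function set of built-in functions, we define one function set for ordinary time-series functions and a separate one for fixed-parameter functions.

Ordinary time-series function set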

_ts_function_map = {
    'ts_delay': ts_delay1,
    'ts_delta': ts_delta1,
    'ts_min': ts_min1,
    'ts_max': ts_max1,
    'ts_argmin': ts_argmin1,
    'ts_argmax': ts_argmax1,
    'ts_rank': ts_rank1,
    'ts_sum': ts_sum1,
    'ts_stddev': ts_stddev1,
    'ts_corr': ts_corr2,
    'ts_mean_return': ts_mean_return1,

    'DEMA': ts_dema1,
    'KAMA': ts_kama1,
    'MA': ts_ma1,
    'MIDPOINT': ts_midpoint1,
    'BETA': ts_beta2,
    'LR_ANGLE': ts_lr_angle1,
    'LR_INTERCEPT': ts_lr_intercept1,
    'LR_SLOPE': ts_lr_slope1,
    'HT': ts_ht1
}

Fixed-parameter function set

_fixed_function_map = {
    'MIDPRICE': fixed_midprice,
    'AROONOSC': fixed_aroonosc,
    'WILLR': fixed_willr,
    'CCI': fixed_cci,
    'ADX': fixed_adx,
    'MFI': fixed_mfi,
    'NATR': fixed_natr
}

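5 Summary and outlook

In this chapter we modified the definition of the _Function class and defined a number of functions following the report. Next, we will modify the _Program class in the _program file so that it works with our time-series data and the newly defined functions.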
