[Alphalens] Dual Moving Average Factor Analysis with Alphalens and Akshare, with Source Code and Common Errors

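Alphalens is a very well-known Python library for factor analysis. However, it is no longer actively maintained and has many problems. The currently recommended choice is alphalens-reloaded, available at: stefan-jansen/alphalens-reloaded: Performance analysis of predictive (alpha) stock factors (github.com)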

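The library's demos are all built on yfinance, the interface to Yahoo Finance. Using the library with the China-focused akshare instead leads to the problems listed below, and solving them requires being quite familiar with the Alphalens interface. I recommend reading the docstrings of the original API, especially that of get_clean_factor_and_forward_returns.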

def get_clean_factor_and_forward_returns(factor,
                                         prices,
                                         groupby=None,
                                         binning_by_group=False,
                                         quantiles=5,
                                         bins=None,
                                         periods=(1, 5, 10),
                                         filter_zscore=20,
                                         groupby_labels=None,
                                         max_loss=0.35,
                                         zero_aware=False,
                                         cumulative_returns=True):
    """
    Formats the factor data, pricing data, and group mappings into a DataFrame
    that contains aligned MultiIndex indices of timestamp and asset. The
    returned data will be formatted to be suitable for Alphalens functions.

    It is safe to skip a call to this function and still make use of Alphalens
    functionalities as long as the factor data conforms to the format returned
    from get_clean_factor_and_forward_returns and documented here

    Parameters
    ----------
    factor : pd.Series - MultiIndex
        A MultiIndex Series indexed by timestamp (level 0) and asset
        (level 1), containing the values for a single alpha factor.
        ::
            -----------------------------------
                date    |    asset   |
            -----------------------------------
                        |   AAPL     |   0.5
                        -----------------------
                        |   BA       |  -1.1
                        -----------------------
            2014-01-01  |   CMG      |   1.7
                        -----------------------
                        |   DAL      |  -0.1
                        -----------------------
                        |   LULU     |   2.7
                        -----------------------

    prices : pd.DataFrame
        A wide form Pandas DataFrame indexed by timestamp with assets
        in the columns.
        Pricing data must span the factor analysis time period plus an
        additional buffer window that is greater than the maximum number
        of expected periods in the forward returns calculations.
        It is important to pass the correct pricing data in depending on
        what time of period your signal was generated so to avoid lookahead
        bias, or  delayed calculations.
        'Prices' must contain at least an entry for each timestamp/asset
        combination in 'factor'. This entry should reflect the buy price
        for the assets and usually it is the next available price after the
        factor is computed but it can also be a later price if the factor is
        meant to be traded later (e.g. if the factor is computed at market
        open but traded 1 hour after market open the price information should
        be 1 hour after market open).
        'Prices' must also contain entries for timestamps following each
        timestamp/asset combination in 'factor', as many more timestamps
        as the maximum value in 'periods'. The asset price after 'period'
        timestamps will be considered the sell price for that asset when
        computing 'period' forward returns.
        ::
            ----------------------------------------------------
                        | AAPL |  BA  |  CMG  |  DAL  |  LULU  |
            ----------------------------------------------------
               Date     |      |      |       |       |        |
            ----------------------------------------------------
            2014-01-01  |605.12| 24.58|  11.72| 54.43 |  37.14 |
            ----------------------------------------------------
            2014-01-02  |604.35| 22.23|  12.21| 52.78 |  33.63 |
            ----------------------------------------------------
            2014-01-03  |607.94| 21.68|  14.36| 53.94 |  29.37 |
            ----------------------------------------------------

    groupby : pd.Series - MultiIndex or dict
        Either A MultiIndex Series indexed by date and asset,
        containing the period wise group codes for each asset, or
        a dict of asset to group mappings. If a dict is passed,
        it is assumed that group mappings are unchanged for the
        entire time period of the passed factor data.
    binning_by_group : bool
        If True, compute quantile buckets separately for each group.
        This is useful when the factor values range vary considerably
        across gorups so that it is wise to make the binning group relative.
        You should probably enable this if the factor is intended
        to be analyzed for a group neutral portfolio
    quantiles : int or sequence[float]
        Number of equal-sized quantile buckets to use in factor bucketing.
        Alternately sequence of quantiles, allowing non-equal-sized buckets
        e.g. [0, .10, .5, .90, 1.] or [.05, .5, .95]
        Only one of 'quantiles' or 'bins' can be not-None
    bins : int or sequence[float]
        Number of equal-width (valuewise) bins to use in factor bucketing.
        Alternately sequence of bin edges allowing for non-uniform bin width
        e.g. [-4, -2, -0.5, 0, 10]
        Chooses the buckets to be evenly spaced according to the values
        themselves. Useful when the factor contains discrete values.
        Only one of 'quantiles' or 'bins' can be not-None
    periods : sequence[int]
        periods to compute forward returns on.
    filter_zscore : int or float, optional
        Sets forward returns greater than X standard deviations
        from the the mean to nan. Set it to 'None' to avoid filtering.
        Caution: this outlier filtering incorporates lookahead bias.
    groupby_labels : dict
        A dictionary keyed by group code with values corresponding
        to the display name for each group.
    max_loss : float, optional
        Maximum percentage (0.00 to 1.00) of factor data dropping allowed,
        computed comparing the number of items in the input factor index and
        the number of items in the output DataFrame index.
        Factor data can be partially dropped due to being flawed itself
        (e.g. NaNs), not having provided enough price data to compute
        forward returns for all factor values, or because it is not possible
        to perform binning.
        Set max_loss=0 to avoid Exceptions suppression.
    zero_aware : bool, optional
        If True, compute quantile buckets separately for positive and negative
        signal values. This is useful if your signal is centered and zero is
        the separation between long and short signals, respectively.
    cumulative_returns : bool, optional
        If True, forward returns columns will contain cumulative returns.
        Setting this to False is useful if you want to analyze how predictive
        a factor is for a single forward day.

    Returns
    -------
    merged_data : pd.DataFrame - MultiIndex
        A MultiIndex Series indexed by date (level 0) and asset (level 1),
        containing the values for a single alpha factor, forward returns for
        each period, the factor quantile/bin that factor value belongs to, and
        (optionally) the group the asset belongs to.
        - forward returns column names follow  the format accepted by
          pd.Timedelta (e.g. '1D', '30m', '3h15m', '1D1h', etc)
        - 'date' index freq property (merged_data.index.levels[0].freq) will be
          set to a trading calendar (pandas DateOffset) inferred from the input
          data (see infer_trading_calendar for more details). This is currently
          used only in cumulative returns computation
        ::
           -------------------------------------------------------------------
                      |       | 1D  | 5D  | 10D  |factor|group|factor_quantile
           -------------------------------------------------------------------
               date   | asset |     |     |      |      |     |
           -------------------------------------------------------------------
                      | AAPL  | 0.09|-0.01|-0.079|  0.5 |  G1 |      3
                      --------------------------------------------------------
                      | BA    | 0.02| 0.06| 0.020| -1.1 |  G2 |      5
                      --------------------------------------------------------
           2014-01-01 | CMG   | 0.03| 0.09| 0.036|  1.7 |  G2 |      1
                      --------------------------------------------------------
                      | DAL   |-0.02|-0.06|-0.029| -0.1 |  G3 |      5
                      --------------------------------------------------------
                      | LULU  |-0.03| 0.05|-0.009|  2.7 |  G1 |      2
                      --------------------------------------------------------

    See Also
    --------
    utils.get_clean_factor
        For use when forward returns are already available.
    """
    forward_returns = compute_forward_returns(
        factor,
        prices,
        periods,
        filter_zscore,
        cumulative_returns,
    )

    factor_data = get_clean_factor(factor, forward_returns, groupby=groupby,
                                   groupby_labels=groupby_labels,
                                   quantiles=quantiles, bins=bins,
                                   binning_by_group=binning_by_group,
                                   max_loss=max_loss, zero_aware=zero_aware)

    return factor_data
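
To make the input formats described in the docstring concrete, here is a minimal sketch that uses only pandas and NumPy. The asset names, dates and prices are made up for illustration, and the hand-written forward-return calculation only approximates what the library does internally (it ignores outlier filtering and trading-calendar inference).

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 made-up assets over 8 business days.
dates = pd.date_range("2020-01-01", periods=8, freq="B", tz="UTC")
assets = ["A", "B", "C"]
rng = np.random.default_rng(0)

# prices: wide DataFrame, timestamps in the index, one column per asset.
prices = pd.DataFrame(
    100 + rng.normal(0, 1, size=(len(dates), len(assets))).cumsum(axis=0),
    index=dates,
    columns=assets,
)

# factor: Series with a (date, asset) MultiIndex. It stops 2 days before the
# end of the price history so that 2-period forward returns still exist.
factor_dates = dates[:-2]
index = pd.MultiIndex.from_product([factor_dates, assets], names=["date", "asset"])
factor = pd.Series(rng.normal(size=len(index)), index=index, name="factor")

# Forward return over `period` rows: price[t + period] / price[t] - 1.
period = 2
forward = prices.pct_change(period).shift(-period)
forward_long = forward.stack()
forward_long.index.names = ["date", "asset"]

# Thanks to the 2-day price buffer, every factor row has a forward return.
aligned = forward_long.reindex(factor.index)
print(aligned.isna().sum())
```

If the factor covered every date in prices, the last `period` dates would have no sell price, and those rows would count toward max_loss.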

Source code

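The code below uses Akshare to fetch data for the A-share stock 600519 and then runs the most basic factor analysis with alphalens-reloaded. The factor is the crossover of the 5-day and 10-day moving averages, computed as their difference: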

import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import alphalens
import seaborn as sns
import akshare as ak
from pytz import timezone
# %matplotlib inline
sns.set_style('white')
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)

# Fetch daily, forward-adjusted (qfq) bars with akshare's stock_zh_a_hist
df = ak.stock_zh_a_hist(symbol='600519', period="daily", start_date='20200101', end_date='20201231', adjust="qfq")
# Rename the Chinese column names to English ones
df.rename(columns={
'日期': 'date',
'开盘': 'open',
'收盘': 'close',
'最高': 'high',
'最低': 'low',
'成交量': 'volume'
}, inplace=True)
df['asset'] = '600519'
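# Factor: 5-day moving average of the close minus the 10-day moving average
df['ma5'] = df['close'].rolling(window=5).mean().fillna(0)
df['ma10'] = df['close'].rolling(window=10).mean().fillna(0)
df['factor'] = df['ma5']-df['ma10']
# Skip the first 20 rows, which include the warm-up rows where the averages were filled with 0
df = df.iloc[20:]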
df.head(30)

# Work on a copy (dff) so that the original df is left untouched
dff = df.copy()
dff['date'] = pd.to_datetime(dff['date'])
dff = dff.set_index(['date', 'asset'])
dff.index = dff.index.set_levels([dff.index.levels[0].tz_localize('UTC'), dff.index.levels[1]])
factor = dff['factor']

# factor.head()
# print(factor)

df['date'] = pd.to_datetime(df['date']).dt.tz_localize('UTC')  # convert date column to datetime format with UTC timezone
df.set_index(['date', 'asset'], inplace=True)

# select 'close' column to create the prices dataframe
prices = df['close'].unstack('asset')
prices.head()
print(prices.index.tz)
print(factor.index.levels[0].tz)
# print(prices)

# Optionally align factor and prices at this point
# factor, prices = factor.align(prices, join='inner', axis=0)


factor_data = alphalens.utils.get_clean_factor_and_forward_returns( factor,
                                                                    prices,
                                                                    groupby=None,
                                                                    binning_by_group=False,
                                                                    quantiles=2,
                                                                    bins=None,
                                                                    periods=(1, 5, 10),
                                                                    filter_zscore=20,
                                                                    groupby_labels=None,
                                                                    max_loss=0.35,
                                                                    zero_aware=True,
                                                                    cumulative_returns=True,
                                                                   )
# factor_data.head()

alphalens.tears.create_full_tear_sheet(factor_data,long_short=False)

The result is the full tear sheet (figure).
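
The factor itself is just the gap between the 5-day and 10-day moving averages of the close. Below is a minimal sketch on synthetic prices (made up for illustration). Unlike the script above, it drops the warm-up rows with dropna() instead of filling them with 0; that is an alternative I am assuming is acceptable here, not the approach used in the original script.

```python
import numpy as np
import pandas as pd

# Synthetic close prices: a random walk over 30 business days.
dates = pd.date_range("2020-01-01", periods=30, freq="B")
rng = np.random.default_rng(42)
close = pd.Series(100 + rng.normal(0, 1, 30).cumsum(), index=dates, name="close")

# Rolling means are NaN until a full window is available.
ma5 = close.rolling(window=5).mean()
ma10 = close.rolling(window=10).mean()

# Positive when the short average is above the long one, negative otherwise.
factor = (ma5 - ma10).dropna()

# A crossover is a sign change of the factor between consecutive days.
sign = np.sign(factor)
cross_up = (sign > 0) & (sign.shift(1) < 0)
cross_down = (sign < 0) & (sign.shift(1) > 0)
print(factor.head())
print("golden crosses:", int(cross_up.sum()), "death crosses:", int(cross_down.sum()))
```

Dropping the NaN rows avoids the artificial values that ma5 - 0 would produce during the first nine days, so no fixed number of rows has to be skipped afterwards.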

Common errors

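  1. AttributeError: 'Index' object has no attribute 'tz'
    A timezone problem. Data from overseas sources carries a timezone by default, whereas with domestic sources such as tushare and akshare you have to add the timezone yourself. See how the source code above handles it.

  2. MaxLossExceededError: max_loss (35.0%) exceeded 100.0%, consider increasing it.
    get_clean_factor_and_forward_returns uses a default max_loss of 35.0%, which you can also configure yourself. I hit this error at first with the default quantiles=5, presumably because a single asset provides only one factor value per date to bucket. Changing the quantiles argument to 2 fixes it, since this factor splits naturally into two classes, positive and negative.

  3. Inferred frequency None from passed values does not conform to passed frequency C
    A frequency problem. It may be caused by some NaN values, and can be resolved by synchronizing the data, for example by aligning factor with prices.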
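
The three fixes above can be sketched on made-up data as follows. This uses only pandas and NumPy; the prices and factor values are invented, and the "lacking forward price" count is my own rough diagnostic, not something alphalens reports in this form.

```python
import numpy as np
import pandas as pd

# Made-up, timezone-naive inputs similar to what a domestic data source returns.
dates = pd.date_range("2020-01-01", periods=12, freq="B")
prices = pd.DataFrame({"600519": np.linspace(100.0, 111.0, 12)}, index=dates)
factor = pd.Series(
    np.linspace(-1.0, 1.0, 12),
    index=pd.MultiIndex.from_product([dates, ["600519"]], names=["date", "asset"]),
    name="factor",
)
factor.iloc[3] = np.nan  # simulate a missing factor value

# Error 1: attach a timezone to both inputs.
prices.index = prices.index.tz_localize("UTC")
factor.index = factor.index.set_levels(
    factor.index.levels[0].tz_localize("UTC"), level=0
)

# Error 3: drop NaNs and keep only factor rows whose date exists in prices.
factor = factor.dropna()
factor = factor[factor.index.get_level_values("date").isin(prices.index)]

# Error 2: count factor rows that have fewer than `max_period` later price
# rows, because those rows cannot get a forward return and will be dropped.
max_period = 10
positions = prices.index.get_indexer(factor.index.get_level_values("date"))
lacking = int((positions + max_period >= len(prices.index)).sum())
print(f"{lacking} of {len(factor)} factor rows lack a {max_period}-period forward price")
```

When the share of such rows exceeds max_loss, either extend the price history beyond the last factor date or shorten periods.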

If you have any questions, feel free to leave a comment or send me a private message.
