First, the data formats alphalens expects:
factor: a Series with a MultiIndex (convert a wide table with stack())
prices: a DataFrame
# Convert the wide factor table (dates x assets) into a MultiIndex Series
factor = alpha_mom.stack()
print(factor.tail())
datetime
2017-11-20 15:00:00  601857.XSHG    1.022616
                     601881.XSHG    0.744411
                     601901.XSHG    0.893478
                     601985.XSHG    0.993412
                     601988.XSHG    0.971698
dtype: float64
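To illustrate what `stack()` does, here is a minimal sketch on a made-up table with two dates and two assets. The name `alpha_mom_demo` and its values are hypothetical; the only assumption carried over from the text is that `alpha_mom` is a wide table with dates as rows and assets as columns.

```python
import pandas as pd

# Hypothetical wide factor table: rows are dates, columns are assets
dates = pd.to_datetime(["2017-11-17 15:00:00", "2017-11-20 15:00:00"])
alpha_mom_demo = pd.DataFrame(
    {"601857.XSHG": [1.01, 1.02], "601881.XSHG": [0.75, 0.74]},
    index=dates,
)

# stack() moves the asset columns into a second index level,
# producing a Series indexed by (date, asset)
factor_demo = alpha_mom_demo.stack()
factor_demo.index.names = ["date", "asset"]
print(factor_demo)
```

Naming the levels `date` and `asset` is optional, but it matches the level names that appear in `factor_data` further down.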
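# DataFrame of closing prices for the stock pool
# (PN is a pandas Panel of bar data, assets x dates x fields;
#  Panel was removed in pandas 0.25, so this line needs an older pandas)
prices = PN.minor_xs('close')
print(prices.tail())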
600000.XSHG 600016.XSHG 600028.XSHG 600029.XSHG \
datetime
2017-11-14 15:00:00 118.12 125.93 12.06 16.00
2017-11-15 15:00:00 118.12 124.74 11.82 16.04
2017-11-16 15:00:00 116.16 123.54 11.76 16.29
2017-11-17 15:00:00 119.81 127.42 11.92 16.97
2017-11-20 15:00:00 120.47 128.17 11.92 17.05
600030.XSHG 600036.XSHG 600048.XSHG 600050.XSHG \
datetime
2017-11-14 15:00:00 69.27 111.81 199.75 9.49
2017-11-15 15:00:00 69.04 111.25 204.52 9.68
2017-11-16 15:00:00 68.05 112.13 218.27 9.61
2017-11-17 15:00:00 69.88 117.24 224.00 9.63
2017-11-20 15:00:00 67.71 121.82 224.19 9.80
600100.XSHG 600104.XSHG ... 601766.XSHG \
datetime ...
2017-11-14 15:00:00 178.62 204.03 ... 12.10
2017-11-15 15:00:00 176.35 202.78 ... 12.07
2017-11-16 15:00:00 174.24 200.97 ... 11.77
2017-11-17 15:00:00 165.92 207.21 ... 12.11
2017-11-20 15:00:00 170.61 206.46 ... 12.14
601788.XSHG 601800.XSHG 601818.XSHG 601857.XSHG \
datetime
2017-11-14 15:00:00 17.28 17.39 5.13 10.63
2017-11-15 15:00:00 17.25 17.34 5.12 10.37
2017-11-16 15:00:00 17.04 16.91 5.11 10.28
2017-11-17 15:00:00 17.30 17.04 5.21 10.33
2017-11-20 15:00:00 17.18 16.79 5.24 10.40
601881.XSHG 601901.XSHG 601985.XSHG 601988.XSHG \
datetime
2017-11-14 15:00:00 13.15 8.63 7.80 6.08
2017-11-15 15:00:00 13.03 8.49 7.79 6.07
2017-11-16 15:00:00 12.76 8.28 7.54 6.02
2017-11-17 15:00:00 12.30 8.11 7.63 6.14
2017-11-20 15:00:00 12.32 8.22 7.54 6.18
601989.XSHG
datetime
2017-11-14 15:00:00 10.64
2017-11-15 15:00:00 10.51
2017-11-16 15:00:00 10.49
2017-11-17 15:00:00 10.14
2017-11-20 15:00:00 10.25
[5 rows x 49 columns]
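Because `PN` above is a pandas Panel, which no longer exists in current pandas, here is a hedged sketch of building the same kind of wide close-price table from long-format bar data instead. The `bars` table is hypothetical; its four close prices are copied from the output above.

```python
import pandas as pd

# Hypothetical long-format bar data indexed by (datetime, asset)
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-11-17 15:00:00", "2017-11-20 15:00:00"]),
     ["600000.XSHG", "600016.XSHG"]],
    names=["datetime", "asset"],
)
bars = pd.DataFrame({"close": [119.81, 127.42, 120.47, 128.17]}, index=idx)

# Select the close field (the role minor_xs('close') played for the Panel)
# and pivot the asset level into columns: one row per date, one column per asset
prices_demo = bars["close"].unstack("asset")
print(prices_demo)
```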
# Build the input format that Alphalens requires
import alphalens
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor, prices, quantiles=5)
print(factor_data.head())
                                         1         5        10    factor  \
date                asset
2017-03-07 15:00:00 600000.XSHG  -0.001197 -0.010349 -0.024974  1.008018
                    600016.XSHG  -0.005597 -0.015598 -0.034555  0.985728
                    600028.XSHG   0.003578 -0.016100  0.007156  1.021938
                    600029.XSHG  -0.003912  0.010172  0.000782  1.097938
                    600030.XSHG  -0.006045 -0.006045 -0.013999  1.016659
                                 factor_quantile
date                asset
2017-03-07 15:00:00 600000.XSHG                2
                    600016.XSHG                1
                    600028.XSHG                3
                    600029.XSHG                5
                    600030.XSHG                3
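Conceptually, each row of `factor_data` pairs a factor value with the forward returns that follow it and with the quantile that value falls into on its date. The sketch below reproduces that idea for a single 1-period horizon on made-up prices and factor values. It is a simplified illustration, not alphalens's actual implementation, which also handles outlier filtering, the `max_loss` check and several periods at once.

```python
import pandas as pd

# Made-up wide tables: rows are dates, columns are assets
dates = pd.date_range("2017-03-07", periods=3, freq="D")
prices_toy = pd.DataFrame(
    {"A": [10.0, 11.0, 12.1], "B": [20.0, 19.0, 19.0], "C": [5.0, 5.5, 5.5]},
    index=dates,
)
factor_wide = pd.DataFrame(
    {"A": [0.3, 0.1, 0.2], "B": [0.1, 0.3, 0.1], "C": [0.2, 0.2, 0.3]},
    index=dates,
)

# 1-period forward return: next row's price / today's price - 1
fwd_1 = prices_toy.shift(-1) / prices_toy - 1

# Long format indexed by (date, asset); the last date has no forward
# return yet, so its rows are dropped
data = pd.DataFrame({1: fwd_1.stack(), "factor": factor_wide.stack()}).dropna()
data.index.names = ["date", "asset"]

# Bucket the factor into 3 equal-count quantiles separately for each date
data["factor_quantile"] = data.groupby(level="date")["factor"].transform(
    lambda x: pd.qcut(x, 3, labels=False) + 1
)
print(data)
```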
How do we compute the mean return and its standard error for each quantile?
mean_return_by_q, std_err_by_q = alphalens.performance.mean_return_by_quantile(factor_data, by_date=True)
print(mean_return_by_q.head())
print(std_err_by_q.head())
                                            1         5        10
factor_quantile date
1               2017-03-07 15:00:00  0.006782  0.003821  0.006060
                2017-03-08 15:00:00  0.002207  0.000536 -0.005845
                2017-03-09 15:00:00  0.000176  0.001881  0.012697
                2017-03-10 15:00:00  0.001894  0.004035  0.006478
                2017-03-13 15:00:00  0.000316  0.009381  0.011278
                                            1         5        10
factor_quantile date
1               2017-03-07 15:00:00  0.008181  0.005817  0.011047
                2017-03-08 15:00:00  0.001643  0.005422  0.012947
                2017-03-09 15:00:00  0.002841  0.004721  0.012215
                2017-03-10 15:00:00  0.002748  0.003273  0.013972
                2017-03-13 15:00:00  0.001233  0.006354  0.011653
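The same statistics can be approximated by hand with a pandas groupby. This sketch uses a made-up table shaped like `factor_data` and computes, per quantile, the mean forward return and its standard error (sample standard deviation divided by the square root of the number of observations). Grouping by both quantile and date mirrors the `by_date=True` output above. Treat it as an illustration of the statistics, not as a copy of alphalens's internals.

```python
import numpy as np
import pandas as pd

# Made-up table shaped like factor_data: 1-period forward return and quantile
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-03-07", "2017-03-08"]), ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
toy = pd.DataFrame(
    {1: [0.01, 0.03, -0.02, 0.00, 0.02, 0.04, -0.01, 0.01],
     "factor_quantile": [2, 2, 1, 1, 2, 2, 1, 1]},
    index=idx,
)

# Pooled over all dates: mean return and standard error per quantile
grouped = toy.groupby("factor_quantile")[1]
mean_ret = grouped.mean()
std_err = grouped.std() / np.sqrt(grouped.count())

# Per quantile and per date (the shape of the by_date=True output)
mean_by_date = toy.groupby(["factor_quantile", "date"])[1].mean()
print(mean_ret, std_err, mean_by_date, sep="\n\n")
```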
How can the return curves be visualized?
1. Return curves for different holding periods
2. Cumulative return curves
import matplotlib.pyplot as plt
# Cumulative returns of each quantile, based on the 10-period forward returns
alphalens.plotting.plot_cumulative_returns_by_quantile(mean_return_by_q, 10)
plt.show()
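The cumulative curves come from compounding periodic returns. Here is a minimal sketch for a made-up series of 1-period returns. alphalens's own plotting code may treat multi-period (for example 10-period) returns differently, so this only shows the basic compounding step.

```python
import pandas as pd

# Made-up 1-period mean returns of one quantile on consecutive dates
period_returns = pd.Series(
    [0.01, -0.02, 0.03],
    index=pd.date_range("2017-03-07", periods=3, freq="D"),
)

# Compound the returns: (1 + r1) * (1 + r2) * ... - 1
cumulative = (1 + period_returns).cumprod() - 1
print(cumulative)
```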
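What is the information coefficient (IC)?
It is a correlation value that measures the relationship between the predicted values of a variable and its actual values. The IC is a performance measure used to assess the forecasting skill of financial analysts.
The coefficient ranges from -1 to 1; the larger it is, the stronger the positive correlation. A commonly used threshold for a useful factor is mean(IC) > 0.02.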
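What is the Spearman correlation coefficient?
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the rank difference of the i-th pair and $n$ is the number of observations (this form assumes there are no tied ranks).
The IC is therefore the correlation between the ranking of the factor values and the ranking of the subsequent returns.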
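What is the rank difference?
A = [1,3,5,7,9]
B = [3,2,4,5,1]
The ranks of A are 1, 2, 3, 4, 5
The ranks of B are 3, 2, 4, 5, 1
d is the difference between the two ranks at each position: d = [-2, 0, -1, -1, 4], so Σd² = 4 + 0 + 1 + 1 + 16 = 22 and the Spearman coefficient is 1 - 6·22 / (5·(5² - 1)) = -0.1.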
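The rank-difference calculation for A and B can be checked in plain Python. This sketch ranks each list, takes the differences d, and applies ρ = 1 - 6Σd² / (n(n² - 1)); the helper names `ranks` and `spearman_rho` are illustrative, and the formula as written assumes no ties.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(a, b):
    """Spearman correlation via rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(a)
    d = [ra - rb for ra, rb in zip(ranks(a), ranks(b))]
    return 1 - 6 * sum(di * di for di in d) / (n * (n * n - 1))


A = [1, 3, 5, 7, 9]
B = [3, 2, 4, 5, 1]
print(ranks(A), ranks(B))  # [1, 2, 3, 4, 5] [3, 2, 4, 5, 1]
print([x - y for x, y in zip(ranks(A), ranks(B))])  # [-2, 0, -1, -1, 4]
print(spearman_rho(A, B))  # about -0.1 (floating point may show extra digits)
```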
# IC example: one IC value per date for each holding period
ic = alphalens.performance.factor_information_coefficient(factor_data)
# print(ic)
alphalens.plotting.plot_ic_hist(ic)
# Mean IC aggregated by month
mean_monthly_ic = alphalens.performance.mean_information_coefficient(factor_data, by_time='M')
# print(mean_monthly_ic.mean())
alphalens.plotting.plot_monthly_ic_heatmap(mean_monthly_ic)
plt.show()
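To see what the per-date IC values represent, here is a hedged sketch that computes a rank IC for each date by hand: the Spearman correlation between the factor and the 1-period forward return, obtained as the Pearson correlation of their ranks. The table is made up; the idea matches what `factor_information_coefficient` reports per date and period, but this is not the library's code.

```python
import pandas as pd

# Made-up table shaped like factor_data: factor value and 1-period forward return
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-03-07", "2017-03-08"]), ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
toy = pd.DataFrame(
    {"factor": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
     1: [-0.01, 0.00, 0.02, 0.03, 0.01, 0.02, -0.01, 0.00]},
    index=idx,
)


def rank_ic(group):
    # Spearman correlation = Pearson correlation of the ranks
    return group["factor"].rank().corr(group[1].rank())


# One IC value per date, then the mean IC over all dates
ic_by_date = toy.groupby(level="date").apply(rank_ic)
print(ic_by_date)
print(ic_by_date.mean())
```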
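# Period-wise returns of a portfolio weighted by factor value
factor_returns = alphalens.performance.factor_returns(factor_data)
# Cumulative return using the 10-period column
alphalens.plotting.plot_cumulative_returns(factor_returns[10])
plt.show()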
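`factor_returns` treats the factor as portfolio weights. The sketch below shows one common way to do that, under stated assumptions: the weights are the factor demeaned within each date and scaled so their absolute values sum to 1 (a long-short, dollar-neutral portfolio), and the portfolio return is the weighted sum of forward returns. The data is made up, and the library's exact weighting options may differ.

```python
import pandas as pd

# Made-up table shaped like factor_data
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2017-03-07", "2017-03-08"]), ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
toy = pd.DataFrame(
    {"factor": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
     1: [-0.01, 0.00, 0.02, 0.03, 0.01, 0.02, -0.01, 0.00]},
    index=idx,
)

# Demean the factor within each date so long and short weights cancel out
demeaned = toy["factor"] - toy.groupby(level="date")["factor"].transform("mean")
# Scale so the absolute weights sum to 1 on each date
weights = demeaned / demeaned.abs().groupby(level="date").transform("sum")

# Portfolio return per date: sum of weight * forward return
port_ret = (weights * toy[1]).groupby(level="date").sum()
print(port_ret)
```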
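Alphalens data preparation
The source data consists of two DataFrames:
1. Factor data (industry data can be added for industry neutralization)
2. Price data
Factor data:
An industry column can be added after factor_value.
The first two columns of the factor data, date and asset, form a MultiIndex: the first level is date and the second level is asset.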
Price data: a DataFrame indexed by date, with one column per asset.
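Here is a minimal sketch of the two inputs in the layout described above, with made-up factor values and hypothetical industry labels (the close prices are copied from the earlier output). Splitting the factor table into a factor Series and an industry Series with the same MultiIndex is one way to hand the industry data to alphalens through its `groupby` argument, shown here only as a comment.

```python
import pandas as pd

dates = pd.to_datetime(["2017-11-17 15:00:00", "2017-11-20 15:00:00"])

# Factor data: MultiIndex (date, asset), a factor_value column and an
# industry column (factor values and industry labels are hypothetical)
idx = pd.MultiIndex.from_product(
    [dates, ["600000.XSHG", "600028.XSHG"]], names=["date", "asset"]
)
factor_df = pd.DataFrame(
    {"factor_value": [1.01, 0.98, 1.02, 0.99],
     "industry": ["bank", "energy", "bank", "energy"]},
    index=idx,
)

# Price data: one row per date, one column per asset
prices_df = pd.DataFrame(
    {"600000.XSHG": [119.81, 120.47], "600028.XSHG": [11.92, 11.92]},
    index=dates,
)

factor = factor_df["factor_value"]  # Series indexed by (date, asset)
industry = factor_df["industry"]    # same MultiIndex, one label per row
# These could then be passed as:
# alphalens.utils.get_clean_factor_and_forward_returns(
#     factor, prices_df, groupby=industry)
```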
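get_clean_factor_and_forward_returns()
alphalens.utils.get_clean_factor_and_forward_returns(factor,
                                                     prices,
                                                     groupby=None,
                                                     binning_by_group=False,
                                                     quantiles=5,
                                                     bins=None,
                                                     periods=(1, 5, 10),
                                                     filter_zscore=20,
                                                     groupby_labels=None,
                                                     max_loss=0.35,
                                                     zero_aware=False,
                                                     cumulative_returns=True)
Parameter details
- factor: the factor (and industry) data shown in Figure 1
- prices: the price data shown in Figure 2
- groupby: grouping of the stocks (for example by industry)
- binning_by_group: if True, quantiles or bins are computed separately within each group
- quantiles: split the stocks on each date into buckets containing equal numbers of stocks
- bins: split by equal-width ranges of factor value (use either quantiles or bins, not both)
- periods: holding periods for which forward returns are computed
- filter_zscore: outlier threshold as a multiple of the standard deviation; forward returns beyond it are discarded, which filters out stocks with extreme price moves
- groupby_labels: display names for the groups
- max_loss: maximum fraction of factor data that may be dropped during cleaning; an error is raised if it is exceeded
- zero_aware: if True, positive and negative factor values are bucketed separately
- cumulative_returns: whether forward returns are computed as cumulative returns over each period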
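The difference between `quantiles` and `bins` can be seen with the two pandas functions that embody those ideas: `pd.qcut` (equal numbers of stocks per bucket) and `pd.cut` (equal-width ranges of factor value). This is a sketch on made-up values for a single date. alphalens applies its bucketing per date, and its internal calls are not reproduced here.

```python
import pandas as pd

# Made-up factor values for six assets on a single date
values = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 5.0],
                   index=["A", "B", "C", "D", "E", "F"])

# quantiles: equal numbers of assets per bucket
by_count = pd.qcut(values, 3, labels=False) + 1
# bins: equal-width ranges of factor value
by_width = pd.cut(values, 3, labels=False) + 1

print(by_count.tolist())  # [1, 1, 2, 2, 3, 3]
print(by_width.tolist())  # [1, 1, 1, 1, 1, 3]
```

With one extreme value, equal-width bins put almost every stock in the bottom bucket, which is why equal-count quantiles are the default.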
Factor analysis functions
Full factor analysis report
Pass the cleaned data to create_full_tear_sheet to get the complete set of analysis plots:
alphalens.tears.create_full_tear_sheet(factor_data)
For reference, see the Alphalens documentation.