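Overview
It is well known that industry and market capitalization are two of the strongest influences on factor values. When running cross-sectional regressions to judge the return and significance of a single factor, these two influences need particular attention. The purpose of market-cap neutralization is to keep a factor-based stock-selection backtest from concentrating its picks in the same narrow set of stocks.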
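Effect of market cap
Most factors carry some market-cap exposure. So when we pick stocks using several indicators, every factor contributes its own market-cap component. The selected stocks end up fairly concentrated, which means the selection criteria are not doing a good job.
For example, the price-to-book ratio (PB) is highly correlated with market cap. If we use PB without neutralizing it for market cap, the selected stocks will be fairly concentrated.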
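To make "highly correlated" concrete, here is a minimal sketch of how one might measure the cross-sectional correlation between a factor and market cap. The data is synthetic, and the names `size` and `raw_factor` are made up for illustration; with real data you would pass in the two columns of your own cross-section instead.

```python
import numpy as np

# Synthetic cross-section (hypothetical data, not real quotes):
# the raw factor is built so that part of it is driven by size.
rng = np.random.default_rng(0)
size = rng.normal(size=1000)                        # stand-in for a market-cap factor
raw_factor = 0.8 * size + 0.6 * rng.normal(size=1000)

# Pearson correlation between the factor and size
corr = np.corrcoef(raw_factor, size)[0, 1]
print(f"correlation with size: {corr:.2f}")
```

A correlation this far from zero means that ranking stocks by the raw factor is, to a large extent, ranking them by size.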
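How to remove the market-cap effect
Regression method
We only need the most basic parts of linear regression. Readers who are not familiar with linear regression can take a look at this blog post of mine: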
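Introduction
Put simply, we define the factor whose market-cap effect we want to remove as y, and the market-cap factor as x.
We regress y on x. The deviation between y and the regression's prediction (the residual) is the part of the factor that market cap does not explain, and that residual is what we keep as the neutralized factor.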
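Written out, the idea looks roughly like this. The symbols α, β and ε are my own notation for the intercept, slope and residual of a plain one-variable least-squares fit; i indexes the stocks in the cross-section.

```latex
% Model: factor value of stock i explained by its market cap
y_i = \alpha + \beta x_i + \varepsilon_i

% Fitted (predicted) value from the estimated coefficients
\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i

% Neutralized factor: the residual
y_i^{\text{neutral}} = y_i - \hat{y}_i
```

Because the fit includes an intercept, the residuals have zero mean and zero sample correlation with x, which is exactly the property we are after.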
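Workflow
We use PB as the example and walk through how its link to market cap is removed. Steps:
- Fetch the data for both factors
- Winsorize (and standardize) the target factor, PB
- Fit a regression of PB on market cap
- Use the regression coefficients to compute the predicted factor values y_predict
- Take the difference between PB and y_predict as the new factor value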
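The steps above can be tried out without any data platform. The following is only a sketch under stated assumptions: the cross-section is synthetic, the helper names `winsorize_mad` and `neutralize` are hypothetical, and the regression uses `np.polyfit` rather than scikit-learn to keep the example dependency-light.

```python
import numpy as np

def winsorize_mad(values, n=3.0):
    """Clip values to median ± n * 1.4826 * MAD (median absolute deviation)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    bound = n * 1.4826 * mad
    return np.clip(values, median - bound, median + bound)

def neutralize(factor, market_cap):
    """Return the residual of a least-squares fit of factor on market_cap."""
    factor = np.asarray(factor, dtype=float)
    market_cap = np.asarray(market_cap, dtype=float)
    slope, intercept = np.polyfit(market_cap, factor, deg=1)
    return factor - (slope * market_cap + intercept)

# Step 1: "fetch" the two factors (synthetic, hypothetical data)
rng = np.random.default_rng(42)
market_cap = rng.lognormal(mean=4.0, sigma=1.0, size=500)
pb_ratio = 2.0 + 0.01 * market_cap + rng.normal(scale=1.0, size=500)

# Step 2: winsorize and standardize the target factor
pb_clean = winsorize_mad(pb_ratio)
pb_std = (pb_clean - pb_clean.mean()) / pb_clean.std()

# Steps 3-5: regress on market cap and keep the residual
pb_neutral = neutralize(pb_std, market_cap)

corr_before = np.corrcoef(pb_std, market_cap)[0, 1]
corr_after = np.corrcoef(pb_neutral, market_cap)[0, 1]
print(f"before: {corr_before:.3f}, after: {corr_after:.3f}")
```

Comparing the correlation with market cap before and after is a quick sanity check: after neutralization it should be zero up to floating-point error.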
Implementation
import numpy as np
from sklearn.linear_model import LinearRegression
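
def mad(factor):
    """Winsorize at 3 scaled MADs (median absolute deviations) from the median."""
    # Median of the factor values
    median = np.median(factor)
    # Median of the absolute deviations from the median (the MAD)
    mad_value = np.median(abs(factor - median))
    # Upper and lower bounds; 1.4826 scales the MAD to be comparable
    # to a standard deviation for normally distributed data
    high = median + (3 * 1.4826 * mad_value)
    low = median - (3 * 1.4826 * mad_value)
    # Replace values outside the bounds with the bounds
    factor = np.where(factor > high, high, factor)
    factor = np.where(factor < low, low, factor)
    return factor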

def stand(factor):
    """Standardize to zero mean and unit standard deviation."""
    mean = factor.mean()
    std = factor.std()
    return (factor - mean) / std

# Fetch the price-to-book ratio and market cap
q = query(
    fundamentals.eod_derivative_indicator.pb_ratio,   # price-to-book ratio
    fundamentals.eod_derivative_indicator.market_cap  # market capitalization
)

# Take the cross-section for a single date
fund = get_fundamentals(q, entry_date="2021-01-03").iloc[:,0,:]
print(fund["pb_ratio"])  # debug output: raw PB
print("--------------------------------------")

# Winsorize and standardize pb_ratio
fund["pb_ratio"] = mad(fund["pb_ratio"])
fund["pb_ratio"] = stand(fund["pb_ratio"])

# Regression inputs: x is market cap, y is the factor
x = fund["market_cap"].values.reshape(-1,1)
y = fund["pb_ratio"]

# Fit the regression
lr = LinearRegression()
lr.fit(x, y)  # fit
y_predict = lr.predict(x)  # predict

# Keep the residual as the market-cap-neutral factor
fund["pb_ratio"] = y - y_predict
print(fund["pb_ratio"])  # debug output: neutralized PB
Output:
688526.XSHG 7.4964
688286.XSHG 6.3164
688356.XSHG 7.9386
688004.XSHG 3.8863
688558.XSHG 2.9505
688418.XSHG 4.1119
688589.XSHG 5.1825
688569.XSHG 2.0132
688586.XSHG 6.5884
688077.XSHG 3.4511
688229.XSHG 5.4251
688065.XSHG 3.4576
688050.XSHG 12.2044
688106.XSHG 5.4171
688379.XSHG 2.4582
688580.XSHG 5.152
688060.XSHG 5.6061
688508.XSHG 8.5341
688600.XSHG 2.9436
688311.XSHG 8.758
688390.XSHG 15.0377
688155.XSHG 4.9158
688577.XSHG 3.1298
688556.XSHG 4.6791
688336.XSHG 3.2521
688585.XSHG 5.0047
688156.XSHG 2.7639
688157.XSHG 6.1503
688518.XSHG 3.2895
688208.XSHG 12.8005
...
605168.XSHG 8.7524
605166.XSHG 2.8279
605177.XSHG 4.2882
605169.XSHG 5.862
605178.XSHG 1.7349
605179.XSHG 9.3183
605183.XSHG 4.0715
605186.XSHG 11.2112
605198.XSHG 3.9775
605188.XSHG 4.542
605199.XSHG 11.092
605218.XSHG 2.7977
605222.XSHG 3.1673
605255.XSHG 2.3709
605258.XSHG 5.8935
605266.XSHG 7.7229
605288.XSHG 3.2855
605299.XSHG 5.6859
605318.XSHG 2.3269
605336.XSHG 2.4506
605333.XSHG 7.976
605338.XSHG 5.9966
605358.XSHG 26.9882
605366.XSHG 2.443
605369.XSHG 3.8247
605377.XSHG 3.2595
605376.XSHG 18.0658
605388.XSHG 5.0266
605399.XSHG 3.0122
605500.XSHG 3.6253
Name: pb_ratio, Length: 4140, dtype: object
--------------------------------------
688526.XSHG 1.70633
688286.XSHG 1.2302
688356.XSHG 1.90345
688004.XSHG 0.225928
688558.XSHG -0.163364
688418.XSHG 0.316995
688589.XSHG 0.762834
688569.XSHG -0.553206
688586.XSHG 1.33677
688077.XSHG 0.0448572
688229.XSHG 0.862351
688065.XSHG 0.0130358
688050.XSHG 2.05539
688106.XSHG 0.848339
688379.XSHG -0.366186
688580.XSHG 0.74662
688060.XSHG 0.937184
688508.XSHG 2.06342
688600.XSHG -0.164907
688311.XSHG 2.06013
688390.XSHG 2.05246
688155.XSHG 0.650357
688577.XSHG -0.0876943
688556.XSHG 0.552952
688336.XSHG -0.0506448
688585.XSHG 0.687238
688156.XSHG -0.239058
688157.XSHG 1.16128
688518.XSHG -0.0237863
688208.XSHG 2.0419
...
605168.XSHG 2.06131
605166.XSHG -0.213845
605177.XSHG 0.391245
605169.XSHG 1.04437
605178.XSHG -0.66783
605179.XSHG 2.06724
605183.XSHG 0.298125
605186.XSHG 2.07071
605198.XSHG 0.257471
605188.XSHG 0.495605
605199.XSHG 2.0645
605218.XSHG -0.226267
605222.XSHG -0.0788161
605255.XSHG -0.402446
605258.XSHG 1.05833
605266.XSHG 1.81468
605288.XSHG -0.027241
605299.XSHG 0.970047
605318.XSHG -0.420426
605336.XSHG -0.371327
605333.XSHG 1.91961
605338.XSHG 1.09455
605358.XSHG 2.02334
605366.XSHG -0.374542
605369.XSHG 0.198264
605377.XSHG -0.035712
605376.XSHG 2.06168
605388.XSHG 0.691721
605399.XSHG -0.13748
605500.XSHG 0.116246
Name: pb_ratio, Length: 4140, dtype: object