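Industry Multi-Factor Model

Using the Shenwan (申万) level-1 industry classification as the baseline, this workflow picks high-quality stocks with indicators suited to each industry and builds a stock pool from them. The data are quarterly figures from Uqer (优矿). Note: use the Google Chrome browser to access the platform.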

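01-Matching broad industry categories
Notes: (1) drop_duplicates removes duplicated rows; keep="first" keeps the first row of each group of duplicates.
(2) The balance sheet is matched to the broad industry categories through the shared ticker field. Convert the stock codes in both tables to strings first, then combine the tables with merge: how="left" requests a left join, and on names the column to join on.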
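
The two notes can be tried on toy data before running the full script. The following is a minimal sketch: the tickers, the category labels, and the total_assets column are invented for illustration and do not come from the Uqer files.

```python
import pandas as pd

# Toy industry table; ticker 600000 appears twice (invented data).
industry = pd.DataFrame({
    "ticker": [600000, 600000, 600519],
    "大类": ["金融", "金融(重复)", "必须消费"],
})
# Toy balance sheet whose tickers were already read as strings.
balance = pd.DataFrame({
    "ticker": ["600000", "600519", "600036"],
    "total_assets": [100.0, 200.0, 300.0],
})

# keep="first" keeps the first row of each group of duplicated tickers.
industry = industry.drop_duplicates(subset=["ticker"], keep="first")

# The key columns must share a dtype before merging.
industry["ticker"] = industry["ticker"].astype(str)
balance["ticker"] = balance["ticker"].astype(str)

# Left join: every balance-sheet row survives; unmatched tickers get NaN.
merged = pd.merge(balance, industry, how="left", on="ticker")
print(merged)
```

If a CSV stores tickers as integers, codes that begin with zeros lose them on load. Assuming six-digit codes, `astype(str).str.zfill(6)` is one way to restore them before the merge.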

import pandas as pd

# Load the Shenwan industry classification and keep one row per ticker.
hangye = pd.read_csv("./行业分类_申万.csv", encoding="gbk", low_memory=False)
hangye = hangye.drop_duplicates(subset=["ticker"], keep="first")
# Map the level-1 industry (column index 2) to a broad category.
hangye["大类"] = ""
is_resource = hangye.iloc[:, 2].isin(["采掘", "有色金属"])
hangye.loc[is_resource, "大类"] = "资源型行业"
# Other categories omitted.
dalei = hangye.loc[:, ["ticker", "大类"]].copy()
dalei.to_csv("./行业大类.csv", encoding="gbk", index=False)
# Load the combined balance sheet and attach the broad category to each row.
df = pd.read_csv("./总资产负债表.csv", encoding="gbk", low_memory=False)
# The merge keys must share a dtype, so cast both ticker columns to str.
df["ticker"] = df["ticker"].astype(str)
dalei["ticker"] = dalei["ticker"].astype(str)
df_new = pd.merge(df, dalei, how="left", on="ticker")
df_new.to_csv("./aa-data.csv", encoding="gbk", index=False)
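
When many industries have to be assigned, a dictionary lookup can stand in for a chain of comparisons. This is only a sketch under stated assumptions: the column name 一级行业, the helper add_group, the toy tickers, and the 食品饮料 entry are hypothetical; only the first two dictionary entries appear in the code above.

```python
import pandas as pd

# Only the first two entries come from the article; the third is assumed.
INDUSTRY_TO_GROUP = {
    "采掘": "资源型行业",
    "有色金属": "资源型行业",
    "食品饮料": "必须消费",  # assumed for illustration
}

def add_group(frame, industry_col, mapping):
    """Return a copy of frame with a 大类 column derived from industry_col."""
    out = frame.copy()
    # Industries missing from the mapping fall back to "", the script's default.
    out["大类"] = out[industry_col].map(mapping).fillna("")
    return out

# Toy input with a hypothetical industry column name.
sample = pd.DataFrame({
    "ticker": ["000001", "000002", "000003"],
    "一级行业": ["采掘", "食品饮料", "银行"],
})
grouped = add_group(sample, "一级行业", INDUSTRY_TO_GROUP)
print(grouped)
```

Keeping the mapping in one dictionary makes the omitted categories easy to add without touching the rest of the script.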

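02-Preparing the statements
Taking the consumer staples category (必须消费) as an example, merge the balance sheet with the income statement.
Notes: use loc to extract the rows of a DataFrame that satisfy a condition: bixu = df.loc[df["大类"] == "必须消费"].
isin tests whether each value appears in a list: lirun_bixu = lirun.loc[lirun["证券简称"].isin(name)]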
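
The two filters can be checked on a small table first. A minimal sketch with invented company names and an invented net_profit column:

```python
import pandas as pd

# Toy balance sheet and income statement (invented rows).
balance = pd.DataFrame({
    "证券简称": ["A", "B", "C"],
    "大类": ["必须消费", "资源型行业", "必须消费"],
})
income = pd.DataFrame({
    "证券简称": ["A", "B", "C", "D"],
    "net_profit": [1.0, 2.0, 3.0, 4.0],
})

# Boolean condition inside loc keeps only the matching rows.
staples = balance.loc[balance["大类"] == "必须消费"]

# isin keeps the income rows whose name occurs in the selected list.
names = staples["证券简称"].unique().tolist()
income_staples = income.loc[income["证券简称"].isin(names)]
print(income_staples)
```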

import pandas as pd

df = pd.read_csv("./aa-data.csv", encoding="gbk", low_memory=False)
lirun = pd.read_csv("./lirun_jidu.csv", encoding="gbk", low_memory=False)
# Balance-sheet rows of the consumer staples category.
bixu = df.loc[df["大类"] == "必须消费"]
# Income-statement rows of the same companies, matched by security short name.
jiancheng = bixu["证券简称"]
name = list(set(jiancheng))
lirun_bixu = lirun.loc[lirun["证券简称"].isin(name)]
# Quarterly prices of the same companies.
hangqing = pd.read_csv("./股票行情.csv", encoding="gbk", low_memory=False)
hangqing = hangqing.loc[:, ["ticker", "证券简称", "截止日期", "当季收盘价"]]
hangqing_bixu = hangqing.loc[hangqing["证券简称"].isin(name)]
hangqing_bixu.to_csv("./必须消费_股票行情.csv", encoding="gbk", index=False)
# Drop duplicated balance-sheet rows.
bixu = bixu.drop_duplicates(subset=["截止日期", "ticker", "报告类型"], keep="first")
# Merge the income statement with the balance sheet.
# zong_bixu = pd.merge(lirun_bixu, bixu, how="left", on=["ticker", "截止日期"])
# zong_bixu.to_csv("./必须消费.csv", encoding="gbk", index=False)
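
The merge on ["ticker", "截止日期"] only lines up one row per company and quarter if those keys are unique, which is why the duplicates are dropped first. The sketch below uses invented figures and column names (revenue, inventory) and adds merge's validate argument as a safety check; that check is an addition for illustration, not part of the original script.

```python
import pandas as pd

# Toy income statement: one row per ticker and quarter (invented data).
income = pd.DataFrame({
    "ticker": ["000001", "000001", "000002"],
    "截止日期": ["2016/3/31", "2016/6/30", "2016/3/31"],
    "revenue": [10.0, 12.0, 20.0],
})
# Toy balance sheet containing one exact duplicate row.
balance = pd.DataFrame({
    "ticker": ["000001", "000001", "000001", "000002"],
    "截止日期": ["2016/3/31", "2016/3/31", "2016/6/30", "2016/3/31"],
    "报告类型": ["Q1", "Q1", "S1", "Q1"],
    "inventory": [5.0, 5.0, 6.0, 8.0],
})

# Remove repeated rows, as the script above does.
balance = balance.drop_duplicates(
    subset=["截止日期", "ticker", "报告类型"], keep="first"
)

# validate="one_to_one" raises pandas.errors.MergeError if a key repeats
# on either side, instead of silently multiplying rows.
combined = pd.merge(
    income, balance, how="left",
    on=["ticker", "截止日期"], validate="one_to_one",
)
print(combined)
```

Because the deduplication key also includes 报告类型, two reports of different types for the same ticker and date would survive it; the validate check would then flag them at merge time.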

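03-Selecting stocks by indicator
Note: rows whose security short name contains ST are removed. Put the ST names and all names into two lists, then take the difference of the two to drop the ST rows. Reference: https://blog.csdn.net/weixin_43849761/article/details/104808399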
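
A boolean mask is a shorter alternative to the list-difference approach. The sketch below uses invented names; na=False treats missing names as not containing ST, which is an assumption about how such rows should be handled.

```python
import pandas as pd

# Toy names (invented); the last row has a missing name.
stocks = pd.DataFrame({"证券简称": ["甲公司", "*ST乙", "ST丙", None]})

# True where the name contains "ST"; missing names count as False.
is_st = stocks["证券简称"].str.contains("ST", na=False)

# Invert the mask to keep everything that is not an ST stock.
non_st = stocks.loc[~is_st]
print(non_st)
```

The symmetric difference `set(a) ^ set(b)` gives the same result as a plain difference only because the ST names are a subset of all names; `set(b) - set(a)` states the intent directly.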

import pandas as pd

df = pd.read_csv("./必须消费数据.csv", encoding="gbk", low_memory=False)
# Names containing "ST", then all names; the set difference leaves the non-ST names.
st_names = df.loc[df["证券简称"].str.contains("ST", na=False), "证券简称"]
keep_names = set(df["证券简称"]) - set(st_names)
df = df[df["证券简称"].isin(keep_names)]
hangqing = pd.read_csv("./必须消费_股票行情.csv", encoding="gbk", low_memory=False)
# Quarterly return per ticker: (close - previous close) / previous close.
# Grouping by ticker keeps one stock's price from leaking into another's return.
hangqing["截止日期"] = pd.to_datetime(hangqing["截止日期"])
hangqing = hangqing.sort_values(by=["ticker", "截止日期"])
prev_close = hangqing.groupby("ticker")["当季收盘价"].shift(1)
hangqing["收益率"] = (hangqing["当季收盘价"] - prev_close) / prev_close
# Quarter-end dates, written as they appear in the 截止日期 column of the CSV.
quarters = ["2016/3/31", "2016/6/30", "2016/9/30", "2016/12/31",
            "2017/3/31", "2017/6/30", "2017/9/30", "2017/12/31",
            "2018/3/31", "2018/6/30", "2018/9/30", "2018/12/31",
            "2019/3/31", "2019/6/30", "2019/9/30", "2019/12/31"]
num = []
for i, quarter in enumerate(quarters):
    selected = df.loc[df["截止日期"] == quarter]
    # Keep the top half by each indicator in turn.
    for factor in ["毛利稳健加速度", "收入存货比变动", "市净率"]:
        selected = selected.sort_values(by=factor, ascending=False)
        selected = selected.head(int(selected.shape[0] * 0.5))
    # Column index 2 is expected to hold the security short name.
    names = selected.iloc[:, 2].tolist()
    if i == 0:
        # The first quarter is recorded as 0.
        num.append(0)
    else:
        # Mean return of the selected stocks over the quarter ending on this date.
        in_quarter = hangqing["证券简称"].isin(names) & (hangqing["截止日期"] == pd.Timestamp(quarter))
        num.append(hangqing.loc[in_quarter, "收益率"].mean())
print(num)
num = pd.DataFrame(num)
num.to_csv("./05.csv", encoding="gbk", index=False)
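
The two building blocks of this step, the successive top-half screen and the per-ticker quarterly return, can be exercised on toy data. This is a sketch under stated assumptions: the helper names screen_top_half and quarterly_returns, the factor columns f1 to f3, and all figures are invented for illustration.

```python
import pandas as pd

def screen_top_half(frame, factors):
    """Keep the top half of the rows by each factor in turn (descending)."""
    out = frame
    for factor in factors:
        out = out.sort_values(by=factor, ascending=False)
        out = out.head(int(len(out) * 0.5))
    return out

def quarterly_returns(prices):
    """Return a copy of prices with a per-ticker quarter-over-quarter return."""
    out = prices.copy()
    out["截止日期"] = pd.to_datetime(out["截止日期"])
    out = out.sort_values(by=["ticker", "截止日期"])
    # shift(1) within each ticker yields that ticker's previous close.
    prev_close = out.groupby("ticker")["当季收盘价"].shift(1)
    out["收益率"] = (out["当季收盘价"] - prev_close) / prev_close
    return out

# Eight toy stocks with three invented factors.
factors_df = pd.DataFrame({
    "ticker": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "f1": [8, 7, 6, 5, 4, 3, 2, 1],
    "f2": [1, 2, 3, 4, 5, 6, 7, 8],
    "f3": [1, 1, 2, 3, 0, 0, 0, 0],
})
picked = screen_top_half(factors_df, ["f1", "f2", "f3"])

# Two toy tickers with two quarter-end closing prices each.
prices = pd.DataFrame({
    "ticker": ["A", "A", "B", "B"],
    "截止日期": ["2016/3/31", "2016/6/30", "2016/3/31", "2016/6/30"],
    "当季收盘价": [10.0, 11.0, 20.0, 18.0],
})
rets = quarterly_returns(prices)
q2 = rets.loc[rets["截止日期"] == pd.Timestamp("2016-06-30"), "收益率"]
print(picked["ticker"].tolist(), q2.mean())
```

Each pass halves the pool, so three factors leave roughly one eighth of the starting stocks. The first quarter of every ticker has no previous close, so its return is NaN and is skipped by mean.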