Problem description:
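When I previously used an API to download Shenwan (SW) Level-1 industry index data, I found that several days were missing. Over the period from January 1, 2015 to December 31, 2020, a total of 7 days were missing.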
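One way to find exactly which days are missing is to compare the downloaded dates against a full trading calendar. A minimal sketch, assuming you already have that calendar as a list of date strings (the function name `find_missing_dates` and the toy dates are hypothetical):

```python
import pandas as pd

def find_missing_dates(calendar, data_dates):
    """Return trading days that are in the calendar but absent from the downloaded data."""
    calendar = pd.DatetimeIndex(pd.to_datetime(calendar))
    data_dates = pd.DatetimeIndex(pd.to_datetime(data_dates))
    # Set difference; the result is sorted
    return calendar.difference(data_dates)

# Hypothetical toy example: a 5-day calendar with 2 days missing from the data
calendar = ['2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09']
downloaded = ['2015-01-05', '2015-01-07', '2015-01-09']
missing = find_missing_dates(calendar, downloaded)
print(list(missing.strftime('%Y-%m-%d')))  # ['2015-01-06', '2015-01-08']
```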
So I took the opportunity to clean the data. The missing values in between can only be filled in by interpolation.
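The idea in isolation: reindex the data onto the full trading calendar so the missing days show up as NaN rows, then interpolate. A minimal sketch on hypothetical toy closing prices:

```python
import pandas as pd

# Hypothetical toy data: closing prices with two trading days missing
calendar = pd.to_datetime(['2015-01-05', '2015-01-06', '2015-01-07', '2015-01-08', '2015-01-09'])
raw = pd.DataFrame(
    {'close': [10.0, 12.0, 11.0]},
    index=pd.to_datetime(['2015-01-05', '2015-01-07', '2015-01-09']),
)

# Reindex onto the full calendar: missing days become NaN rows
full = raw.reindex(calendar)
# Linear interpolation treats consecutive rows as equally spaced
full['close'] = full['close'].interpolate(method='linear')
print(full['close'].tolist())  # [10.0, 11.0, 12.0, 11.5, 11.0]
```

`method='linear'` ignores the actual dates and treats adjacent trading days as equally spaced, which seems reasonable for daily bars; `method='time'` would instead weight by calendar distance (so weekends would count).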
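import pandas as pd

# Assumed to be defined earlier (not shown in this post):
#   data_dic        - dict mapping each Shenwan index name to its raw DataFrame
#   standard_df     - full trading calendar, indexed by 'datetime', with a placeholder column 'A'
#   trading_date_se - the trading dates as strings, aligned with standard_df's index
total_dic = {}
num_cols = ['open', 'high', 'low', 'close', 'vol', 'amount', 'change_pct']
for stock_name in data_dic.keys():
    df = data_dic[stock_name].sort_values(by='date')
    # Use the parsed date as a 'datetime' index so it can be joined with the calendar
    tmp_date_se = pd.to_datetime(df['date'])
    tmp_date_se.name = 'datetime'
    df.index = tmp_date_se
    # Outer join against the full calendar: missing trading days appear as NaN rows
    merge_df = pd.merge(left=standard_df, right=df, on='datetime', how='outer')
    # Fill the 'date' strings for the newly added rows from the calendar
    merge_df['date'] = trading_date_se
    # Convert the price/volume columns to float, then linearly interpolate the gaps
    merge_df[num_cols] = merge_df[num_cols].astype(float).interpolate()
    # Fill the constant identifier columns from the first non-missing value
    # (iloc[0] alone would give NaN if the very first day is one of the missing ones)
    merge_df['index_name'] = merge_df['index_name'].dropna().iloc[0]
    merge_df['index_code'] = merge_df['index_code'].dropna().iloc[0]
    # Drop the placeholder column that came from standard_df
    merge_df = merge_df.drop(columns=['A'])
    merge_df.columns = ['IndexStockName', 'IndexChineseName', 'Date', 'Open', 'High', 'Low', 'Close',
                        'Volume', 'Amount', 'ChangePct']
    total_dic[stock_name] = merge_df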
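Since the filled values are estimates rather than real quotes, it may be worth recording which rows were interpolated. A minimal sketch using the `indicator` parameter of `pd.merge` on hypothetical toy data (the column name `is_interpolated` is made up, and this uses a left join against the calendar rather than the outer join above):

```python
import pandas as pd

# Hypothetical toy inputs: a full calendar and data missing one day
calendar = pd.DataFrame({'datetime': pd.to_datetime(['2015-01-05', '2015-01-06', '2015-01-07'])})
data = pd.DataFrame({
    'datetime': pd.to_datetime(['2015-01-05', '2015-01-07']),
    'close': [10.0, 12.0],
})

# indicator=True adds a '_merge' column recording where each row came from
merged = pd.merge(calendar, data, on='datetime', how='left', indicator=True)
# Rows present only in the calendar are the ones we are about to fill
merged['is_interpolated'] = merged['_merge'] == 'left_only'
merged['close'] = merged['close'].interpolate()
merged = merged.drop(columns=['_merge'])
print(merged['is_interpolated'].tolist())  # [False, True, False]
print(merged['close'].tolist())  # [10.0, 11.0, 12.0]
```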
Honestly, even Shenwan's own official website is missing this data, which came as a real surprise. Free data really is full of pitfalls.