问题描述
作为一名电商行业数据分析师,最近在工作中遇到了一个需求,需要把客户提供的月销售目标除以当月天数,分解成日销售目标,之后再与日期表连接,在BI工具中实现销售额的完成率可视化。
原数据格式:
2022年每个月的销售额目标-json格式
目标数据格式:
一月销售额目标3000,除以一月31天,分解成每日销售目标96.8
数据处理逻辑:
将json格式的数据转化成DataFrame
新建2022年日期表
join两张表
Python代码
import pandas as pd
df = pd.read_csv('df.csv')
print(df)
df = df['kpis'].str.split(',',expand=True) #按逗号拆分
df = df.T #转置
df = df[0].str.split(':', expand=True) #再按冒号拆分一次
df
# 遍历做一遍数据处理,去掉括号和双引号
month = []
for i in df[0].values:
i=i.replace('{','')
i=i.replace('"','')
i=i.strip()
month.append(i)
print(month)
kpi = []
for i in df[1].values:
i = i.replace('"','')
i = i.replace('}', '')
kpi.append(i)
print(kpi)
df[0] = month
df[1] = kpi
df['year']=2022
df = df.rename(columns={0:'month',1:'kpi'})
df
# 新加一列月份的数字格式
month_dic = {'january':1, 'february':2,'march':3,'april':4,'may':5,'june':6,'july':7,'august':8,'september':9,'october':10,'november':11,'december':12}
df['month_num'] = df['month'].values
df['month_num'] = df['month_num'].map(lambda x: month_dic[x])
df
df = df[['year','month_num','kpi']]
df[['kpi']] = df[['kpi']].astype('int')
df['key'] = df[['year','month_num']].astype('str').apply("-".join,axis=1) #作为后面与日期表连接的key
df
# 新建2022年日期表
time = pd.DataFrame()
time['date'] = pd.date_range('2022-01-01','2022-12-31') #date列,范围从2022-1-1到2022-12-31
time['year'] = time.date.apply(lambda x: x.strftime('%Y')) #年
time['month'] = time.date.apply(lambda x: x.strftime('%m')) #月
time['day'] = time.date.apply(lambda x: x.strftime('%d')) #日
time[['year','month','day']] = time[['year','month','day']].astype('int')
time['key'] = time[['year','month']].astype('str').apply("-".join,axis=1)
month_day = {1:31,2:28,3:31,4:30,5:31,6:30,7:31,8:31,9:30,10:31,11:30,12:31} #每个月的天数
time['month_days'] = time['month'].map(lambda x: month_day[x])
time
df_merge = pd.merge(time,df,how='left', on='key')
df_merge = df_merge[['date', 'year_x','month','day','key','month_days','kpi']]
df_merge['kpi'] = round(df_merge['kpi']/df_merge['month_days'],1)
df_merge
daily_kpi = df_merge[['date','kpi']]
daily_kpi
至此,任务完成,每个月的销售目标都除以了当月的实际天数。