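As the title says, the goal is to take a DataFrame with three columns and turn two of them into the row index and the column index, respectively, with the third column supplying the value at each (row index, column index) position.
Call the DataFrame df1. Its three columns are date, id and cnt: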
date | id | cnt |
2020-03-03 00:00:00 | 000001 | 7 |
2020-03-03 01:00:00 | 000002 | 4 |
2020-03-03 02:00:00 | 000003 | 9 |
2020-03-03 03:00:00 | 000004 | 13 |
2020-03-03 00:00:00 | 000002 | 8 |
2020-03-03 01:00:00 | 000003 | 5 |
2020-03-03 02:00:00 | 000004 | 10 |
2020-03-03 03:00:00 | 000001 | 14 |
2020-03-03 00:00:00 | 000003 | 9 |
2020-03-03 01:00:00 | 000004 | 6 |
2020-03-03 02:00:00 | 000001 | 11 |
2020-03-03 03:00:00 | 000002 | 15 |
2020-03-03 00:00:00 | 000004 | 10 |
2020-03-03 01:00:00 | 000001 | 7 |
2020-03-03 02:00:00 | 000002 | 12 |
2020-03-03 03:00:00 | 000003 | 16 |
The target is:
date | 000001 | 000002 | 000003 | 000004 |
2020-03-03 00:00:00 | 7 | 8 | 9 | 10 |
2020-03-03 01:00:00 | 7 | 4 | 5 | 6 |
2020-03-03 02:00:00 | 11 | 12 | 9 | 10 |
2020-03-03 03:00:00 | 14 | 15 | 16 | 13 |
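pandas already provides this long-to-wide reshape as `DataFrame.pivot`. The `index` argument supplies the row labels, `columns` supplies the column labels, and `values` fills the cells. The sketch below rebuilds df1 from the table above. It assumes that `date` is a datetime column and that `id` is stored as strings so the leading zeros survive. Both of these are assumptions, because the post does not show the dtypes.

```python
import pandas as pd

# Sample df1 rebuilt from the table above (assumption: id stored as strings)
hours = ["00", "01", "02", "03"] * 4
ids = ["000001", "000002", "000003", "000004",
       "000002", "000003", "000004", "000001",
       "000003", "000004", "000001", "000002",
       "000004", "000001", "000002", "000003"]
cnt = [7, 4, 9, 13, 8, 5, 10, 14, 9, 6, 11, 15, 10, 7, 12, 16]
df1 = pd.DataFrame({
    "date": pd.to_datetime([f"2020-03-03 {h}:00:00" for h in hours]),
    "id": ids,
    "cnt": cnt,
})

# date -> row index, id -> column index, cnt -> cell value
wide = df1.pivot(index="date", columns="id", values="cnt")

# Optional: drop the "id" name on the column axis and move date back
# into an ordinary column, matching the layout of the target table
flat = wide.rename_axis(None, axis=1).reset_index()
print(flat)
```

`pivot` sorts both the rows and the columns. Because every (date, id) pair occurs exactly once in this data, no cell is left empty and the values match the target table.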
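import pandas as pd

# Unique dates and ids in order of first appearance.
# Series.tolist() returns Timestamps for a datetime column, so the
# equality test inside the loop compares values of the same type.
dates = df1['date'].drop_duplicates().tolist()
ids = df1['id'].drop_duplicates().tolist()

# Result frame: a 'date' column followed by one column per id
result = pd.DataFrame(columns=['date'] + ids)
result['date'] = dates

for i in ids:
    tmp = df1.loc[df1['id'] == i]
    _list = []
    for j in dates:
        # cnt for this (date, id) pair; assumes at most one matching row
        a = tmp.loc[tmp['date'] == j, 'cnt'].to_list()
        if len(a) == 0:
            a.append('NA')  # no record for this pair
        _list.extend(a)
    result[i] = _list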
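The loop above assumes that each (date, id) pair appears at most once. If a pair is duplicated, `a` holds two values, `_list` becomes longer than the date column, and `result[i] = _list` fails with a length mismatch. `DataFrame.pivot` also rejects duplicates by raising `ValueError`. `DataFrame.pivot_table` aggregates them instead. The sketch below uses a small hypothetical input with one duplicated pair and one missing pair. Summing duplicates and filling gaps with 0 are illustrative choices, not something the post specifies. The loop writes the string 'NA' for gaps, and omitting `fill_value` leaves them as NaN.

```python
import pandas as pd

# Hypothetical input: (00:00, "000001") appears twice,
# and (01:00, "000002") has no record at all
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-03 00:00:00", "2020-03-03 00:00:00",
                            "2020-03-03 00:00:00", "2020-03-03 01:00:00"]),
    "id": ["000001", "000001", "000002", "000001"],
    "cnt": [7, 3, 8, 7],
})

# pivot() refuses duplicate (index, column) pairs
try:
    df.pivot(index="date", columns="id", values="cnt")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates duplicates (here: sum) and fills gaps with fill_value
wide = df.pivot_table(index="date", columns="id", values="cnt",
                      aggfunc="sum", fill_value=0)
print(wide)
```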