The unstack function converts a level of the row index into a level of the column index, as in the following simple example:
In [41]: df = pd.DataFrame(np.ones((4,2)),
   ....:                   index = pd.Index([('A', 'cat', 'big'),
   ....:                                     ('A', 'dog', 'small'),
   ....:                                     ('B', 'cat', 'big'),
   ....:                                     ('B', 'dog', 'small')]),
   ....:                   columns=['col_1', 'col_2'])
   ....:

In [42]: df
Out[42]:
             col_1  col_2
A cat big      1.0    1.0
  dog small    1.0    1.0
B cat big      1.0    1.0
  dog small    1.0    1.0

In [43]: df.unstack()
Out[43]:
      col_1       col_2
        big small   big small
A cat   1.0   NaN   1.0   NaN
  dog   NaN   1.0   NaN   1.0
B cat   1.0   NaN   1.0   NaN
  dog   NaN   1.0   NaN   1.0
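The main argument of unstack is the number of the level to move. By default the innermost row-index level is converted, and it becomes the innermost level of the column index. Several levels can also be converted at once: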
In [44]: df.unstack(2)
Out[44]:
      col_1       col_2
        big small   big small
A cat   1.0   NaN   1.0   NaN
  dog   NaN   1.0   NaN   1.0
B cat   1.0   NaN   1.0   NaN
  dog   NaN   1.0   NaN   1.0

In [45]: df.unstack([0,2])
Out[45]:
    col_1                  col_2
        A          B           A          B
      big small  big small   big small  big small
cat   1.0   NaN  1.0   NaN   1.0   NaN  1.0   NaN
dog   NaN   1.0  NaN   1.0   NaN   1.0  NaN   1.0
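
When the row index carries level names, a level can be selected by name instead of by number. The sketch below is only an illustration: the level names group, animal and size are hypothetical and do not appear in the example above. It checks that picking the innermost level by position and by name gives the same table.

```python
import numpy as np
import pandas as pd

# Same data as the example above, but with hypothetical level names
# ('group', 'animal', 'size') attached to the row index.
idx = pd.MultiIndex.from_tuples(
    [('A', 'cat', 'big'),
     ('A', 'dog', 'small'),
     ('B', 'cat', 'big'),
     ('B', 'dog', 'small')],
    names=['group', 'animal', 'size'])
df_named = pd.DataFrame(np.ones((4, 2)), index=idx,
                        columns=['col_1', 'col_2'])

by_number = df_named.unstack(2)       # level given by position
by_name = df_named.unstack('size')    # level given by its (assumed) name

# DataFrame.equals treats NaN values in matching positions as equal.
same = by_number.equals(by_name)
print(same)
print(list(by_name.columns.names))
```

Names tend to be easier to read than level numbers once an index has more than two or three levels, and the name of the moved level is carried over to the column index.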
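As with the uniqueness requirement in pivot, unstack requires that the combinations formed by the row-index levels being moved to the columns and the row-index levels being kept are unique. For example, making the first two row-index entries identical breaks this uniqueness, and an error is raised: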
In [46]: my_index = df.index.to_list()

In [47]: my_index[1] = my_index[0]

In [48]: df.index = pd.Index(my_index)

In [49]: df
Out[49]:
             col_1  col_2
A cat big      1.0    1.0
      big      1.0    1.0
B cat big      1.0    1.0
  dog small    1.0    1.0

In [50]: try:
   ....:     df.unstack()
   ....: except Exception as e:
   ....:     Err_Msg = e
   ....:

In [51]: Err_Msg
Out[51]: ValueError('Index contains duplicate entries, cannot reshape')
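
Because the error only shows up when the reshape is attempted, it can help to inspect the index beforehand. The following is a minimal sketch, assuming a frame df_dup (a hypothetical name) built like the one above. Keeping the first of the duplicated rows is just one illustrative way to restore uniqueness; whether that is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Rebuild a frame whose first two row-index entries are identical.
dup_index = pd.MultiIndex.from_tuples(
    [('A', 'cat', 'big'),
     ('A', 'cat', 'big'),
     ('B', 'cat', 'big'),
     ('B', 'dog', 'small')])
df_dup = pd.DataFrame(np.ones((4, 2)), index=dup_index,
                      columns=['col_1', 'col_2'])

# The moved and the kept levels together make up the whole row index,
# so the requirement amounts to the row index having no repeated entries.
can_unstack = df_dup.index.is_unique

# Show every row that takes part in a duplicate.
dup_rows = df_dup[df_dup.index.duplicated(keep=False)]

# Assumption for illustration: keep the first occurrence of each entry.
df_unique = df_dup[~df_dup.index.duplicated(keep='first')]
result = df_unique.unstack()

print(can_unstack)
print(dup_rows)
print(result)
```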
In contrast to unstack, stack pushes levels of the column index into the row index, and it is used in exactly the same way.
In [52]: df = pd.DataFrame(np.ones((4,2)),
   ....:                   index = pd.Index([('A', 'cat', 'big'),
   ....:                                     ('A', 'dog', 'small'),
   ....:                                     ('B', 'cat', 'big'),
   ....:                                     ('B', 'dog', 'small')]),
   ....:                   columns=['index_1', 'index_2']).T
   ....:

In [53]: df
Out[53]:
           A          B
         cat   dog  cat   dog
         big small  big small
index_1  1.0   1.0  1.0   1.0
index_2  1.0   1.0  1.0   1.0

In [54]: df.stack()
Out[54]:
                 A         B
               cat  dog  cat  dog
index_1 big    1.0  NaN  1.0  NaN
        small  NaN  1.0  NaN  1.0
index_2 big    1.0  NaN  1.0  NaN
        small  NaN  1.0  NaN  1.0

In [55]: df.stack([1, 2])
Out[55]:
                     A    B
index_1 cat big    1.0  1.0
        dog small  1.0  1.0
index_2 cat big    1.0  1.0
        dog small  1.0  1.0
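
Since the two operations mirror each other, applying unstack and then stack should recover the original frame as long as no missing values are introduced along the way. The sketch below uses a hypothetical frame df_full whose two-level row index covers every combination, so the intermediate wide table contains no NaN. With incomplete combinations, as in the examples above, the way stack treats all-NaN rows may differ between pandas versions, so the round trip is not guaranteed there.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the row index contains every (letter, mark) pair,
# so unstacking does not create any missing values.
full_index = pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']])
df_full = pd.DataFrame(np.arange(8).reshape(4, 2), index=full_index,
                       columns=['col_1', 'col_2'])

wide = df_full.unstack()   # innermost row level moves to the columns
restored = wide.stack()    # innermost column level moves back to the rows

print(wide)
print(restored)
```

Comparing restored with df_full row by row is a quick way to confirm that a reshape has not lost or reordered any data.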