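Reshaping and pivoting: basic operations for rearranging tabular data.
For a DataFrame, the two main operations are:
(1) stack: "rotates" (pivots) the columns of the data into rows
(2) unstack: pivots the rows of the data into columns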
Example 1 (both the row and the column index labels are strings):
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
data = DataFrame(np.arange(6).reshape((2, 3)),
                 index=pd.Index(['O', 'C'], name='state'),
                 columns=pd.Index(['one', 'two', 'three'], name='number'))
data
Out[3]:
number  one  two  three
state
O         0    1      2
C         3    4      5
result = data.stack()  #calling stack on the data pivots its columns into rows, producing a Series
result
Out[5]:
state  number
O      one       0
       two       1
       three     2
C      one       3
       two       4
       three     5
dtype: int32
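To make the ordering concrete: stack walks the frame row by row, so the stacked Series holds the row-major values of the frame, indexed by every (state, number) pair. The sketch below rebuilds that Series by hand with pd.MultiIndex.from_product and checks it against stack(); the names stacked and manual are just illustrative, and this shows what the result contains, not how pandas implements stack internally.

```python
import numpy as np
import pandas as pd

# Same frame as in Example 1
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

stacked = data.stack()

# Rebuild the same Series manually: values in row-major order,
# one index entry per (row label, column label) pair
manual = pd.Series(
    data.to_numpy().ravel(),
    index=pd.MultiIndex.from_product([data.index, data.columns],
                                     names=['state', 'number']),
)

# Raises AssertionError if values, index or dtype differ
pd.testing.assert_series_equal(stacked, manual)
print(len(stacked), stacked.index.nlevels)  # 6 2
```

An r x c frame therefore stacks into a Series of length r * c with a two-level index.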
result.unstack()  #for a Series with a hierarchical index, unstack rearranges it back into a DataFrame
Out[6]:
number  one  two  three
state
O         0    1      2
C         3    4      5
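Because every (state, number) pair is present here, unstack exactly undoes stack. A quick check of that round trip, rebuilding the Example 1 data (roundtrip is an illustrative name):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

result = data.stack()
roundtrip = result.unstack()  # innermost level ('number') goes back to the columns

# Same labels, same order, same values, same axis names
pd.testing.assert_frame_equal(roundtrip, data)
```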
result.unstack(0)  #by default unstack works on the innermost level (so does stack); pass a level number or name to unstack a different level
Out[7]:
state   O  C
number
one     0  3
two     1  4
three   2  5
result.unstack('state')
Out[8]:
state   O  C
number
one     0  3
two     1  4
three   2  5
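Selecting the level by position (0) or by name ('state') gives the same frame, and for this two-level Series, moving 'state' into the columns is simply the transpose of the original frame. A small check (by_number and by_name are illustrative names):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()

by_number = result.unstack(0)       # level selected by position
by_name = result.unstack('state')   # same level selected by name

# Both pivot 'state' into the columns, which here equals data.T
pd.testing.assert_frame_equal(by_number, by_name)
pd.testing.assert_frame_equal(by_number, data.T)
```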
(3) If not every level value appears in every group, unstack may introduce missing data:
s1 = Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])
data2.unstack()
Out[9]:
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
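data2.unstack().stack()  #in the pandas version used here, stack drops missing data by default, so the round trip recovers the original entries of data2 (now as float64)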
Out[10]:
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64
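Which cells go missing can be checked directly: 'one' has no 'e' entry and 'two' has no 'a' or 'b', so the unstacked frame holds three NaNs and is upcast to float64. The default NaN handling of stack() is version-dependent: as far as I know, the new stack implementation added in pandas 2.1 (opt-in via future_stack=True, intended to become the default) no longer drops missing values. The sketch therefore calls .dropna() explicitly, an assumption made so the result is the same either way (wide and restored are illustrative names).

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()
# Missing (key, label) pairs become NaN, which forces a float dtype
print(wide.isna().sum().sum())  # 3
print(wide.to_numpy().dtype)    # float64

# Stack back and drop the NaN holes explicitly; dropna() is a no-op
# if stack() has already removed them
restored = wide.stack().dropna()
print(restored.equals(data2.astype('float64')))  # True
```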
data2.unstack().stack(dropna=False)  #dropna=False keeps the missing values
Out[11]:
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64
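Keeping the NaN entries amounts to listing every combination of the two index levels. As an alternative that does not rely on stack's dropna argument, one can reindex data2 against the Cartesian product of its own levels. This is a different technique from stack(dropna=False), shown only to make the idea explicit; full_index and full are illustrative names.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

# Every (key, label) pair: 2 keys x 5 labels = 10 entries
full_index = pd.MultiIndex.from_product(data2.index.levels)

# Pairs absent from data2 are filled with NaN (values upcast to float64)
full = data2.reindex(full_index)
print(len(full), int(full.isna().sum()))  # 10 3
```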
(4) When you unstack a DataFrame, the level being unstacked becomes the lowest level of the result's columns:
df = DataFrame({'left': result, 'right': result + 5}, columns=pd.Index(['left', 'right'], name='side'))
df
Out[13]:
side             left  right
state number
O     one           0      5
      two           1      6
      three         2      7
C     one           3      8
      two           4      9
      three         5     10
df.unstack('state')
Out[14]:
side   left    right
state     O  C     O   C
number
one       0  3     5   8
two       1  4     6   9
three     2  5     7  10
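The claim in (4) can be checked from the column index names: the existing column level 'side' stays on top and the unstacked 'state' level is appended below it. A sketch that rebuilds df from Example 1 (wide is an illustrative name):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

wide = df.unstack('state')
# 'side' stays the outer column level; 'state' becomes the innermost one
print(list(wide.columns.names))  # ['side', 'state']
print(wide.shape)                # (3, 4)
```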
df.unstack('state').stack('side')
Out[15]:
state          C  O
number side
one    left    3  0
       right   8  5
two    left    4  1
       right   9  6
three  left    5  2
       right  10  7
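Stacking 'side' moves it from the columns to the innermost row level, leaving only 'state' on the columns. The order of the state columns (C before O in the transcript above) may depend on the pandas version, since the older stack implementation sorted the column level, so the sketch below looks values up by label rather than by position (stacked is an illustrative name):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['O', 'C'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

stacked = df.unstack('state').stack('side')
# 'side' is now the innermost row level; only 'state' remains on the columns
print(list(stacked.index.names))          # ['number', 'side']
print(stacked.columns.name)               # state
print(stacked.loc[('one', 'right'), 'C'])  # 8
```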