Reshaping indexes in pandas: stack and unstack

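This article takes a close look at the unstack and stack functions in pandas, which reshape a DataFrame between wide and long layouts. unstack moves row-index levels into the column index, and stack does the reverse, moving column-index levels into the row index. Worked examples show how they behave, how the level parameter is used, and the uniqueness requirement they impose, including the error raised when the data does not satisfy it.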

unstack

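The unstack function turns row-index levels into column-index levels. A simple example follows (the session assumes numpy has been imported as np and pandas as pd):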

In [41]: df = pd.DataFrame(np.ones((4,2)),
   ....:                   index = pd.Index([('A', 'cat', 'big'),
   ....:                                     ('A', 'dog', 'small'),
   ....:                                     ('B', 'cat', 'big'),
   ....:                                     ('B', 'dog', 'small')]),
   ....:                   columns=['col_1', 'col_2'])
   ....: 

In [42]: df
Out[42]: 
             col_1  col_2
A cat big      1.0    1.0
  dog small    1.0    1.0
B cat big      1.0    1.0
  dog small    1.0    1.0

In [43]: df.unstack()
Out[43]: 
      col_1       col_2      
        big small   big small
A cat   1.0   NaN   1.0   NaN
  dog   NaN   1.0   NaN   1.0
B cat   1.0   NaN   1.0   NaN
  dog   NaN   1.0   NaN   1.0

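The main parameter of unstack is the level (or levels) to move. By default it moves the innermost row-index level, which becomes the innermost level of the column index. Several levels can also be moved at once: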

In [44]: df.unstack(2)
Out[44]: 
      col_1       col_2      
        big small   big small
A cat   1.0   NaN   1.0   NaN
  dog   NaN   1.0   NaN   1.0
B cat   1.0   NaN   1.0   NaN
  dog   NaN   1.0   NaN   1.0

In [45]: df.unstack([0,2])
Out[45]: 
    col_1                  col_2                 
        A          B           A          B      
      big small  big small   big small  big small
cat   1.0   NaN  1.0   NaN   1.0   NaN  1.0   NaN
dog   NaN   1.0  NaN   1.0   NaN   1.0  NaN   1.0
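Levels can also be referred to by name instead of position, and the NaN cells created by the reshape can be filled in directly. The following is a minimal sketch rather than part of the original session: the level names group, animal, and size are hypothetical and chosen only for illustration, and it relies on the fill_value parameter of DataFrame.unstack.

```python
import numpy as np
import pandas as pd

# Same data as above, but with (hypothetical) names on the index levels.
idx = pd.MultiIndex.from_tuples(
    [("A", "cat", "big"), ("A", "dog", "small"),
     ("B", "cat", "big"), ("B", "dog", "small")],
    names=["group", "animal", "size"],
)
df_named = pd.DataFrame(np.ones((4, 2)), index=idx, columns=["col_1", "col_2"])

# Moving a level by name gives the same result as moving it by position.
by_name = df_named.unstack("size")
by_pos = df_named.unstack(2)

# fill_value replaces the NaN cells introduced for missing combinations.
filled = df_named.unstack("size", fill_value=0)

print(by_name)
print(filled)
```

Naming the levels tends to make reshaping code easier to read than positional level numbers, especially when several levels are moved at once.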

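As with the uniqueness requirement in pivot, unstack requires that every combination of the row-index levels being moved to the columns and the row-index levels being kept is unique. For example, making the first two row-index entries identical breaks uniqueness and raises an error: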

In [46]: my_index = df.index.to_list()

In [47]: my_index[1] = my_index[0]

In [48]: df.index = pd.Index(my_index)

In [49]: df
Out[49]: 
             col_1  col_2
A cat big      1.0    1.0
      big      1.0    1.0
B cat big      1.0    1.0
  dog small    1.0    1.0

In [50]: try:
   ....:    df.unstack()
   ....: except Exception as e:
   ....:    Err_Msg = e
   ....: 

In [51]: Err_Msg
Out[51]: ValueError('Index contains duplicate entries, cannot reshape')
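Before calling unstack, it can help to check whether the index is unique and to see which rows collide. The sketch below rebuilds the duplicated index from the example above; the groupby-and-sum step is only one possible way to resolve duplicates and is an assumption about what the data means, not something the original example prescribes.

```python
import numpy as np
import pandas as pd

# Rebuild the frame whose first two index entries are identical.
dup_index = pd.MultiIndex.from_tuples(
    [("A", "cat", "big"), ("A", "cat", "big"),
     ("B", "cat", "big"), ("B", "dog", "small")]
)
df_dup = pd.DataFrame(np.ones((4, 2)), index=dup_index, columns=["col_1", "col_2"])

# is_unique is False here, which is why unstack cannot reshape the frame.
print(df_dup.index.is_unique)

# keep=False marks every member of a duplicated group, not just the repeats.
dup_mask = df_dup.index.duplicated(keep=False)
print(df_dup[dup_mask])

# One possible fix (an assumption): aggregate duplicates so each key appears once.
deduped = df_dup.groupby(level=[0, 1, 2]).sum()
result = deduped.unstack()
print(result)
```

Whether summing, averaging, or dropping duplicates is appropriate depends on the data; the point is only that the index must be made unique before unstack can succeed.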

stack

In contrast to unstack, stack pushes levels of the column index into the row index. Its usage is entirely analogous.

In [52]: df = pd.DataFrame(np.ones((4,2)),
   ....:                   index = pd.Index([('A', 'cat', 'big'),
   ....:                                     ('A', 'dog', 'small'),
   ....:                                     ('B', 'cat', 'big'),
   ....:                                     ('B', 'dog', 'small')]),
   ....:                   columns=['index_1', 'index_2']).T
   ....: 

In [53]: df
Out[53]: 
           A          B      
         cat   dog  cat   dog
         big small  big small
index_1  1.0   1.0  1.0   1.0
index_2  1.0   1.0  1.0   1.0

In [54]: df.stack()
Out[54]: 
                 A         B     
               cat  dog  cat  dog
index_1 big    1.0  NaN  1.0  NaN
        small  NaN  1.0  NaN  1.0
index_2 big    1.0  NaN  1.0  NaN
        small  NaN  1.0  NaN  1.0

In [55]: df.stack([1, 2])
Out[55]: 
                     A    B
index_1 cat big    1.0  1.0
        dog small  1.0  1.0
index_2 cat big    1.0  1.0
        dog small  1.0  1.0
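Because stack and unstack move levels in opposite directions, applying one after the other can restore the original frame when every label combination is present. The sketch below illustrates this on a small dense frame; the name df_dense and its values are made up for the illustration, and the comparison sorts both sides because the row order returned by stack may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# A dense frame: every (letter, animal) combination is present.
index = pd.MultiIndex.from_product([["A", "B"], ["cat", "dog"]])
df_dense = pd.DataFrame(
    np.arange(8.0).reshape(4, 2), index=index, columns=["col_1", "col_2"]
)

# unstack moves the innermost row level (cat/dog) into the columns ...
wide = df_dense.unstack()

# ... and stack moves the innermost column level back into the rows.
back = wide.stack()

# Sort before comparing, since row order is not guaranteed to match.
same = back.sort_index()[df_dense.columns].equals(df_dense.sort_index())
print(wide)
print(back)
print(same)
```

When some combinations are missing, as in the big/small examples above, the round trip is not exact, because the reshape introduces NaN cells that were not in the original data.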
