Using pandas rename, reindex, and set_index: a detailed guide


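The pandas rename feature

  • Changing column names comes up constantly when working with pandas. It involves methods such as rename or reindex, and it is easy to end up checking the documentation every time.
  • Often you can also simply assign a new list to df.columns.
  • rename handles column renaming in pandas with little effort.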
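The three approaches named in the list above can be compared side by side. This is a minimal sketch on a made-up two-column frame (the frame and the labels x, y, c are hypothetical, chosen only for illustration); it is meant to show how rename, direct assignment to df.columns, and reindex differ, not to be a complete reference.

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# rename maps old labels to new ones; columns not listed are left alone,
# and a new frame is returned (df itself is unchanged).
renamed = df.rename(columns={"a": "x"})

# Assigning to .columns replaces every label at once, so the list
# length has to match the number of columns.
assigned = df.copy()
assigned.columns = ["x", "y"]

# reindex does not rename anything: it selects labels, and a label that
# does not exist yet ("c") becomes a column filled with NaN.
reindexed = df.reindex(columns=["b", "c"])

print(list(renamed.columns))
print(list(assigned.columns))
print(reindexed)
```

In short, rename and df.columns change what the labels are called, while reindex changes which labels are present.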
Import the commonly used packages
import pandas as pd
import numpy as np
Build a Series with a MultiIndex
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)
s.index
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])
Inspect s
s
first  second
bar    one      -0.073094
       two      -0.449141
baz    one       0.109093
       two      -0.033135
foo    one       1.315809
       two      -0.887890
qux    one       2.255328
       two      -0.778246
dtype: float64
set_names changes the names of the index levels
s.index.set_names(['L1', 'L2'], inplace=True)
s
L1   L2 
bar  one    0.037524
     two   -0.178425
baz  one   -0.778211
     two    1.440168
foo  one    0.314172
     two    0.710597
qux  one    1.197275
     two    0.527058
dtype: float64
s.index
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['L1', 'L2'])
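The call above uses inplace=True. As a hedged sketch of the other ways to call set_names, the example below builds a smaller Series of its own (the data is made up) and shows that without inplace the method returns a new index, and that level= renames a single level.

```python
import numpy as np
import pandas as pd

# Small stand-in for the Series above; the values are made up.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series(np.arange(4.0), index=index)

# Without inplace=True, set_names returns a new index and leaves s alone.
new_index = s.index.set_names(["L1", "L2"])

# level= restricts the change to one level.
partial = s.index.set_names("L2", level="second")

print(list(s.index.names))
print(list(new_index.names))
print(list(partial.names))
```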
rename can likewise change the index level names of the Series back
s.index.rename(['first', 'second'], inplace=True)
s
first  second
bar    one       0.037524
       two      -0.178425
baz    one      -0.778211
       two       1.440168
foo    one       0.314172
       two       0.710597
qux    one       1.197275
       two       0.527058
dtype: float64
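Both calls so far go through s.index. A possible alternative, sketched here on made-up data, is rename_axis, which is called on the Series itself and returns a renamed copy. It accepts either a full list of names or a mapping from old names to new ones.

```python
import numpy as np
import pandas as pd

# Small stand-in Series; the values are made up.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series(np.arange(4.0), index=index)

# A list replaces all level names; s itself is not modified.
s_all = s.rename_axis(index=["L1", "L2"])

# A mapping renames only the levels it mentions.
s_one = s.rename_axis(index={"first": "L1"})

print(list(s_all.index.names))
print(list(s_one.index.names))
```

Because nothing is changed in place, this form fits well into method chains.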
reset_index turns the two index levels into regular columns
s.reset_index()
  first second         0
0   bar    one  0.037524
1   bar    two -0.178425
2   baz    one -0.778211
3   baz    two  1.440168
4   foo    one  0.314172
5   foo    two  0.710597
6   qux    one  1.197275
7   qux    two  0.527058
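In the output above the value column is labelled 0 because the Series has no name. As a small sketch (the data and the column name "value" are made up), reset_index accepts a name argument for that column and a level argument to move only one level out of the index.

```python
import numpy as np
import pandas as pd

# Small stand-in Series; the values are made up.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series(np.arange(4.0), index=index)

# name= labels the value column instead of the default 0.
flat = s.reset_index(name="value")

# level= moves only that level into a column; "first" stays in the index.
partial = s.reset_index(level="second", name="value")

print(flat)
print(partial)
```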
pivot_table can restore the original layout, showing the two columns as the index again
s.reset_index().pivot_table(index=['first','second'],values=0,aggfunc=lambda x:x)
                     0
first second
bar   one     0.037524
      two    -0.178425
baz   one    -0.778211
      two     1.440168
foo   one     0.314172
      two     0.710597
qux   one     1.197275
      two     0.527058
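Since the title also mentions set_index, here is a hedged sketch of undoing reset_index with set_index instead of pivot_table. It is not the approach used above, only an alternative that skips the aggregation step; the data and the column name "value" are made up.

```python
import numpy as np
import pandas as pd

# Small stand-in Series; the values are made up.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series(np.arange(4.0), index=index)

flat = s.reset_index(name="value")

# set_index moves the two columns back into a MultiIndex; selecting
# the remaining column gives a Series again.
restored = flat.set_index(["first", "second"])["value"]

print(restored)
```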
The simplest way to change the index level names is direct assignment
s.index.names = ['first1', 'second1']  # direct assignment; this modifies s
s.index
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first1', 'second1'])
s
first1  second1
bar     one        0.037524
        two       -0.178425
baz     one       -0.778211
        two        1.440168
foo     one        0.314172
        two        0.710597
qux     one        1.197275
        two        0.527058
dtype: float64
Build a sample DataFrame
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': ['A', 'B', 'C'] * 4,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D': np.random.randn(12),
                   'E': np.random.randn(12)})
df.head()
       A  B    C         D         E
0    one  A  foo  0.664180 -0.107764
1    one  B  foo -0.833609  0.008083
2    two  C  foo  0.117919 -1.365583
3  three  A  bar -0.116776 -1.201934
4    one  B  bar -1.315190 -0.157779
df.pivot_table(index=['A','C'],values=['D'],columns='B',aggfunc=np.sum,fill_value='unknown')
                    D
B                   A          B          C
A     C
one   bar     2.71452   -1.31519  0.0231296
      foo     0.66418  -0.833609   -0.96451
three bar   -0.116776    unknown   0.450891
      foo     unknown   0.012846    unknown
two   bar     unknown   0.752643    unknown
      foo    0.963631    unknown   0.117919
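The uneven number formatting in the table above is a side effect of fill_value='unknown': a column that receives the string no longer holds only numbers. The sketch below, on a tiny made-up frame, contrasts that with a numeric fill value. It assumes aggfunc="sum" (the string form) as a stand-in for np.sum.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Tiny made-up frame; the combination ("two", "y") is missing on purpose.
df = pd.DataFrame({
    "A": ["one", "one", "two"],
    "B": ["x", "y", "x"],
    "D": [1.0, 2.0, 3.0],
})

# A numeric fill value keeps every column numeric.
numeric = df.pivot_table(index="A", columns="B", values="D",
                         aggfunc="sum", fill_value=0)

# A string fill value turns the affected column into a mixed column.
mixed = df.pivot_table(index="A", columns="B", values="D",
                       aggfunc="sum", fill_value="unknown")

print(is_numeric_dtype(numeric["y"]), is_numeric_dtype(mixed["y"]))
```

If the table is only for display, a string placeholder is fine; if later steps need arithmetic, a numeric fill value or leaving the NaN in place is probably the safer choice.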
df1 = df.pivot_table(index=['A','C'],values=['D'],columns='B',aggfunc=np.sum,fill_value='unknown')
df1.index
MultiIndex(levels=[['one', 'three', 'two'], ['bar', 'foo']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['A', 'C'])
df1.index.names=['first','second']
df1
                    D
B                   A          B          C
first second
one   bar     2.71452   -1.31519  0.0231296
      foo     0.66418  -0.833609   -0.96451
three bar   -0.116776    unknown   0.450891
      foo     unknown   0.012846    unknown
two   bar     unknown   0.752643    unknown
      foo    0.963631    unknown   0.117919
df1_stack=df1.stack()
df1_stack.index.names=['first','second','third']
df1_stack
                            D
first second third
one   bar    A        2.71452
             B       -1.31519
             C      0.0231296
      foo    A        0.66418
             B      -0.833609
             C       -0.96451
three bar    A      -0.116776
             B        unknown
             C       0.450891
      foo    A        unknown
             B       0.012846
             C        unknown
two   bar    A        unknown
             B       0.752643
             C        unknown
      foo    A       0.963631
             B        unknown
             C       0.117919
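What stack does here is move the innermost column level (B) into the index, which is why a third index name is needed. A minimal sketch on a made-up 2x2 frame with the same column layout follows; the frame is hypothetical, and rename_axis with a mapping is used as one possible way to rename just the new level.

```python
import pandas as pd

# Made-up frame with two column levels, like the pivot table above:
# an outer level holding "D" and an inner level named "B".
wide = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.Index(["one", "two"], name="first"),
    columns=pd.MultiIndex.from_product([["D"], ["A", "B"]], names=[None, "B"]),
)

# stack moves the innermost column level into the index.
long = wide.stack()

# Rename only the level that came from the columns.
long = long.rename_axis(index={"B": "third"})

print(long)
```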
df1_stack.columns = ['总和']  # '总和' means "total"
df1_stack
                           总和
first second third
one   bar    A        2.71452
             B       -1.31519
             C      0.0231296
      foo    A        0.66418
             B      -0.833609
             C       -0.96451
three bar    A      -0.116776
             B        unknown
             C       0.450891
      foo    A        unknown
             B       0.012846
             C        unknown
two   bar    A        unknown
             B       0.752643
             C        unknown
      foo    A       0.963631
             B        unknown
             C       0.117919
df2 = df1_stack.reset_index()
df2.set_index('first')
      second third         总和
first
one      bar     A    2.71452
one      bar     B   -1.31519
one      bar     C  0.0231296
one      foo     A    0.66418
one      foo     B  -0.833609
one      foo     C   -0.96451
three    bar     A  -0.116776
three    bar     B    unknown
three    bar     C   0.450891
three    foo     A    unknown
three    foo     B   0.012846
three    foo     C    unknown
two      bar     A    unknown
two      bar     B   0.752643
two      bar     C    unknown
two      foo     A   0.963631
two      foo     B    unknown
two      foo     C   0.117919
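Note that df2.set_index('first') above returns a new frame and leaves df2 itself unchanged. The sketch below, on a small made-up frame (the column name "total" is a stand-in), shows this along with a few other set_index options: several columns at once, drop=False, and append=True.

```python
import pandas as pd

# Small made-up frame standing in for df2.
df2 = pd.DataFrame({
    "first": ["one", "one", "two"],
    "second": ["bar", "foo", "bar"],
    "total": [1.0, 2.0, 3.0],
})

# Returns a new frame; df2 still has "first" as an ordinary column.
by_first = df2.set_index("first")

# A list of columns builds a MultiIndex.
both = df2.set_index(["first", "second"])

# drop=False keeps the column in the frame as well as in the index.
kept = df2.set_index("first", drop=False)

# append=True adds a level to the existing index instead of replacing it.
appended = by_first.set_index("second", append=True)

print(appended)
```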

Posted on 2019-02-23 22:51 by 多一点
