Python Pandas, Chapter 2: Indexing

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_csv('data/table.csv',index_col='ID')
>>> df.head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+

I. Single-Level Indexing

1. The loc method, the iloc method, and the [] operator

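These three are probably the most commonly used indexing tools: iloc indexes by integer position, loc indexes by label, and the [] operator is a convenient shorthand. Each has its own characteristics.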

(a) The loc method

① Single row:

>>> df.loc[1103]
School          S_1
Class           C_1
Gender            M
Address    street_2
Height          186
Weight           82
Math           87.2
Physics          B+
Name: 1103, dtype: object

② Multiple rows:

>>> df.loc[[1102,2304]]
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1102    S_1   C_1      F  street_2     192      73  32.5      B+
2304    S_2   C_3      F  street_6     164      81  95.5      A-

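(Note: every slice used in loc includes its right endpoint. A pandas user usually neither knows nor cares which label comes immediately after the last one wanted, so a half-open slice would force a lookup of the next row or column label first, which is inconvenient. pandas therefore makes loc slices closed on both ends.)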

>>> df.loc[1304:2103].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1304    S_1   C_3      M  street_2     195      70  85.2       A
1305    S_1   C_3      F  street_5     187      69  61.7      B-
2101    S_2   C_1      M  street_7     174      84  83.3       C
2102    S_2   C_1      F  street_6     161      61  50.6      B+
2103    S_2   C_1      M  street_4     157      61  52.5      B-

>>> df.loc[2402::-1].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
2402    S_2   C_4      M  street_7     166      82  48.7       B
2401    S_2   C_4      F  street_2     192      62  45.3       A
2305    S_2   C_3      M  street_4     187      73  48.9       B
2304    S_2   C_3      F  street_6     164      81  95.5      A-
2303    S_2   C_3      F  street_7     190      99  65.9       C
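
The endpoint rule for loc slices can be checked on a small table. The sketch below uses a hypothetical five-row frame named `toy` (not the tutorial's data/table.csv) and compares a label slice with a positional slice over the same rows.

```python
import pandas as pd

# Hypothetical toy table standing in for data/table.csv
toy = pd.DataFrame(
    {"Height": [173, 192, 186, 167, 159]},
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
)

by_label = toy.loc[1102:1104]      # label slice: both endpoints included
by_position = toy.iloc[1:3]        # positional slice: right endpoint excluded

print(by_label.index.tolist())     # [1102, 1103, 1104]
print(by_position.index.tolist())  # [1102, 1103]
```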

③ Single column:

>>> df.loc[:,'Height'].head()
ID
1101    173
1102    192
1103    186
1104    167
1105    159
Name: Height, dtype: int64

④ Multiple columns:

>>> df.loc[:,['Height','Math']].head()
      Height  Math
ID                
1101     173  34.0
1102     192  32.5
1103     186  87.2
1104     167  80.4
1105     159  84.8

>>> df.loc[:,'Height':'Math'].head()
      Height  Weight  Math
ID                        
1101     173      63  34.0
1102     192      73  32.5
1103     186      82  87.2
1104     167      81  80.4
1105     159      64  84.8

⑤ Rows and columns together:

>>> df.loc[1102:2401:3,'Height':'Math'].head()
      Height  Weight  Math
ID                        
1102     192      73  32.5
1105     159      64  84.8
1203     160      53  58.8
1301     161      68  31.5
1304     195      70  85.2

⑥ Callable indexing:

>>> df.loc[lambda x:x['Gender']=='M'].head()
>>> # The function passed to loc receives the preceding df as its argument
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1201    S_1   C_2      M  street_5     188      68  97.0      A-
1203    S_1   C_2      M  street_6     160      53  58.8      A+
1301    S_1   C_3      M  street_4     161      68  31.5      B+

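# This example shows that loc accepts a function whose input is the whole table and whose output is a scalar, a slice, a valid list (every element present in the index), or a valid index
>>> def f(x):
...     return [1101,1103]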
>>> df.loc[f]
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
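
A callable passed to loc may also return a boolean mask, which is mainly useful in method chains where the intermediate table has no name. The following is a minimal sketch on a hypothetical three-row frame; the `Bonus` column and the helper `high_math` are invented for illustration.

```python
import pandas as pd

# Hypothetical toy table standing in for data/table.csv
toy = pd.DataFrame(
    {"Gender": ["M", "F", "M"], "Math": [34.0, 32.5, 87.2]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)

def high_math(frame):
    # Receives the frame being indexed and returns a boolean Series
    return frame["Math"] > 80

via_callable = toy.loc[high_math]
via_mask = toy.loc[toy["Math"] > 80]

# In a chain, the lambda sees the frame produced by assign()
chained = toy.assign(Bonus=lambda d: d["Math"] + 5).loc[lambda d: d["Bonus"] > 80]
print(chained.index.tolist())  # [1103]
```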

⑦ Boolean indexing (covered in detail in Section 2)

>>> df.loc[df['Address'].isin(['street_7','street_4'])].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1105    S_1   C_1      F  street_4     159      64  84.8      B+
1202    S_1   C_2      F  street_4     176      94  63.5      B-
1301    S_1   C_3      M  street_4     161      68  31.5      B+
1303    S_1   C_3      M  street_7     188      82  49.7       B
2101    S_2   C_1      M  street_7     174      84  83.3       C

>>> df.loc[[True if i[-1]=='4' or i[-1]=='7' else False for i in df['Address'].values]].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1105    S_1   C_1      F  street_4     159      64  84.8      B+
1202    S_1   C_2      F  street_4     176      94  63.5      B-
1301    S_1   C_3      M  street_4     161      68  31.5      B+
1303    S_1   C_3      M  street_7     188      82  49.7       B
2101    S_2   C_1      M  street_7     174      84  83.3       C

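Summary: in essence, every form accepted by loc reduces to either a boolean list or a list made up of a subset of the index. Keeping this principle in mind makes the operations above easy to understand.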

(b) The iloc method (note: unlike loc, slices exclude the right endpoint)

① Single row:

>>> df.iloc[3]
School          S_1
Class           C_1
Gender            F
Address    street_2
Height          167
Weight           81
Math           80.4
Physics          B-
Name: 1104, dtype: object

② Multiple rows:

>>> df.iloc[3:5]
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+

③ Single column:

>>> df.iloc[:,3].head()
ID
1101    street_1
1102    street_2
1103    street_2
1104    street_2
1105    street_4
Name: Address, dtype: object

④ Multiple columns:

>>> df.iloc[:,7::-2].head()
     Physics  Weight   Address Class
ID                                  
1101      A+      63  street_1   C_1
1102      B+      73  street_2   C_1
1103      B+      82  street_2   C_1
1104      B-      81  street_2   C_1
1105      B+      64  street_4   C_1

⑤ Rows and columns together:

>>> df.iloc[3::4,7::-2].head()
     Physics  Weight   Address Class
ID                                  
1104      B-      81  street_2   C_1
1203      A+      53  street_6   C_2
1302      A-      57  street_1   C_3
2101       C      84  street_7   C_1
2105       A      81  street_4   C_1

⑥ Callable indexing:

>>> df.iloc[lambda x:[3]].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1104    S_1   C_1      F  street_2     167      81  80.4      B-

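Summary: iloc accepts only an integer, a list of integers, an integer slice, or a boolean list. A boolean Series cannot be used directly; to use one, extract its values first, as below.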

>>> #df.iloc[df['School']=='S_1'].head()  # raises an error
>>> df.iloc[(df['School']=='S_1').values].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+
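
The restriction on boolean Series can be reproduced on a small table. This is a sketch on a hypothetical three-row frame; it uses `to_numpy()`, which returns the same array as `.values` here, and catches a broad `Exception` because the exact exception type is an assumption that may vary by pandas version.

```python
import pandas as pd

# Hypothetical toy table standing in for data/table.csv
toy = pd.DataFrame(
    {"School": ["S_1", "S_2", "S_1"]},
    index=pd.Index([1101, 2101, 1102], name="ID"),
)
mask = toy["School"] == "S_1"          # boolean Series, indexed by ID

try:
    toy.iloc[mask]                     # a Series carries an index, which iloc rejects
except Exception as exc:               # exact exception type may vary by version
    print("iloc rejected the Series:", type(exc).__name__)

by_array = toy.iloc[mask.to_numpy()]   # plain boolean array: accepted by iloc
by_label = toy.loc[mask]               # loc accepts the Series directly

print(by_array.index.tolist())         # [1101, 1102]
```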

(c) The [] operator

(c.1) [] on a Series

① Single element:

>>> s = pd.Series(df['Math'],index=df.index)
>>> s[1101]
>>> # Uses the index label
34.0

② Multiple rows:

>>> s[0:4]
>>> # An integer slice here is positional and has nothing to do with the labels; this is easy to mix up
ID
1101    34.0
1102    32.5
1103    87.2
1104    80.4
Name: Math, dtype: float64

③ Callable indexing:

>>> s[lambda x: x.index[16::-6]]
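>>> # Note: a bare slice cannot be written inside a lambda (s[lambda x: 16::-6] is an error). The lambda here returns index labels, so the selection is by label rather than by position, which is easy to get wrong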
ID
2102    50.6
1301    31.5
1105    84.8
Name: Math, dtype: float64

④ Boolean indexing:

>>> s[s>80]
ID
1103    87.2
1104    80.4
1105    84.8
1201    97.0
1302    87.7
1304    85.2
2101    83.3
2205    85.4
2304    95.5
Name: Math, dtype: float64

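[Note] To stay out of trouble, avoid the [] operator when the row index is floating-point. In the output below, a slice in Series[] on a float index is compared against the label values rather than positions, which is a special case.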

>>> s_int = pd.Series([1,2,3,4],index=[1,3,5,6])
>>> s_float = pd.Series([1,2,3,4],index=[1.,3.,5.,6.])
>>> s_int
1    1
3    2
5    3
6    4
dtype: int64

>>> s_int[2:]
5    3
6    4
dtype: int64

>>> s_float
1.0    1
3.0    2
5.0    3
6.0    4
dtype: int64

>>> # Note the result differs from s_int[2:], because 2 is treated here as a label value rather than a position
>>> s_float[2:]
3.0    2
5.0    3
6.0    4
dtype: int64
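
One way to sidestep the ambiguity is to state the intent explicitly with loc or iloc instead of plain []. The sketch below reuses the same `s_float` series; how plain [] treats such a slice may depend on the pandas version, whereas the two explicit accessors are unambiguous.

```python
import pandas as pd

s_float = pd.Series([1, 2, 3, 4], index=[1.0, 3.0, 5.0, 6.0])

by_label = s_float.loc[2:]      # labels >= 2 -> 3.0, 5.0, 6.0
by_position = s_float.iloc[2:]  # positions 2 and 3 -> 5.0, 6.0

print(by_label.index.tolist())     # [3.0, 5.0, 6.0]
print(by_position.index.tolist())  # [5.0, 6.0]
```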

(c.2) [] on a DataFrame

① Single row:

>>> df[1:2]
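>>> # It is easy to write df[label] with a row label here instead, which raises an error because [] with a single key looks up a column
>>> # As with Series, an integer slice is positional
>>> # To select the row for a particular label, use get_loc as shown further down: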
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1102    S_1   C_1      F  street_2     192      73  32.5      B+

>>> row = df.index.get_loc(1102)
>>> df[row:row+1]
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1102    S_1   C_1      F  street_2     192      73  32.5      B+

② Multiple rows:

# Use a slice; to pick specific rows, loc is recommended, otherwise an error is likely
>>> df[3:5]
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+

③ Single column:

>>> df['School'].head()
ID
1101    S_1
1102    S_1
1103    S_1
1104    S_1
1105    S_1
Name: School, dtype: object

④ Multiple columns:

>>> df[['School','Math']].head()
     School  Math
ID               
1101    S_1  34.0
1102    S_1  32.5
1103    S_1  87.2
1104    S_1  80.4
1105    S_1  84.8

⑤ Callable indexing:

>>> df[lambda x:['Math','Physics']].head()
      Math Physics
ID                
1101  34.0      A+
1102  32.5      B+
1103  87.2      B+
1104  80.4      B-
1105  84.8      B+

⑥ Boolean indexing:

>>> df[df['Gender']=='F'].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+
1202    S_1   C_2      F  street_4     176      94  63.5      B-
1204    S_1   C_2      F  street_5     162      63  33.8       B

Summary: in general, use the [] operator for column selection or boolean selection, and avoid it for row selection.

2. Boolean indexing

(a) Boolean operators '&', '|', '~': element-wise and, or, and not, respectively

>>> df[(df['Gender']=='F')&(df['Address']=='street_2')].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
2401    S_2   C_4      F  street_2     192      62  45.3       A
2404    S_2   C_4      F  street_2     160      84  67.7       B

>>> df[(df['Math']>85)|(df['Address']=='street_7')].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1201    S_1   C_2      M  street_5     188      68  97.0      A-
1302    S_1   C_3      F  street_1     175      57  87.7      A-
1303    S_1   C_3      M  street_7     188      82  49.7       B
1304    S_1   C_3      M  street_2     195      70  85.2       A

>>> df[~((df['Math']>75)|(df['Address']=='street_1'))].head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1202    S_1   C_2      F  street_4     176      94  63.5      B-
1203    S_1   C_2      M  street_6     160      53  58.8      A+
1204    S_1   C_2      F  street_5     162      63  33.8       B
1205    S_1   C_2      F  street_6     167      63  68.4      B-

Boolean lists can be used in either position of loc[]:

>>> df.loc[df['Math']>60,df.columns=='Physics'].head()
>>> # Exercise: why does df.loc[df['Math']>60,(df[:8]['Address']=='street_6').values].head() give the same result as above? Can .values be removed?
     Physics
ID          
1103      B+
1104      B-
1105      B+
1201      A-
1202      B-

(b) The isin method

>>> df[df['Address'].isin(['street_1','street_4'])&df['Physics'].isin(['A','A+'])]
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
2105    S_2   C_1      M  street_4     170      81  34.2       A
2203    S_2   C_2      M  street_4     155      91  73.8      A+

>>> # The same selection can be written with a dict:
>>> df[df[['Address','Physics']].isin({'Address':['street_1','street_4'],'Physics':['A','A+']}).all(1)]
>>> # all works along the same lines as &; the 1 means checking across columns (row by row) whether every value is True
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
2105    S_2   C_1      M  street_4     170      81  34.2       A
2203    S_2   C_2      M  street_4     155      91  73.8      A+
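
The equivalence between the & form and the dict form can be verified on a small table. This is a sketch on hypothetical rows; the dict name `allowed` is made up for the example, and `axis=1` is spelled out in place of the positional `1`.

```python
import pandas as pd

# Hypothetical toy table standing in for data/table.csv
toy = pd.DataFrame(
    {
        "Address": ["street_1", "street_2", "street_4", "street_4"],
        "Physics": ["A+", "A", "B", "A"],
    },
    index=pd.Index([1101, 1102, 1105, 2105], name="ID"),
)
allowed = {"Address": ["street_1", "street_4"], "Physics": ["A", "A+"]}

# Column-by-column masks combined with &
mask_and = toy["Address"].isin(allowed["Address"]) & toy["Physics"].isin(allowed["Physics"])

# Dict form: one boolean column per key, then require every column to be True
mask_dict = toy[list(allowed)].isin(allowed).all(axis=1)

print(toy[mask_dict].index.tolist())  # [1101, 2105]
```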

3. Fast scalar access

When only a single element is needed, the at and iat methods provide a faster implementation:

>>> df.at[1101,'School']
'S_1'

>>> df.loc[1101,'School']
'S_1'

>>> df.iat[0,0]
'S_1'

>>> df.iloc[0,0]
'S_1'

>>> # Uncomment these to compare timings (%timeit requires IPython/Jupyter)
>>> #%timeit df.at[1101,'School']
>>> #%timeit df.loc[1101,'School']
>>> #%timeit df.iat[0,0]
>>> #%timeit df.iloc[0,0]
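
Outside IPython, the standard-library timeit module gives a rough version of the same comparison. The sketch below runs on a hypothetical three-row frame; absolute timings depend on the machine and the pandas version, so no numbers are claimed here.

```python
import timeit
import pandas as pd

# Hypothetical toy table standing in for data/table.csv
toy = pd.DataFrame(
    {"School": ["S_1", "S_1", "S_2"], "Math": [34.0, 32.5, 83.3]},
    index=pd.Index([1101, 1102, 2101], name="ID"),
)

# All four accessors return the same scalar
by_at = toy.at[1101, "School"]
by_loc = toy.loc[1101, "School"]
by_iat = toy.iat[0, 0]
by_iloc = toy.iloc[0, 0]

# Rough timing of each accessor over 2000 calls
for name, getter in [
    ("at", lambda: toy.at[1101, "School"]),
    ("loc", lambda: toy.loc[1101, "School"]),
    ("iat", lambda: toy.iat[0, 0]),
    ("iloc", lambda: toy.iloc[0, 0]),
]:
    print(name, timeit.timeit(getter, number=2000))
```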

4. Interval indexing

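Introducing it here does not mean that interval indexes can only be used with single-level indexes; it is simply a special kind of index that is convenient to introduce at this point.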

(a) Using interval_range

>>> pd.interval_range(start=0,end=5)
>>> # closed can be 'left', 'right', 'both' or 'neither'; the default is 'right' (open on the left, closed on the right)
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]],
              closed='right',
              dtype='interval[int64]')
              
>>> pd.interval_range(start=0,periods=8,freq=5)
>>> # periods sets the number of intervals, freq the step size
IntervalIndex([(0, 5], (5, 10], (10, 15], (15, 20], (20, 25], (25, 30], (30, 35], (35, 40]],
              closed='right',
              dtype='interval[int64]')

(b) Using cut to turn a numeric column into a categorical variable whose elements are intervals, for example to bin the Math scores:

>>> math_interval = pd.cut(df['Math'],bins=[0,40,60,80,100])
>>> # Note: without a type conversion, the result has category dtype, not interval dtype
>>> math_interval.head()
ID
1101      (0, 40]
1102      (0, 40]
1103    (80, 100]
1104    (80, 100]
1105    (80, 100]
Name: Math, dtype: category
Categories (4, interval[int64]): [(0, 40] < (40, 60] < (60, 80] < (80, 100]]
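
The distinction between the category dtype of the result and the interval type of its categories can be inspected directly. The sketch below bins four hypothetical scores with the same bin edges.

```python
import pandas as pd

# Hypothetical scores; 60.0 sits exactly on a bin edge
math = pd.Series([34.0, 32.5, 87.2, 60.0], name="Math")
binned = pd.cut(math, bins=[0, 40, 60, 80, 100])

print(binned.dtype)                           # category
print(type(binned.cat.categories).__name__)   # IntervalIndex

# Bins are right-closed by default, so 60.0 lands in (40, 60]
edge_bin = binned.iloc[3]
print(edge_bin)
```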

(c) Selecting from an interval index

>>> df_i = df.join(math_interval,rsuffix='_interval')[['Math','Math_interval']]\
            .reset_index().set_index('Math_interval')
>>> df_i.head()
                 ID  Math
Math_interval            
(0, 40]        1101  34.0
(0, 40]        1102  32.5
(80, 100]      1103  87.2
(80, 100]      1104  80.4
(80, 100]      1105  84.8

>>> df_i.loc[65].head()
>>> # Rows whose interval contains the value are selected
                 ID  Math
Math_interval            
(60, 80]       1202  63.5
(60, 80]       1205  68.4
(60, 80]       1305  61.7
(60, 80]       2104  72.2
(60, 80]       2202  68.5

>>> df_i.loc[[65,90]].head()
                 ID  Math
Math_interval            
(60, 80]       1202  63.5
(60, 80]       1205  68.4
(60, 80]       1305  61.7
(60, 80]       2104  72.2
(60, 80]       2202  68.5

To select by an interval, first convert the categorical index to an interval index, then use the overlaps method:

>>> #df_i.loc[pd.Interval(70,75)].head()  # raises an error
>>> df_i[df_i.index.astype('interval').overlaps(pd.Interval(70, 85))].head()
                 ID  Math
Math_interval            
(80, 100]      1103  87.2
(80, 100]      1104  80.4
(80, 100]      1105  84.8
(80, 100]      1201  97.0
(60, 80]       1202  63.5
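
The overlap test can be followed step by step on four rows. This is a sketch on hypothetical scores; it rebuilds the interval index with `pd.IntervalIndex(list(...))`, a swapped-in alternative to the `astype('interval')` call used above, chosen here only for illustration.

```python
import pandas as pd

# Hypothetical scores keyed by ID
math = pd.Series([34.0, 87.2, 63.5, 45.3], index=[1101, 1103, 1202, 2401])
binned = pd.cut(math, bins=[0, 40, 60, 80, 100])

# Rebuild an IntervalIndex from the categorical values (one Interval per row)
row_intervals = pd.IntervalIndex(list(binned))

# True where the row's bin shares at least one point with (70, 85]
mask = row_intervals.overlaps(pd.Interval(70, 85))
selected = math[mask]

print(selected.index.tolist())  # [1103, 1202]
```

Both (60, 80] and (80, 100] share points with (70, 85], so rows from either bin are kept.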

II. Multi-Level Indexing

1. Creating a MultiIndex

(a) Using from_tuples or from_arrays

① Writing the tuples directly

>>> tuples = [('A','a'),('A','b'),('B','a'),('B','b')]
>>> mul_index = pd.MultiIndex.from_tuples(tuples, names=('Upper', 'Lower'))
>>> mul_index
MultiIndex([('A', 'a'),
            ('A', 'b'),
            ('B', 'a'),
            ('B', 'b')],
           names=['Upper', 'Lower'])

>>> pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
               Score
Upper Lower         
A     a      perfect
      b         good
B     a         fair
      b          bad

② Building the tuples with zip

>>> L1 = list('AABB')
>>> L2 = list('abab')
>>> tuples = list(zip(L1,L2))
>>> mul_index = pd.MultiIndex.from_tuples(tuples, names=('Upper', 'Lower'))
>>> pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
               Score
Upper Lower         
A     a      perfect
      b         good
B     a         fair
      b          bad

③ From an array (a list of lists)

>>> arrays = [['A','a'],['A','b'],['B','a'],['B','b']]
>>> mul_index = pd.MultiIndex.from_tuples(arrays, names=('Upper', 'Lower'))
>>> pd.DataFrame({'Score':['perfect','good','fair','bad']},index=mul_index)
               Score
Upper Lower         
A     a      perfect
      b         good
B     a         fair
      b          bad
      
>>> mul_index
>>> # This shows that the lists are converted to tuples internally
MultiIndex([('A', 'a'),
            ('A', 'b'),
            ('B', 'a'),
            ('B', 'b')],
           names=['Upper', 'Lower'])
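
The example above passes row-wise pairs to from_tuples. For comparison, from_arrays takes one sequence per level instead. The sketch below builds the same index both ways; the variable names are chosen for illustration.

```python
import pandas as pd

# One sequence per level (column-wise), as from_arrays expects
uppers = ["A", "A", "B", "B"]
lowers = ["a", "b", "a", "b"]
from_arrays = pd.MultiIndex.from_arrays([uppers, lowers], names=("Upper", "Lower"))

# One pair per row (row-wise), as from_tuples expects
from_tuples = pd.MultiIndex.from_tuples(list(zip(uppers, lowers)), names=("Upper", "Lower"))

print(from_arrays.equals(from_tuples))  # True
```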

(b) Using from_product

>>> L1 = ['A','B']
>>> L2 = ['a','b']
>>> pd.MultiIndex.from_product([L1,L2],names=('Upper', 'Lower'))
>>> # Cartesian product of the two lists
MultiIndex([('A', 'a'),
            ('A', 'b'),
            ('B', 'a'),
            ('B', 'b')],
           names=['Upper', 'Lower'])

(c) From columns of df (the set_index method)

>>> df_using_mul = df.set_index(['Class','Address'])
>>> df_using_mul.head()
               School Gender  Height  Weight  Math Physics
Class Address                                             
C_1   street_1    S_1      M     173      63  34.0      A+
      street_2    S_1      F     192      73  32.5      B+
      street_2    S_1      M     186      82  87.2      B+
      street_2    S_1      F     167      81  80.4      B-
      street_4    S_1      F     159      64  84.8      B+

2. Slicing a MultiIndex

>>> df_using_mul.head()
               School Gender  Height  Weight  Math Physics
Class Address                                             
C_1   street_1    S_1      M     173      63  34.0      A+
      street_2    S_1      F     192      73  32.5      B+
      street_2    S_1      M     186      82  87.2      B+
      street_2    S_1      F     167      81  80.4      B-
      street_4    S_1      F     159      64  84.8      B+

(a) Ordinary slicing

>>> #df_using_mul.loc['C_2','street_5']
>>> # On an unsorted index, selecting a single key raises a performance warning
>>> #df_using_mul.index.is_lexsorted()
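>>> # This method checks whether the index is sorted (it is deprecated in later pandas releases, where index.is_monotonic_increasing serves a similar purpose)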
>>> df_using_mul.sort_index().loc['C_2','street_5']
>>> #df_using_mul.sort_index().index.is_lexsorted()
               School Gender  Height  Weight  Math Physics
Class Address                                             
C_2   street_5    S_1      M     188      68  97.0      A-
      street_5    S_1      F     162      63  33.8       B
      street_5    S_2      M     193     100  39.1       B
      
>>> #df_using_mul.loc[('C_2','street_5'):]  # raises an error
>>> # Slicing across levels is not possible on an unsorted index
>>> df_using_mul.sort_index().loc[('C_2','street_6'):('C_3','street_4')]
>>> # Because loc is used, the right endpoint is still included
               School Gender  Height  Weight  Math Physics
Class Address                                             
C_2   street_6    S_1      M     160      53  58.8      A+
      street_6    S_1      F     167      63  68.4      B-
      street_7    S_2      F     194      77  68.5      B+
      street_7    S_2      F     183      76  85.4       B
C_3   street_1    S_1      F     175      57  87.7      A-
      street_2    S_1      M     195      70  85.2       A
      street_4    S_1      M     161      68  31.5      B+
      street_4    S_2      F     157      78  72.3      B+
      street_4    S_2      M     187      73  48.9       B
      
>>> df_using_mul.sort_index().loc[('C_2','street_7'):'C_3'].head()
>>> # A non-tuple endpoint is also valid; it selects every element under that label
               School Gender  Height  Weight  Math Physics
Class Address                                             
C_2   street_7    S_2      F     194      77  68.5      B+
      street_7    S_2      F     183      76  85.4       B
C_3   street_1    S_1      F     175      57  87.7      A-
      street_2    S_1      M     195      70  85.2       A
      street_4    S_1      M     161      68  31.5      B+

(b) First special case: a list of tuples

>>> df_using_mul.sort_index().loc[[('C_2','street_7'),('C_3','street_2')]]
>>> # Selects specific elements, pinned down to the innermost level
               School Gender  Height  Weight  Math Physics
Class Address                                             
C_2   street_7    S_2      F     194      77  68.5      B+
      street_7    S_2      F     183      76  85.4       B
C_3   street_2    S_1      M     195      70  85.2       A

(c) Second special case: a tuple of lists

>>> df_using_mul.sort_index().loc[(['C_2','C_3'],['street_4','street_7']),:]
>>> # Selects rows whose first level is 'C_2' or 'C_3' and whose second level is 'street_4' or 'street_7'
               School Gender  Height  Weight  Math Physics
Class Address                                             
C_2   street_4    S_1      F     176      94  63.5      B-
      street_4    S_2      M     155      91  73.8      A+
      street_7    S_2      F     194      77  68.5      B+
      street_7    S_2      F     183      76  85.4       B
C_3   street_4    S_1      M     161      68  31.5      B+
      street_4    S_2      F     157      78  72.3      B+
      street_4    S_2      M     187      73  48.9       B
      street_7    S_1      M     188      82  49.7       B
      street_7    S_2      F     190      99  65.9       C
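
The three forms (tuple slice, list of tuples, tuple of lists) are easy to confuse, so it may help to see them side by side on a tiny index. The sketch below uses a hypothetical six-row frame with one row per (Class, Address) pair.

```python
import pandas as pd

# Hypothetical toy frame: every (Class, Address) pair appears exactly once
index = pd.MultiIndex.from_product(
    [["C_1", "C_2", "C_3"], ["street_1", "street_2"]], names=("Class", "Address")
)
toy = pd.DataFrame({"Math": [10, 20, 30, 40, 50, 60]}, index=index).sort_index()

# Tuple slice: a contiguous range, both endpoints included
sliced = toy.loc[("C_1", "street_2"):("C_2", "street_2")]

# List of tuples: exactly these (Class, Address) pairs
pairs = toy.loc[[("C_1", "street_2"), ("C_3", "street_1")]]

# Tuple of lists: every existing combination of the listed values
crossed = toy.loc[(["C_1", "C_3"], ["street_1"]), :]

print(sliced["Math"].tolist())   # [20, 30, 40]
print(pairs["Math"].tolist())    # [20, 50]
print(crossed["Math"].tolist())  # [10, 50]
```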

3. The slice object in a MultiIndex

>>> L1,L2 = ['A','B','C'],['a','b','c']
>>> mul_index1 = pd.MultiIndex.from_product([L1,L2],names=('Upper', 'Lower'))
>>> L3,L4 = ['D','E','F'],['d','e','f']
>>> mul_index2 = pd.MultiIndex.from_product([L3,L4],names=('Big', 'Small'))
>>> df_s = pd.DataFrame(np.random.rand(9,9),index=mul_index1,columns=mul_index2)
>>> df_s
Big                 D                             E                             F                    
Small               d         e         f         d         e         f         d         e         f
Upper Lower                                                                                          
A     a      0.138404  0.755400  0.861702  0.540977  0.209592  0.528436  0.992099  0.258492  0.665842
      b      0.016727  0.918201  0.005230  0.920703  0.284818  0.746384  0.833577  0.865584  0.492445
      c      0.973899  0.295795  0.070411  0.542911  0.802148  0.705826  0.695886  0.266620  0.169622
B     a      0.136199  0.127682  0.456423  0.323732  0.293247  0.805086  0.226015  0.741182  0.191226
      b      0.688108  0.285718  0.674049  0.395258  0.814939  0.413188  0.308767  0.290622  0.500804
      c      0.145298  0.970690  0.175698  0.079383  0.480480  0.674522  0.376210  0.360039  0.421905
C     a      0.517342  0.261396  0.471768  0.483732  0.230302  0.126709  0.871482  0.601575  0.091868
      b      0.506630  0.347414  0.144214  0.709386  0.228000  0.965529  0.473915  0.570749  0.408741
      c      0.259413  0.282587  0.144029  0.585717  0.215044  0.811602  0.008216  0.074891  0.302157

>>> idx=pd.IndexSlice

IndexSlice is very flexible:

>>> df_s.loc[idx['B':,df_s['D']['d']>0.3],idx[df_s.sum()>4]]
>>> # df_s.sum() sums each column by default, so it returns a numeric Series of length 9
Big                 D         E                   F
Small               e         d         f         d         e
Upper Lower                                                                     
B     b      0.285718  0.395258  0.413188  0.308767  0.290622
C     a      0.261396  0.483732  0.126709  0.871482  0.601575
      b      0.347414  0.709386  0.965529  0.473915  0.570749
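
Because df_s holds random numbers, the output above cannot be reproduced exactly. The sketch below uses a smaller, deterministic frame (values 0 to 15, an assumption made for the example) so that the effect of each IndexSlice component can be checked by hand.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]], names=("Upper", "Lower"))
cols = pd.MultiIndex.from_product([["D", "E"], ["d", "e"]], names=("Big", "Small"))
# Deterministic values 0..15, filled row by row
toy = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

idx = pd.IndexSlice
# Rows: Upper from 'B' onward, any Lower; columns: any Big, Small == 'e'
picked = toy.loc[idx["B":, :], idx[:, "e"]]

print(picked.to_numpy().tolist())  # [[9, 11], [13, 15]]
```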

4. Swapping index levels

(a) swaplevel (swap two levels)

>>> df_using_mul.head()
               School Gender  Height  Weight  Math Physics
Class Address                                             
C_1   street_1    S_1      M     173      63  34.0      A+
      street_2    S_1      F     192      73  32.5      B+
      street_2    S_1      M     186      82  87.2      B+
      street_2    S_1      F     167      81  80.4      B-
      street_4    S_1      F     159      64  84.8      B+
      
>>> df_using_mul.swaplevel(i=1,j=0,axis=0).sort_index().head()
               School Gender  Height  Weight  Math Physics
Address  Class                                            
street_1 C_1      S_1      M     173      63  34.0      A+
         C_2      S_2      M     175      74  47.2      B-
         C_3      S_1      F     175      57  87.7      A-
street_2 C_1      S_1      F     192      73  32.5      B+
         C_1      S_1      M     186      82  87.2      B+

(b) reorder_levels (rearrange several levels)

>>> df_muls = df.set_index(['School','Class','Address'])
>>> df_muls.head()
                      Gender  Height  Weight  Math Physics
School Class Address                                      
S_1    C_1   street_1      M     173      63  34.0      A+
             street_2      F     192      73  32.5      B+
             street_2      M     186      82  87.2      B+
             street_2      F     167      81  80.4      B-
             street_4      F     159      64  84.8      B+

>>> df_muls.reorder_levels([2,0,1],axis=0).sort_index().head()
                      Gender  Height  Weight  Math Physics
Address  School Class                                     
street_1 S_1    C_1        M     173      63  34.0      A+
                C_3        F     175      57  87.7      A-
         S_2    C_2        M     175      74  47.2      B-
street_2 S_1    C_1        F     192      73  32.5      B+
                C_1        M     186      82  87.2      B+

>>> # If the levels have names, the names can be used directly
>>> df_muls.reorder_levels(['Address','School','Class'],axis=0).sort_index().head()
                      Gender  Height  Weight  Math Physics
Address  School Class                                     
street_1 S_1    C_1        M     173      63  34.0      A+
                C_3        F     175      57  87.7      A-
         S_2    C_2        M     175      74  47.2      B-
street_2 S_1    C_1        F     192      73  32.5      B+
                C_1        M     186      82  87.2      B+

III. Setting the Index

1. The index_col parameter

index_col is a parameter of read_csv, not a method:

>>> pd.read_csv('data/table.csv',index_col=['Address','School']).head()
                Class    ID Gender  Height  Weight  Math Physics
Address  School                                                 
street_1 S_1      C_1  1101      M     173      63  34.0      A+
street_2 S_1      C_1  1102      F     192      73  32.5      B+
         S_1      C_1  1103      M     186      82  87.2      B+
         S_1      C_1  1104      F     167      81  80.4      B-
street_4 S_1      C_1  1105      F     159      64  84.8      B+

2. reindex and reindex_like

reindex means re-indexing. Its key property is index alignment, and it is often used to reorder rows or columns.

>>> df.head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+

>>> df.reindex(index=[1101,1203,1206,2402])
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1   173.0    63.0  34.0      A+
1203    S_1   C_2      M  street_6   160.0    53.0  58.8      A+
1206    NaN   NaN    NaN       NaN     NaN     NaN   NaN     NaN
2402    S_2   C_4      M  street_7   166.0    82.0  48.7       B

>>> df.reindex(columns=['Height','Gender','Average']).head()
      Height Gender  Average
ID                          
1101     173      M      NaN
1102     192      F      NaN
1103     186      M      NaN
1104     167      F      NaN
1105     159      F      NaN

The way missing values are filled can be chosen with fill_value or method (bfill/ffill/nearest); using method requires a monotonic index.

>>> df.reindex(index=[1101,1203,1206,2402],method='bfill')
>>> # bfill fills 1206 from the next valid row after it, ffill from the previous valid row, and nearest from the closest one
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1203    S_1   C_2      M  street_6     160      53  58.8      A+
1206    S_1   C_3      M  street_4     161      68  31.5      B+
2402    S_2   C_4      M  street_7     166      82  48.7       B

>>> df.reindex(index=[1101,1203,1206,2402],method='nearest')
>>> # Numerically, 1205 is closer to 1206 than 1301 is, so the former is used
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1203    S_1   C_2      M  street_6     160      53  58.8      A+
1206    S_1   C_2      F  street_6     167      63  68.4      B-
2402    S_2   C_4      M  street_7     166      82  48.7       B
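
The fill rules can be compared side by side on a short Series. In the sketch below, the index values 1, 4, 9 and the target labels are hypothetical, chosen so that no target label is equidistant from two existing ones.

```python
import pandas as pd

s = pd.Series([10, 40, 90], index=[1, 4, 9])   # monotonic index, required by method=
target = [1, 2, 8, 10]

plain = s.reindex(target)                      # missing labels become NaN
zero = s.reindex(target, fill_value=0)         # missing labels become 0
back = s.reindex(target, method="bfill")       # take the next existing label
fwd = s.reindex(target, method="ffill")        # take the previous existing label
near = s.reindex(target, method="nearest")     # take the closest existing label

print(zero.tolist())  # [10, 0, 0, 0]
print(fwd.tolist())   # [10, 10, 40, 90]
print(near.tolist())  # [10, 10, 90, 90]
```

With bfill, label 10 has no later label to borrow from, so it stays NaN.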

reindex_like produces a DataFrame whose row and column indexes exactly match those of its argument, with the data taken from the calling table.

>>> df_temp = pd.DataFrame({'Weight':np.zeros(5),
                        'Height':np.zeros(5),
                        'ID':[1101,1104,1103,1106,1102]}).set_index('ID')
>>> df_temp.reindex_like(df[0:5][['Weight','Height']])
      Weight  Height
ID                  
1101     0.0     0.0
1102     0.0     0.0
1103     0.0     0.0
1104     0.0     0.0
1105     NaN     NaN

If df_temp has a monotonic index, the method parameter can also be used:

>>> df_temp = pd.DataFrame({'Weight':range(5),
                        'Height':range(5),
                        'ID':[1101,1104,1103,1106,1102]}).set_index('ID').sort_index()
>>> df_temp.reindex_like(df[0:5][['Weight','Height']],method='bfill')
>>> # Check for yourself whether the value at 1105 follows the bfill rule
      Weight  Height
ID                  
1101       0       0
1102       4       4
1103       2       2
1104       1       1
1105       3       3

3. set_index and reset_index

First, set_index: as the name suggests, it turns one or more columns into the index.
Using a column of the table as the index:

>>> df.head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+

>>> df.set_index('Class').head()
      School Gender   Address  Height  Weight  Math Physics
Class                                                      
C_1      S_1      M  street_1     173      63  34.0      A+
C_1      S_1      F  street_2     192      73  32.5      B+
C_1      S_1      M  street_2     186      82  87.2      B+
C_1      S_1      F  street_2     167      81  80.4      B-
C_1      S_1      F  street_4     159      64  84.8      B+

The append parameter keeps the current index in place:

>>> df.set_index('Class',append=True).head()
           School Gender   Address  Height  Weight  Math Physics
ID   Class                                                      
1101 C_1      S_1      M  street_1     173      63  34.0      A+
1102 C_1      S_1      F  street_2     192      73  32.5      B+
1103 C_1      S_1      M  street_2     186      82  87.2      B+
1104 C_1      S_1      F  street_2     167      81  80.4      B-
1105 C_1      S_1      F  street_4     159      64  84.8      B+

Using a sequence of the same length as the table as the index (it must be converted to a Series first, otherwise an error is raised):

>>> df.set_index(pd.Series(range(df.shape[0]))).head()
  School Class Gender   Address  Height  Weight  Math Physics
0    S_1   C_1      M  street_1     173      63  34.0      A+
1    S_1   C_1      F  street_2     192      73  32.5      B+
2    S_1   C_1      M  street_2     186      82  87.2      B+
3    S_1   C_1      F  street_2     167      81  80.4      B-
4    S_1   C_1      F  street_4     159      64  84.8      B+

A MultiIndex can be added directly:

>>> df.set_index([pd.Series(range(df.shape[0])),pd.Series(np.ones(df.shape[0]))]).head()
      School Class Gender   Address  Height  Weight  Math Physics
0 1.0    S_1   C_1      M  street_1     173      63  34.0      A+
1 1.0    S_1   C_1      F  street_2     192      73  32.5      B+
2 1.0    S_1   C_1      M  street_2     186      82  87.2      B+
3 1.0    S_1   C_1      F  street_2     167      81  80.4      B-
4 1.0    S_1   C_1      F  street_4     159      64  84.8      B+

Next, reset_index, whose main job is to reset the index.
By default it restores a plain integer (0, 1, 2, ...) index:

>>> df.reset_index().head()
     ID School Class Gender   Address  Height  Weight  Math Physics
0  1101    S_1   C_1      M  street_1     173      63  34.0      A+
1  1102    S_1   C_1      F  street_2     192      73  32.5      B+
2  1103    S_1   C_1      M  street_2     186      82  87.2      B+
3  1104    S_1   C_1      F  street_2     167      81  80.4      B-
4  1105    S_1   C_1      F  street_4     159      64  84.8      B+
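
If the old index is not needed as a column, reset_index also accepts drop=True. A minimal sketch on a hypothetical three-row frame (`toy` is an illustrative name, not part of the original data):

```python
import pandas as pd

# Hypothetical stand-in frame indexed by ID
toy = pd.DataFrame(
    {"Height": [173, 192, 186]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)

kept = toy.reset_index()              # ID becomes an ordinary column
dropped = toy.reset_index(drop=True)  # ID is thrown away entirely

print(list(kept.columns))     # ['ID', 'Height']
print(list(dropped.columns))  # ['Height']
print(list(dropped.index))    # [0, 1, 2]
```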

The level parameter specifies which index level is reset, and col_level specifies which column level the resulting column name is placed in:

>>> L1,L2 = ['A','B','C'],['a','b','c']
>>> mul_index1 = pd.MultiIndex.from_product([L1,L2],names=('Upper', 'Lower'))
>>> L3,L4 = ['D','E','F'],['d','e','f']
>>> mul_index2 = pd.MultiIndex.from_product([L3,L4],names=('Big', 'Small'))
>>> df_temp = pd.DataFrame(np.random.rand(9,9),index=mul_index1,columns=mul_index2)
>>> df_temp.head()
Big                 D                             E                             F                    
Small               d         e         f         d         e         f         d         e         f
Upper Lower                                                                                          
A     a      0.952394  0.225359  0.088530  0.344681  0.803563  0.957546  0.649799  0.644266  0.533074
      b      0.597974  0.170867  0.713686  0.045497  0.013743  0.609154  0.642097  0.520484  0.113203
      c      0.827619  0.999765  0.431732  0.060878  0.402561  0.183220  0.567167  0.822338  0.383110
B     a      0.820653  0.562288  0.387569  0.310252  0.876846  0.525805  0.786711  0.638720  0.450830
      b      0.610984  0.052675  0.056680  0.815168  0.933346  0.712698  0.042167  0.459474  0.934926

>>> df_temp1 = df_temp.reset_index(level=1,col_level=1)
>>> df_temp1.head()
Big                 D                             E                             F                    
Small Lower         d         e         f         d         e         f         d         e         f
Upper                                                                                                
A         a  0.952394  0.225359  0.088530  0.344681  0.803563  0.957546  0.649799  0.644266  0.533074
A         b  0.597974  0.170867  0.713686  0.045497  0.013743  0.609154  0.642097  0.520484  0.113203
A         c  0.827619  0.999765  0.431732  0.060878  0.402561  0.183220  0.567167  0.822338  0.383110
B         a  0.820653  0.562288  0.387569  0.310252  0.876846  0.525805  0.786711  0.638720  0.450830
B         b  0.610984  0.052675  0.056680  0.815168  0.933346  0.712698  0.042167  0.459474  0.934926

>>> # 'Lower' has indeed been inserted at column level 1 ('Small')
>>> df_temp1.columns
MultiIndex([( '', 'Lower'),
            ('D',     'd'),
            ('D',     'e'),
            ('D',     'f'),
            ('E',     'd'),
            ('E',     'e'),
            ('E',     'f'),
            ('F',     'd'),
            ('F',     'e'),
            ('F',     'f')],
           names=['Big', 'Small'])

>>> # the innermost index level has been moved out
>>> df_temp1.index
Index(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], dtype='object', name='Upper')

4. rename_axis and rename

rename_axis changes the name of an index level, not its labels; it is especially handy with multi-level indexes

>>> df_temp.rename_axis(index={'Lower':'LowerLower'},columns={'Big':'BigBig'})
BigBig                   D                             E                             F                    
Small                    d         e         f         d         e         f         d         e         f
Upper LowerLower                                                                                          
A     a           0.952394  0.225359  0.088530  0.344681  0.803563  0.957546  0.649799  0.644266  0.533074
      b           0.597974  0.170867  0.713686  0.045497  0.013743  0.609154  0.642097  0.520484  0.113203
      c           0.827619  0.999765  0.431732  0.060878  0.402561  0.183220  0.567167  0.822338  0.383110
B     a           0.820653  0.562288  0.387569  0.310252  0.876846  0.525805  0.786711  0.638720  0.450830
      b           0.610984  0.052675  0.056680  0.815168  0.933346  0.712698  0.042167  0.459474  0.934926
      c           0.817660  0.031609  0.991720  0.314334  0.940107  0.901928  0.565043  0.781750  0.915386
C     a           0.649918  0.652024  0.843319  0.007445  0.494126  0.674986  0.569380  0.133073  0.870157
      b           0.587731  0.493679  0.574484  0.847679  0.512082  0.361565  0.818315  0.447201  0.065062
      c           0.681552  0.829670  0.851267  0.889587  0.543569  0.889665  0.720163  0.081832  0.978681

rename changes the row or column labels, not the index names:

>>> df_temp.rename(index={'A':'T'},columns={'e':'changed_e'}).head()
Big                 D                             E                             F                    
Small               d changed_e         f         d changed_e         f         d changed_e         f
Upper Lower                                                                                          
T     a      0.952394  0.225359  0.088530  0.344681  0.803563  0.957546  0.649799  0.644266  0.533074
      b      0.597974  0.170867  0.713686  0.045497  0.013743  0.609154  0.642097  0.520484  0.113203
      c      0.827619  0.999765  0.431732  0.060878  0.402561  0.183220  0.567167  0.822338  0.383110
B     a      0.820653  0.562288  0.387569  0.310252  0.876846  0.525805  0.786711  0.638720  0.450830
      b      0.610984  0.052675  0.056680  0.815168  0.933346  0.712698  0.042167  0.459474  0.934926
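
The contrast between the two methods can be checked on a smaller frame. The sketch below uses a hypothetical 4x2 frame (`toy`) and also passes rename's level argument, which limits the relabeling to one level of a MultiIndex:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with a two-level row index
idx = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]], names=("Upper", "Lower"))
toy = pd.DataFrame(np.arange(8).reshape(4, 2), index=idx, columns=["x", "y"])

# rename_axis: the level *name* changes, the labels stay as they were
renamed_names = toy.rename_axis(index={"Lower": "Inner"})
print(list(renamed_names.index.names))                        # ['Upper', 'Inner']
print(list(renamed_names.index.get_level_values("Inner")))    # ['a', 'b', 'a', 'b']

# rename: the *labels* change, the level names stay as they were
renamed_labels = toy.rename(index={"a": "first"}, level="Lower")
print(list(renamed_labels.index.names))                       # ['Upper', 'Lower']
print(list(renamed_labels.index.get_level_values("Lower")))   # ['first', 'b', 'first', 'b']
```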

IV. Common Index-Related Functions

1. The where function

where fills the cells for which the condition is False:

>>> df.head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+

>>> # rows that fail the condition are set entirely to NaN
>>> df.where(df['Gender']=='M').head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1   173.0    63.0  34.0      A+
1102    NaN   NaN    NaN       NaN     NaN     NaN   NaN     NaN
1103    S_1   C_1      M  street_2   186.0    82.0  87.2      B+
1104    NaN   NaN    NaN       NaN     NaN     NaN   NaN     NaN
1105    NaN   NaN    NaN       NaN     NaN     NaN   NaN     NaN

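Dropping the NaN rows afterwards selects the same rows as boolean indexing with the [] operator (the integer columns, however, have been upcast to float):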

>>> df.where(df['Gender']=='M').dropna().head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1   173.0    63.0  34.0      A+
1103    S_1   C_1      M  street_2   186.0    82.0  87.2      B+
1201    S_1   C_2      M  street_5   188.0    68.0  97.0      A-
1203    S_1   C_2      M  street_6   160.0    53.0  58.8      A+
1301    S_1   C_3      M  street_4   161.0    68.0  31.5      B+
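
A minimal sketch comparing the two routes on a hypothetical three-row frame (`toy` and `is_male` are illustrative names, not from the original data). The selected rows agree, while the dtype of the integer column differs because where had to hold NaN in between:

```python
import pandas as pd

# Hypothetical stand-in frame
toy = pd.DataFrame(
    {"Gender": ["M", "F", "M"], "Height": [173, 192, 186]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)
is_male = toy["Gender"] == "M"

via_where = toy.where(is_male).dropna()  # NaN-fill, then drop the NaN rows
via_brackets = toy[is_male]              # plain boolean indexing

print(list(via_where.index))         # [1101, 1103]
print(list(via_brackets.index))      # [1101, 1103]
print(via_brackets["Height"].dtype)  # int64
print(via_where["Height"].dtype)     # float64
```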

The first argument is the boolean condition and the second is the fill value:

>>> df.where(df['Gender']=='M',np.random.rand(df.shape[0],df.shape[1])).head()
        School     Class    Gender   Address      Height     Weight       Math   Physics
ID                                                                                      
1101       S_1       C_1         M  street_1  173.000000  63.000000  34.000000        A+
1102  0.880363  0.377656  0.441071  0.192081    0.596748   0.693048   0.809448   0.41425
1103       S_1       C_1         M  street_2  186.000000  82.000000  87.200000        B+
1104  0.432909  0.660837   0.90067   0.93032    0.099089   0.449954   0.426169  0.082895
1105  0.540073   0.68175  0.262715  0.336918    0.714834   0.642204   0.956307  0.465849

2. The mask function

mask is the inverse of where and otherwise identical: it fills the cells for which the condition is True

>>> df.mask(df['Gender']=='M').dropna().head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1102    S_1   C_1      F  street_2   192.0    73.0  32.5      B+
1104    S_1   C_1      F  street_2   167.0    81.0  80.4      B-
1105    S_1   C_1      F  street_4   159.0    64.0  84.8      B+
1202    S_1   C_2      F  street_4   176.0    94.0  63.5      B-
1204    S_1   C_2      F  street_5   162.0    63.0  33.8       B

>>> df.mask(df['Gender']=='M',np.random.rand(df.shape[0],df.shape[1])).head()
        School     Class     Gender   Address      Height     Weight       Math   Physics
ID                                                                                     
1101  0.273904  0.460798   0.446225  0.633699    0.895552   0.296590   0.002112  0.222349
1102       S_1       C_1          F  street_2  192.000000  73.000000  32.500000        B+
1103  0.266646  0.567703  0.0981018  0.625369    0.876915   0.405576   0.508490  0.203879
1104       S_1       C_1          F  street_2  167.000000  81.000000  80.400000        B-
1105       S_1       C_1          F  street_4  159.000000  64.000000  84.800000        B+
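
The inverse relationship can be verified directly: mask with a condition should give the same result as where with the negated condition. The sketch below assumes a hypothetical three-row frame (`toy`), and also shows a scalar fill value used to cap a column:

```python
import pandas as pd

# Hypothetical stand-in frame
toy = pd.DataFrame(
    {"Gender": ["M", "F", "M"], "Height": [173, 192, 186]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)
is_male = toy["Gender"] == "M"

# mask(cond) fills where cond is True; where(~cond) fills where ~cond is False
same = toy.mask(is_male).equals(toy.where(~is_male))
print(same)  # True

# A scalar works as the fill value too: cap heights above 180 at 180
capped = toy["Height"].mask(toy["Height"] > 180, 180)
print(capped.tolist())  # [173, 180, 180]
```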

3. The query function

>>> df.head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+

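Inside the boolean expression passed to query, all of the following are valid: row and column index names, strings, and/not/or/&/|/~/not in/in/==/!=, and the arithmetic operators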

>>> df.query('(Address in ["street_6","street_7"])&(Weight>(70+10))&(ID in [1303,2304,2402])')
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1303    S_1   C_3      M  street_7     188      82  49.7       B
2304    S_2   C_3      F  street_6     164      81  95.5      A-
2402    S_2   C_4      M  street_7     166      82  48.7       B
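
query expressions can also refer to Python variables in the calling scope by prefixing them with @. A minimal sketch on a hypothetical three-row frame; `toy`, `streets` and `min_weight` are illustrative names, and the equivalent boolean-mask form is shown alongside for comparison:

```python
import pandas as pd

# Hypothetical stand-in frame indexed by ID
toy = pd.DataFrame(
    {"Address": ["street_1", "street_6", "street_7"], "Weight": [63, 81, 82]},
    index=pd.Index([1101, 2304, 2402], name="ID"),
)

streets = ["street_6", "street_7"]
min_weight = 80

# @name pulls in a local variable; ID refers to the index by its name
by_query = toy.query("Address in @streets and Weight > @min_weight and ID != 2402")

# The same filter written with explicit boolean masks
by_mask = toy[
    toy["Address"].isin(streets) & (toy["Weight"] > min_weight) & (toy.index != 2402)
]

print(list(by_query.index))      # [2304]
print(by_query.equals(by_mask))  # True
```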

V. Handling Duplicates

1. The duplicated method

This method returns a boolean Series marking whether each row is a duplicate

>>> df.duplicated('Class').head()
ID
1101    False
1102     True
1103     True
1104     True
1105     True
dtype: bool

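The optional keep parameter defaults to 'first', which marks the first occurrence as not duplicated; with 'last', the last occurrence is marked as not duplicated; with False, every row that has a duplicate is marked True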

>>> df.duplicated('Class',keep='last').tail()
ID
2401     True
2402     True
2403     True
2404     True
2405    False
dtype: bool

>>> df.duplicated('Class',keep=False).head()
ID
1101    True
1102    True
1103    True
1104    True
1105    True
dtype: bool
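
The three keep settings are easier to compare side by side on a tiny column. The frame `classes` below is hypothetical and exists only for this illustration:

```python
import pandas as pd

# Hypothetical column: C_1 appears three times, C_2 once
classes = pd.DataFrame({"Class": ["C_1", "C_1", "C_2", "C_1"]})

first = classes.duplicated("Class").tolist()                   # keep='first' (default)
last = classes.duplicated("Class", keep="last").tolist()       # last occurrence is kept
all_marked = classes.duplicated("Class", keep=False).tolist()  # nothing is kept

print(first)       # [False, True, False, True]
print(last)        # [True, True, False, False]
print(all_marked)  # [True, True, False, True]
```

Note that C_2, which occurs only once, is False under every setting.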

2. The drop_duplicates method

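As the name suggests, it drops duplicated rows. This can be useful for the group operations in later chapters, for example when the first row of each group has to be kept: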

>>> df.drop_duplicates('Class')
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1201    S_1   C_2      M  street_5     188      68  97.0      A-
1301    S_1   C_3      M  street_4     161      68  31.5      B+
2401    S_2   C_4      F  street_2     192      62  45.3       A

Its parameters are similar to those of duplicated:

>>> df.drop_duplicates('Class',keep='last')
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
2105    S_2   C_1      M  street_4     170      81  34.2       A
2205    S_2   C_2      F  street_7     183      76  85.4       B
2305    S_2   C_3      M  street_4     187      73  48.9       B
2405    S_2   C_4      F  street_6     193      54  47.6       B

When several columns are passed, they are treated together like a multi-level key, and rows are compared on the combination:

>>> df.drop_duplicates(['School','Class'])
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1201    S_1   C_2      M  street_5     188      68  97.0      A-
1301    S_1   C_3      M  street_4     161      68  31.5      B+
2101    S_2   C_1      M  street_7     174      84  83.3       C
2201    S_2   C_2      M  street_5     193     100  39.1       B
2301    S_2   C_3      F  street_4     157      78  72.3      B+
2401    S_2   C_4      F  street_2     192      62  45.3       A
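
The "first row of each group" reading can be checked against groupby. The sketch below assumes a hypothetical four-row frame (`toy`) and compares drop_duplicates on two columns with groupby(...).head(1) on the same columns:

```python
import pandas as pd

# Hypothetical stand-in frame; (S_1, C_1) occurs twice
toy = pd.DataFrame(
    {
        "School": ["S_1", "S_1", "S_2", "S_2"],
        "Class": ["C_1", "C_1", "C_1", "C_2"],
        "Math": [34.0, 32.5, 83.3, 39.1],
    },
    index=pd.Index([1101, 1102, 2101, 2201], name="ID"),
)

first_rows = toy.drop_duplicates(["School", "Class"])
via_groupby = toy.groupby(["School", "Class"]).head(1)  # first row per group

print(list(first_rows.index))          # [1101, 2101, 2201]
print(first_rows.equals(via_groupby))  # True
```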

VI. Sampling

The sampling function discussed here is sample

(a) n is the sample size

>>> df.sample(n=5)
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
1201    S_1   C_2      M  street_5     188      68  97.0      A-
1202    S_1   C_2      F  street_4     176      94  63.5      B-
1302    S_1   C_3      F  street_1     175      57  87.7      A-
1205    S_1   C_2      F  street_6     167      63  68.4      B-
2202    S_2   C_2      F  street_7     194      77  68.5      B+

(b) frac is the sampling fraction

>>> df.sample(frac=0.05)
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
2303    S_2   C_3      F  street_7     190      99  65.9       C
1301    S_1   C_3      M  street_4     161      68  31.5      B+

(c) replace controls whether sampling is done with replacement

>>> df.sample(n=df.shape[0],replace=True).head()
     School Class Gender   Address  Height  Weight  Math Physics
ID                                                              
2205    S_2   C_2      F  street_7     183      76  85.4       B
1202    S_1   C_2      F  street_4     176      94  63.5      B-
1301    S_1   C_3      M  street_4     161      68  31.5      B+
1303    S_1   C_3      M  street_7     188      82  49.7       B
2402    S_2   C_4      M  street_7     166      82  48.7       B

>>> df.sample(n=35,replace=True).index.is_unique
False
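
Because sample is random, the outputs above change from run to run. A minimal sketch, assuming a hypothetical ten-row frame (`toy`), of fixing the seed with the random_state parameter and of why sampling with replacement beyond the table length must repeat rows:

```python
import pandas as pd

# Hypothetical ten-row frame
toy = pd.DataFrame({"Height": range(150, 160)})

# The same random_state gives the same draw
a = toy.sample(n=3, random_state=0)
b = toy.sample(n=3, random_state=0)
print(a.equals(b))  # True

# 20 draws from 10 rows is only possible with replacement,
# so some index labels necessarily repeat
boot = toy.sample(n=20, replace=True, random_state=0)
print(len(boot))             # 20
print(boot.index.is_unique)  # False
```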

(d) axis is the sampling dimension; the default is 0, i.e. rows are sampled

>>> df.sample(n=3,axis=1).head()
       Address  Height  Math
ID 
1101  street_1     173  34.0
1102  street_2     192  32.5
1103  street_2     186  87.2
1104  street_2     167  80.4
1105  street_4     159  84.8

(e) weights gives the sampling weights, which are normalized automatically

>>> df.sample(n=3,weights=np.random.rand(df.shape[0])).head()
     School Class Gender   Address  Height  Weight  Math Physics
ID
1101    S_1   C_1      M  street_1     173      63  34.0      A+
2402    S_2   C_4      M  street_7     166      82  48.7       B
1201    S_1   C_2      M  street_5     188      68  97.0      A-

>>> # using a column as the weights is common in sampling theory
>>> # the probability of being drawn is proportional to the Math value
>>> df.sample(n=3,weights=df['Math']).head()
     School Class Gender   Address  Height  Weight  Math Physics
ID
2404    S_2   C_4      F  street_2     160      84  67.7       B
2304    S_2   C_3      F  street_6     164      81  95.5      A-
1101    S_1   C_1      M  street_1     173      63  34.0      A+
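
The proportionality can be made visible by drawing many times with replacement. The sketch below uses a hypothetical three-row frame (`toy`) whose Math values are 0, 50 and 100, and passes the column name as weights. The weights do not sum to 1, so the example relies on the automatic normalization described above:

```python
import pandas as pd

# Hypothetical frame: weights 0, 50, 100 (not normalized)
toy = pd.DataFrame(
    {"Math": [0.0, 50.0, 100.0]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)

# For row sampling, a column name can be given instead of a Series
draws = toy.sample(n=3000, replace=True, weights="Math", random_state=0)
counts = draws.index.value_counts()

print(1101 in counts.index)          # False: a zero weight is never drawn
print(counts[1103] > counts[1102])   # True: 1103 is drawn about twice as often
```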
