task2 pandas_indexing


sunshare77


Original article link: https://blog.csdn.net/sunshare77/article/details/105740628


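This article covers indexing in pandas in detail: single-level indexing with loc, iloc and the [] operator, boolean indexing and interval indexing; creating, slicing and swapping levels of a MultiIndex; setting and resetting the index; common index-related functions such as where, mask and query; and handling duplicate elements. Worked examples illustrate each operation to show how flexible and powerful pandas indexing is.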


Chapter 2: Indexing

In [1]:

import numpy as np

import pandas as pd

df = pd.read_csv('data/table.csv',index_col='ID')

df.head()

Out[1]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1101	S_1	C_1	M	street_1	173	63	34.0	A+
1102	S_1	C_1	F	street_2	192	73	32.5	B+
1103	S_1	C_1	M	street_2	186	82	87.2	B+
1104	S_1	C_1	F	street_2	167	81	80.4	B-
1105	S_1	C_1	F	street_4	159	64	84.8	B+

I. Single-level indexing

1. The loc method, the iloc method and the [] operator

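These three are probably the most commonly used ways to index: iloc selects by position, loc selects by label, and [] is the most convenient shorthand. Each has its own quirks.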
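The label/position distinction is easiest to see on a Series whose integer labels do not match their positions. The sketch below uses assumed toy data (not table.csv) and also shows that a loc slice keeps its right endpoint while an iloc slice drops it.

```python
import pandas as pd

# Toy Series (assumed data) whose integer labels do not match their positions
s = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

by_label = s.loc[1]       # label 1 -> 30
by_position = s.iloc[1]   # second element -> 20
label_slice = s.loc[3:1]  # labels 3 through 1, right endpoint included
pos_slice = s.iloc[0:2]   # positions 0 and 1, right endpoint excluded

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist())
```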

(a) The loc method (note: every slice used with loc includes its right endpoint!)

① Single-row indexing:

In [2]:

df.loc[1103]

Out[2]:

School          S_1
Class           C_1
Gender            M
Address    street_2
Height          186
Weight           82
Math           87.2
Physics          B+
Name: 1103, dtype: object

② Multi-row indexing:

In [3]:

df.loc[[1102,2304]]

Out[3]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1102	S_1	C_1	F	street_2	192	73	32.5	B+
2304	S_2	C_3	F	street_6	164	81	95.5	A-

In [4]:

df.loc[1304:].head()

Out[4]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1304	S_1	C_3	M	street_2	195	70	85.2	A
1305	S_1	C_3	F	street_5	187	69	61.7	B-
2101	S_2	C_1	M	street_7	174	84	83.3	C
2102	S_2	C_1	F	street_6	161	61	50.6	B+
2103	S_2	C_1	M	street_4	157	61	52.5	B-

In [5]:

df.loc[2402::-1].head()

Out[5]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
2402	S_2	C_4	M	street_7	166	82	48.7	B
2401	S_2	C_4	F	street_2	192	62	45.3	A
2305	S_2	C_3	M	street_4	187	73	48.9	B
2304	S_2	C_3	F	street_6	164	81	95.5	A-
2303	S_2	C_3	F	street_7	190	99	65.9	C

③ Single-column indexing:

In [6]:

df.loc[:,'Height'].head()

Out[6]:

ID
1101    173
1102    192
1103    186
1104    167
1105    159
Name: Height, dtype: int64

④ Multi-column indexing:

In [7]:

df.loc[:,['Height','Math']].head()

Out[7]:

	Height	Math
ID
1101	173	34.0
1102	192	32.5
1103	186	87.2
1104	167	80.4
1105	159	84.8

In [8]:

df.loc[:,'Height':'Math'].head()

Out[8]:

	Height	Weight	Math
ID
1101	173	63	34.0
1102	192	73	32.5
1103	186	82	87.2
1104	167	81	80.4
1105	159	64	84.8

⑤ Combined row and column indexing:

In [9]:

df.loc[1102:2401:3,'Height':'Math'].head()

Out[9]:

	Height	Weight	Math
ID
1102	192	73	32.5
1105	159	64	84.8
1203	160	53	58.8
1301	161	68	31.5
1304	195	70	85.2
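In In [9] the endpoints 1102 and 2401 are labels, but the step 3 counts rows: the output skips two rows at a time rather than jumping by 3 in label value. A minimal sketch on an assumed toy Series checks that a stepped loc slice matches an iloc slice between the endpoints' positions (found with get_loc, with +1 because loc includes the right end).

```python
import pandas as pd

# Toy Series (assumed data) with gaps in its integer labels
s = pd.Series(range(6), index=[10, 11, 12, 20, 21, 22])

by_loc = s.loc[11:21:2]  # labels 11..21 inclusive, every 2nd row

start = s.index.get_loc(11)  # position 1
stop = s.index.get_loc(21)   # position 4
by_iloc = s.iloc[start:stop + 1:2]  # +1 because loc keeps the right endpoint

print(by_loc.index.tolist())
```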

⑥ Function-based indexing:

In [10]:

df.loc[lambda x:x['Gender']=='M'].head()

# the callable passed to loc receives the DataFrame being indexed (df) as its argument

Out[10]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1101	S_1	C_1	M	street_1	173	63	34.0	A+
1103	S_1	C_1	M	street_2	186	82	87.2	B+
1201	S_1	C_2	M	street_5	188	68	97.0	A-
1203	S_1	C_2	M	street_6	160	53	58.8	A+
1301	S_1	C_3	M	street_4	161	68	31.5	B+

In [11]:

def f(x):

    return [1101,1103]

df.loc[f]

Out[11]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1101	S_1	C_1	M	street_1	173	63	34.0	A+
1103	S_1	C_1	M	street_2	186	82	87.2	B+
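Because the function receives the DataFrame being indexed, a callable can refer to a column created earlier in the same method chain, which the name df would not yet contain. The sketch below uses a small assumed stand-in for table.csv; the Ratio column is hypothetical and exists only for illustration.

```python
import pandas as pd

# Small stand-in for table.csv (values copied from the first three rows above)
df = pd.DataFrame(
    {"Gender": ["M", "F", "M"], "Height": [173, 192, 186], "Weight": [63, 73, 82]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)

# The lambda in loc sees the intermediate result of assign(), so it can
# filter on the freshly created (hypothetical) Ratio column
heavy = (
    df.assign(Ratio=lambda d: d["Weight"] / d["Height"])
      .loc[lambda d: d["Ratio"] > 0.4]
)
print(heavy.index.tolist())
```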

⑦ Boolean indexing (covered in detail in Section 2)

In [12]:

df.loc[df['Address'].isin(['street_7','street_4'])].head()

Out[12]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1105	S_1	C_1	F	street_4	159	64	84.8	B+
1202	S_1	C_2	F	street_4	176	94	63.5	B-
1301	S_1	C_3	M	street_4	161	68	31.5	B+
1303	S_1	C_3	M	street_7	188	82	49.7	B
2101	S_2	C_1	M	street_7	174	84	83.3	C

In [13]:

df.loc[[True if i[-1]=='4' or i[-1]=='7' else False for i in df['Address'].values]].head()

Out[13]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1105	S_1	C_1	F	street_4	159	64	84.8	B+
1202	S_1	C_2	F	street_4	176	94	63.5	B-
1301	S_1	C_3	M	street_4	161	68	31.5	B+
1303	S_1	C_3	M	street_7	188	82	49.7	B
2101	S_2	C_1	M	street_7	174	84	83.3	C

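Summary: in essence, whatever is passed to loc (a label, a slice or a function) boils down to either a boolean list or a list of labels taken from the index. Keeping this principle in mind makes all of the operations above easy to understand.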
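To make the principle concrete, the sketch below selects the same rows three ways: with a boolean mask, with the list of labels that mask picks out, and with a function returning the mask. The four-row frame is an assumed stand-in whose values are copied from the outputs above.

```python
import pandas as pd

# Stand-in for a few rows of table.csv (values copied from the outputs above)
df = pd.DataFrame(
    {"Address": ["street_1", "street_2", "street_4", "street_7"]},
    index=pd.Index([1101, 1102, 1105, 1303], name="ID"),
)

mask = df["Address"].isin(["street_4", "street_7"])  # boolean Series
labels = df.index[mask.to_numpy()].tolist()          # list of index labels

by_mask = df.loc[mask]
by_labels = df.loc[labels]
by_callable = df.loc[lambda d: d["Address"].isin(["street_4", "street_7"])]

print(labels)
```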

(b) The iloc method (note: unlike loc, the right endpoint of a slice is excluded)

① Single-row indexing:

In [14]:

df.iloc[3]

Out[14]:

School          S_1
Class           C_1
Gender            F
Address    street_2
Height          167
Weight           81
Math           80.4
Physics          B-
Name: 1104, dtype: object

② Multi-row indexing:

In [15]:

df.iloc[3:5]

Out[15]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1104	S_1	C_1	F	street_2	167	81	80.4	B-
1105	S_1	C_1	F	street_4	159	64	84.8	B+

③ Single-column indexing:

In [16]:

df.iloc[:,3].head()

Out[16]:

ID
1101    street_1
1102    street_2
1103    street_2
1104    street_2
1105    street_4
Name: Address, dtype: object

④ Multi-column indexing:

In [17]:

df.iloc[:,7::-2].head()

Out[17]:

	Physics	Weight	Address	Class
ID
1101	A+	63	street_1	C_1
1102	B+	73	street_2	C_1
1103	B+	82	street_2	C_1
1104	B-	81	street_2	C_1
1105	B+	64	street_4	C_1

⑤ Combined row and column indexing:

In [18]:

df.iloc[3::4,7::-2].head()

Out[18]:

	Physics	Weight	Address	Class
ID
1104	B-	81	street_2	C_1
1203	A+	53	street_6	C_2
1302	A-	57	street_1	C_3
2101	C	84	street_7	C_1
2105	A	81	street_4	C_1

⑥ Function-based indexing:

In [19]:

df.iloc[lambda x:[3]].head()

Out[19]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1104	S_1	C_1	F	street_2	167	81	80.4	B-

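Summary: as shown above, iloc takes integers, integer lists or integer slices (or a function returning one of these). It also accepts a plain boolean array, but not a boolean Series that carries an index, so a mask built from a column must first be converted (for example with .values).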
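A minimal check of that rule on an assumed four-row frame: the same mask works with iloc once it is a plain NumPy array, while the indexed boolean Series is rejected. With an integer index like ID, pandas raises NotImplementedError here (ValueError for other index types), so the sketch catches both.

```python
import pandas as pd

# Assumed toy frame with the Math values of the first four rows
df = pd.DataFrame(
    {"Math": [34.0, 32.5, 87.2, 80.4]},
    index=pd.Index([1101, 1102, 1103, 1104], name="ID"),
)
mask = df["Math"] > 80  # boolean Series aligned on the ID index

rows = df.iloc[mask.to_numpy()]  # plain boolean array: accepted by iloc

try:
    df.iloc[mask]  # boolean Series with an index: rejected by iloc
    raised = False
except (ValueError, NotImplementedError):
    raised = True

print(rows.index.tolist(), raised)
```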

(c) The [] operator

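To stay out of trouble, avoid the [] operator when the row index is floating point: on a Series with a float index, [] does not select by position but matches values (labels), which is a very special case.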
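How [] slices a float index has also shifted between pandas releases (at least some recent 2.x versions emit a FutureWarning saying such slices may become positional), so a safe habit is to state the intent explicitly. The sketch below, on assumed toy data, uses [] only for a scalar label lookup and loc/iloc for slices.

```python
import pandas as pd

# Toy Series with a float index (assumed data)
s = pd.Series([10, 20, 30], index=[1.5, 2.0, 3.5])

by_value = s[2.0]          # scalar key on a float index is matched as a label -> 20
by_label = s.loc[1.5:2.0]  # explicit label slice, right endpoint included
by_position = s.iloc[0:2]  # explicit positional slice, right endpoint excluded

print(by_value, by_label.tolist(), by_position.tolist())
```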

(c.1) [] on a Series

① Single-element indexing:

In [20]:

s = pd.Series(df['Math'],index=df.index)

s[1101]

# this uses the index label

Out[20]:

34.0

② Multi-row indexing:

In [21]:

s[0:4]

# this is positional integer slicing that has nothing to do with the labels, which is easy to confuse

Out[21]:

ID
1101    34.0
1102    32.5
1103    87.2
1104    80.4
Name: Math, dtype: float64
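The asymmetry in In [20] and In [21] is worth checking directly: on an integer index, a scalar key in [] is a label, while an integer slice in [] is a position. The sketch below rebuilds the first four Math values as an assumed toy Series and compares [] with the explicit iloc and loc forms.

```python
import pandas as pd

# First four Math scores from the table above (assumed subset)
s = pd.Series([34.0, 32.5, 87.2, 80.4], index=[1101, 1102, 1103, 1104], name="Math")

first = s[1101]                   # scalar int key on an integer index: label lookup
head2 = s[0:2]                    # int slice with []: positional, even on an integer index
by_position = s.iloc[0:2]         # explicit positional equivalent
by_label = s.loc[1101:1102]       # explicit label equivalent (right endpoint included)

print(first, head2.index.tolist())
```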

③ Function-based indexing:

In [22]:

s[lambda x: x.index[16::-6]]

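# note: writing a slice directly inside the lambda (e.g. s[lambda x: 16::-6]) raises an error, because slice syntax is not allowed in a lambda body; here the lambda returns labels taken from the index, which [] then looks up by label, so this is easy to get wrong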

Out[22]:

ID
2102    50.6
1301    31.5
1105    84.8
Name: Math, dtype: float64

④ Boolean indexing:

In [23]:

s[s>80]

Out[23]:

ID
1103    87.2
1104    80.4
1105    84.8
1201    97.0
1302    87.7
1304    85.2
2101    83.3
2205    85.4
2304    95.5
Name: Math, dtype: float64
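Conditions in a boolean index are combined with the element-wise operators &, | and ~, and each comparison needs its own parentheses because these operators bind more tightly than > and <. Python's and/or do not work here, since a Series has no single truth value. The sketch below uses an assumed subset of the Math scores shown above.

```python
import pandas as pd

# Subset of the Math scores shown above (assumed sample)
s = pd.Series([34.0, 87.2, 80.4, 97.0, 85.2],
              index=[1101, 1103, 1104, 1201, 1304], name="Math")

between = s[(s > 80) & (s < 90)]   # both conditions hold
extreme = s[(s < 40) | (s > 95)]   # either condition holds
not_high = s[~(s > 80)]            # negation of a condition

print(between.index.tolist(), extreme.index.tolist(), not_high.index.tolist())
```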

(c.2) [] on a DataFrame

① Single-row indexing:

In [24]:

df[1:2]

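# it is easy to write df[1102] (a row label) here by mistake, which raises a KeyError because [] with a single key looks up a column

# as with Series, this is positional slicing

# to get a particular row by its label, use get_loc as shown below: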

Out[24]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1102	S_1	C_1	F	street_2	192	73	32.5	B+

In [25]:

row = df.index.get_loc(1102)

df[row:row+1]

Out[25]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1102	S_1	C_1	F	street_2	192	73	32.5	B+
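get_loc returns the integer position of a label, so the slice trick in In [25] works, but the same one-row DataFrame can be obtained more directly. The sketch below, on an assumed three-row stand-in, compares the three routes and confirms that df[1102] is treated as a column lookup.

```python
import pandas as pd

# Assumed three-row stand-in for table.csv
df = pd.DataFrame(
    {"School": ["S_1", "S_1", "S_1"], "Math": [34.0, 32.5, 87.2]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)

row = df.index.get_loc(1102)  # integer position of label 1102
via_slice = df[row:row + 1]   # positional slice with []
via_iloc = df.iloc[[row]]     # a list of positions keeps a DataFrame
via_loc = df.loc[[1102]]      # a list of labels needs no position lookup

try:
    df[1102]                  # [] with a single key looks for a column named 1102
    found_column = True
except KeyError:
    found_column = False

print(row, found_column)
```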

② Multi-row indexing:

In [26]:

# use a slice here; to pick specific rows, loc is recommended, otherwise an error is likely

df[3:5]

Out[26]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1104	S_1	C_1	F	street_2	167	81	80.4	B-
1105	S_1	C_1	F	street_4	159	64	84.8	B+

③ Single-column indexing:

In [27]:

df['School'].head()

Out[27]:

ID
1101    S_1
1102    S_1
1103    S_1
1104    S_1
1105    S_1
Name: School, dtype: object

④ Multi-column indexing:

In [28]:

df[['School','Math']].head()

Out[28]:

	School	Math
ID
1101	S_1	34.0
1102	S_1	32.5
1103	S_1	87.2
1104	S_1	80.4
1105	S_1	84.8

⑤ Function-based indexing:

In [29]:

df[lambda x:['Math','Physics']].head()

Out[29]:

	Math	Physics
ID
1101	34.0	A+
1102	32.5	B+
1103	87.2	B+
1104	80.4	B-
1105	84.8	B+

⑥ Boolean indexing:

In [30]:

df[df['Gender']=='F'].head()

Out[30]:

	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1102	S_1	C_1	F	street_2	192	73	32.5	B+
1104	S_1	C_1	F	street_2	167	81	80
