Pandas Basics 2.1 | Python Study Notes

PenguinAsHeathen

Article link: https://blog.csdn.net/m0_46384386/article/details/105716771

import numpy as np
import pandas as pd

df = pd.read_csv('./data/table.csv',index_col='ID')
df

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1101	0	S_1	C_1	M	street_1	173	63	34.0	A+
1102	1	S_1	C_1	F	street_2	192	73	32.5	B+
1103	2	S_1	C_1	M	street_2	186	82	87.2	B+
1104	3	S_1	C_1	F	street_2	167	81	80.4	B-
1105	4	S_1	C_1	F	street_4	159	64	84.8	B+
1201	5	S_1	C_2	M	street_5	188	68	97.0	A-
1202	6	S_1	C_2	F	street_4	176	94	63.5	B-
1203	7	S_1	C_2	M	street_6	160	53	58.8	A+
1204	8	S_1	C_2	F	street_5	162	63	33.8	B
1205	9	S_1	C_2	F	street_6	167	63	68.4	B-
1301	10	S_1	C_3	M	street_4	161	68	31.5	B+

I. Single-level indexing

The three most commonly used indexers: iloc (position-based indexing), loc (label-based indexing), and the [] operator.
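As a quick orientation, here is a minimal sketch of the three indexers side by side. The frame `toy` is a hypothetical three-row stand-in built from the first rows of the table above, not the real `table.csv`.

```python
import pandas as pd

# Hypothetical miniature of the table above (first three rows, two columns).
toy = pd.DataFrame(
    {"Height": [173, 192, 186], "Math": [34.0, 32.5, 87.2]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)

by_label = toy.loc[1102]     # row whose index label is 1102
by_position = toy.iloc[1]    # second row, counting from 0
column = toy["Height"]       # [] on a DataFrame selects a column

print(by_label.equals(by_position))  # True: both refer to the same row
print(column.tolist())               # [173, 192, 186]
```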

loc (note: every slice used in loc includes its right endpoint)

Single-row indexing:

df.loc[1103]

Unnamed: 0           2
School             S_1
Class              C_1
Gender               M
Address       street_2
Height             186
Weight              82
Math              87.2
Physics             B+
Name: 1103, dtype: object

Multi-row indexing:

df.loc[[1103,1104]]

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1103	2	S_1	C_1	M	street_2	186	82	87.2	B+
1104	3	S_1	C_1	F	street_2	167	81	80.4	B-

df.loc[2402:].head(5)  # everything from label 2402 onward

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
2402	31	S_2	C_4	M	street_7	166	82	48.7	B
2403	32	S_2	C_4	F	street_6	158	60	59.7	B+
2404	33	S_2	C_4	F	street_2	160	84	67.7	B
2405	34	S_2	C_4	F	street_6	193	54	47.6	B

df.loc[2402:2304:-1].head(5)  # walk backward starting at 2402; loc includes the endpoint 2304

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
2402	31	S_2	C_4	M	street_7	166	82	48.7	B
2401	30	S_2	C_4	F	street_2	192	62	45.3	A
2305	29	S_2	C_3	M	street_4	187	73	48.9	B
2304	28	S_2	C_3	F	street_6	164	81	95.5	A-

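Note: every slice used in loc includes its right endpoint.
A pandas user normally does not care about the label that comes one position after the last label wanted. If slices were left-closed and right-open, you would first have to look up the name of that next label, which is inconvenient.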
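The endpoint rule can be checked on a small frame. This is only a sketch: `toy` is a hypothetical four-row stand-in for the table above.

```python
import pandas as pd

# Hypothetical four-row stand-in for the table above.
toy = pd.DataFrame(
    {"Height": [173, 192, 186, 167]},
    index=pd.Index([1101, 1102, 1103, 1104], name="ID"),
)

label_slice = toy.loc[1101:1103]        # label slice: right endpoint kept
position_slice = toy.iloc[0:2]          # position slice: right endpoint dropped
reverse_slice = toy.loc[1103:1101:-1]   # backward label slice keeps both ends

print(label_slice.index.tolist())     # [1101, 1102, 1103]
print(position_slice.index.tolist())  # [1101, 1102]
print(reverse_slice.index.tolist())   # [1103, 1102, 1101]
```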

Single-column indexing:

df.loc[:,'Height'].head()

ID
1101    173
1102    192
1103    186
1104    167
1105    159
Name: Height, dtype: int64

Multi-column indexing:

df.loc[1201:2405,['Math','Physics']].head(5)

	Math	Physics
ID
1201	97.0	A-
1202	63.5	B-
1203	58.8	A+
1204	33.8	B
1205	68.4	B-

df.loc[:,'Gender':'Weight'].head()

	Gender	Address	Height	Weight
ID
1101	M	street_1	173	63
1102	F	street_2	192	73
1103	M	street_2	186	82
1104	F	street_2	167	81
1105	F	street_4	159	64

Combined row and column indexing:

df.loc[1101:2405:4,'Address':'Math'].head()

	Address	Height	Weight	Math
ID
1101	street_1	173	63	34.0
1105	street_4	159	64	84.8
1204	street_5	162	63	33.8
1303	street_7	188	82	49.7
2102	street_6	161	61	50.6
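In a label slice with a step, the two labels fix the range and the step then counts rows inside it. A minimal sketch, using a hypothetical six-row copy of the first rows of the table:

```python
import pandas as pd

# Hypothetical copy of the first six rows of the table above.
toy = pd.DataFrame(
    {
        "Address": ["street_1", "street_2", "street_2", "street_2", "street_4", "street_5"],
        "Height": [173, 192, 186, 167, 159, 188],
        "Weight": [63, 73, 82, 81, 64, 68],
        "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0],
        "Physics": ["A+", "B+", "B+", "B-", "B+", "A-"],
    },
    index=pd.Index([1101, 1102, 1103, 1104, 1105, 1201], name="ID"),
)

# Rows: every 2nd row from label 1101 through label 1201 (inclusive range).
# Columns: every column from 'Height' through 'Math' (inclusive).
picked = toy.loc[1101:1201:2, "Height":"Math"]

print(picked.index.tolist())    # [1101, 1103, 1105]
print(picked.columns.tolist())  # ['Height', 'Weight', 'Math']
```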

Function-based indexing:

lambda: an anonymous function

g = lambda x: x+1

def g(x): return x+1

The two are equivalent; lambda is simply a shorter way to write a function definition.

df.loc[lambda x:x['Height'] >170 ].head()

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1101	0	S_1	C_1	M	street_1	173	63	34.0	A+
1102	1	S_1	C_1	F	street_2	192	73	32.5	B+
1103	2	S_1	C_1	M	street_2	186	82	87.2	B+
1201	5	S_1	C_2	M	street_5	188	68	97.0	A-
1202	6	S_1	C_2	F	street_4	176	94	63.5	B-

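loc accepts a function. The function receives the whole table as its input, and its return value must be something loc itself accepts: a scalar, a slice, a valid list (every element appears in the index), a valid index, or a boolean mask as in the lambda above.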

def f(x):
    return [1101,1202]
df.loc[f].head()

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1101	0	S_1	C_1	M	street_1	173	63	34.0	A+
1202	6	S_1	C_2	F	street_4	176	94	63.5	B-
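The different return types a callable may hand back to loc can be compared on one small frame. This is a sketch only; `toy` and the helper `tall` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical three-row stand-in for the table above.
toy = pd.DataFrame(
    {"Height": [173, 192, 167], "Gender": ["M", "F", "F"]},
    index=pd.Index([1101, 1102, 1104], name="ID"),
)

def tall(frame):
    # Called with the whole DataFrame; returns a boolean Series.
    return frame["Height"] > 170

mask_rows = toy.loc[tall]                                # boolean mask
list_rows = toy.loc[lambda frame: [1101, 1104]]          # list of labels
slice_rows = toy.loc[lambda frame: slice(1102, 1104)]    # label slice

print(mask_rows.index.tolist())   # [1101, 1102]
print(list_rows.index.tolist())   # [1101, 1104]
print(slice_rows.index.tolist())  # [1102, 1104]
```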

Boolean indexing:

df_1 = df['Gender'].isin(['M'])
df_1.head()

ID
1101     True
1102    False
1103     True
1104    False
1105    False
Name: Gender, dtype: bool

df.loc[df_1].head()

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1101	0	S_1	C_1	M	street_1	173	63	34.0	A+
1103	2	S_1	C_1	M	street_2	186	82	87.2	B+
1201	5	S_1	C_2	M	street_5	188	68	97.0	A-
1203	7	S_1	C_2	M	street_6	160	53	58.8	A+
1301	10	S_1	C_3	M	street_4	161	68	31.5	B+

df_2 = [i[-1] in ('4', '7') for i in df['Address'].values]
# df_2 is a plain Python list of booleans
df_2

[False,
 False,
 False,
 False,
 True,
  ...]

df.loc[df_2].head()

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1105	4	S_1	C_1	F	street_4	159	64	84.8	B+
1202	6	S_1	C_2	F	street_4	176	94	63.5	B-
1301	10	S_1	C_3	M	street_4	161	68	31.5	B+
1303	12	S_1	C_3	M	street_7	188	82	49.7	B
2101	15	S_2	C_1	M	street_7	174	84	83.3	C

When a list is passed to loc, it must be either a boolean list or a list whose elements all belong to the index.
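A short sketch of what happens when a label list is not a subset of the index. It assumes pandas 1.0 or later, where a missing label raises `KeyError`; `toy` is a hypothetical stand-in for the table above.

```python
import pandas as pd

# Hypothetical three-row stand-in for the table above.
toy = pd.DataFrame(
    {"Address": ["street_1", "street_4", "street_7"]},
    index=pd.Index([1101, 1105, 1303], name="ID"),
)

# Boolean list, one entry per row.
bool_list = [addr.endswith(("4", "7")) for addr in toy["Address"]]
by_mask = toy.loc[bool_list]

# Label list that is a subset of the index.
by_labels = toy.loc[[1101, 1303]]

# A label that is not in the index is rejected (pandas >= 1.0 assumed).
try:
    toy.loc[[1101, 9999]]
    missing_label_ok = True
except KeyError:
    missing_label_ok = False

print(by_mask.index.tolist())    # [1105, 1303]
print(by_labels.index.tolist())  # [1101, 1303]
print(missing_label_ok)          # False
```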

iloc (slices exclude the right endpoint)

Single-row indexing:

df.iloc[-1]

Unnamed: 0          34
School             S_2
Class              C_4
Gender               F
Address       street_6
Height             193
Weight              54
Math              47.6
Physics              B
Name: 2405, dtype: object

Multi-row indexing:

df.iloc[0:10:2]

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1101	0	S_1	C_1	M	street_1	173	63	34.0	A+
1103	2	S_1	C_1	M	street_2	186	82	87.2	B+
1105	4	S_1	C_1	F	street_4	159	64	84.8	B+
1202	6	S_1	C_2	F	street_4	176	94	63.5	B-
1204	8	S_1	C_2	F	street_5	162	63	33.8	B

Single-column indexing:

df.iloc[:,-1].head()

ID
1101    A+
1102    B+
1103    B+
1104    B-
1105    B+
Name: Physics, dtype: object

Multi-column indexing:

df.iloc[:,-1::-2].head()

	Physics	Weight	Address	Class	Unnamed: 0
ID
1101	A+	63	street_1	C_1	0
1102	B+	73	street_2	C_1	1
1103	B+	82	street_2	C_1	2
1104	B-	81	street_2	C_1	3
1105	B+	64	street_4	C_1	4

Mixed row and column indexing:

df.iloc[3::4,-1::-3].head()

	Physics	Height	Class
ID
1104	B-	167	C_1
1203	A+	160	C_2
1302	A-	175	C_3
2101	C	174	C_1
2105	A	170	C_1
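The slice `-1::-2` reads as "start at the last position and step backward by two". A minimal sketch on a hypothetical five-column frame (a cut-down version of the table above):

```python
import pandas as pd

# Hypothetical cut-down version of the table above.
toy = pd.DataFrame(
    [
        [0, "S_1", "C_1", "M", 173],
        [1, "S_1", "C_1", "F", 192],
        [2, "S_1", "C_1", "M", 186],
        [3, "S_1", "C_1", "F", 167],
    ],
    columns=["Unnamed: 0", "School", "Class", "Gender", "Height"],
    index=pd.Index([1101, 1102, 1103, 1104], name="ID"),
)

every_other_col = toy.iloc[:, -1::-2]   # column positions 4, 2, 0
mixed = toy.iloc[1::2, -1::-2]          # row positions 1, 3 with the same columns

print(every_other_col.columns.tolist())  # ['Height', 'Class', 'Unnamed: 0']
print(mixed.index.tolist())              # [1102, 1104]
```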

Function-based indexing:

df.iloc[lambda x:[-3],-1::-2].head()

	Physics	Weight	Address	Class	Unnamed: 0
ID
2403	B+	60	street_6	C_4	32

iloc accepts only integers, integer lists (or slices), and boolean lists. A boolean Series cannot be used directly; to use one, extract its values first.

df_3 = (df['Address']=='street_2').values
df_3

array([False,  True,  True,  True, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False, False,  True, False])

df.iloc[df_3].head()

	Unnamed: 0	School	Class	Gender	Address	Height	Weight	Math	Physics
ID
1102	1	S_1	C_1	F	street_2	192	73	32.5	B+
1103	2	S_1	C_1	M	street_2	186	82	87.2	B+
1104	3	S_1	C_1	F	street_2	167	81	80.4	B-
1304	13	S_1	C_3	M	street_2	195	70	85.2	A
2401	30	S_2	C_4	F	street_2	192	62	45.3	A
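The restriction on boolean Series can be seen directly. In this sketch `toy` is a hypothetical stand-in; the exact exception type depends on the index dtype, so both candidates are caught.

```python
import pandas as pd

# Hypothetical three-row stand-in for the table above.
toy = pd.DataFrame(
    {"Address": ["street_1", "street_2", "street_2"]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)
mask = toy["Address"] == "street_2"   # boolean Series indexed by ID

# iloc refuses a boolean Series because it carries an index.
try:
    toy.iloc[mask]
    series_mask_ok = True
except (NotImplementedError, ValueError):
    series_mask_ok = False

picked = toy.iloc[mask.values]   # plain NumPy boolean array: accepted
same = toy.loc[mask]             # loc takes the Series as-is

print(series_mask_ok)            # False
print(picked.index.tolist())     # [1102, 1103]
```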

The [] operator

[] on a Series

Single-element indexing:

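# df['*'] is a Series, so passing it as data brings its own index along; if another index is passed as well, alignment follows the index given last and the values become NaN
# pass df['*'].tolist() or df['*'].values instead, so that only Math's values (not its index) are used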
s = pd.Series(df['Math'].values,index = df['Address'])
s['street_2']

street_2    32.5
street_2    87.2
street_2    80.4
street_2    85.2
street_2    45.3
street_2    67.7
dtype: float64
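The alignment pitfall described in the comments can be reproduced on a small frame. This is a sketch with a hypothetical three-row `toy` table:

```python
import pandas as pd

# Hypothetical three-row stand-in for the table above.
toy = pd.DataFrame(
    {"Address": ["street_1", "street_2", "street_2"], "Math": [34.0, 32.5, 87.2]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)

# Passing a Series as data: its ID index is aligned against the Address
# labels, nothing matches, and every value becomes NaN.
aligned = pd.Series(toy["Math"], index=toy["Address"])

# Passing the raw values: no alignment, values are attached in order.
raw = pd.Series(toy["Math"].values, index=toy["Address"])

print(aligned.isna().all())       # True
print(raw["street_2"].tolist())   # [32.5, 87.2]
```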

m = pd.Series(df['Math'],index=df.index)
m[2105]

34.2

m[0:4]

ID
1101    34.0
1102    32.5
1103    87.2
1104    80.4
Name: Math, dtype: float64

Function-based indexing:

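# x.index[16::-6] slices the index by absolute position and returns the labels at positions 16, 10 and 4; the lookup in m is then done by those labels
# a bare 16::-6 is not a valid lambda body; to return a slice from the function, write slice(16, None, -6)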
m[lambda x: x.index[16::-6]]

ID
2102    50.6
1301    31.5
1105    84.8
Name: Math, dtype: float64
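The two steps hidden in that lambda (slice the index by position, then look the labels up) can be separated out. A minimal sketch on a hypothetical five-element Series:

```python
import pandas as pd

# Hypothetical five-element stand-in for the Math column above.
m_toy = pd.Series(
    [34.0, 32.5, 87.2, 80.4, 84.8],
    index=pd.Index([1101, 1102, 1103, 1104, 1105], name="ID"),
    name="Math",
)

# Step 1: positional slice of the index gives labels at positions 4, 2, 0.
labels = m_toy.index[4::-2]

# Step 2: the callable returns those labels and [] looks them up.
picked = m_toy[lambda s: s.index[4::-2]]

print(labels.tolist())   # [1105, 1103, 1101]
print(picked.tolist())   # [84.8, 87.2, 34.0]
```

Writing `m_toy.loc[labels]` makes the label lookup explicit and gives the same result.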

Boolean indexing:

m>80

ID
1101    False
1102    False
1103     True
1104     True
1105     True

…
Name: Math, dtype: bool

m[m>80]

ID
1103    87.2
1104    80.4
1105    84.8
1201    97.0
1302    87.7
1304    85.2
2101    83.3
2205    85.4
2304    95.5
Name: Math, dtype: float64

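Note: when the index is float, slicing a Series with [] compares against the index values rather than positions, so it is best to avoid the [] operator when the row index is float.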

s_int = pd.Series([1,2,3,4],index = [1,3,5,6])
s_float = pd.Series([1,2,3,4],index=[1.,3.,5.,6.])
s_int

1    1
3    2
5    3
6    4
dtype: int64

s_float[2:]  # 2 is treated as an index value

3.0    2
5.0    3
6.0    4
dtype: int64

s_int[2:]  # 2 is treated as a position

5    3
6    4
dtype: int64
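One way to sidestep the ambiguity is to state the intent with loc or iloc, which behave the same for both index types. A short sketch reusing the two Series defined above:

```python
import pandas as pd

s_int = pd.Series([1, 2, 3, 4], index=[1, 3, 5, 6])
s_float = pd.Series([1, 2, 3, 4], index=[1.0, 3.0, 5.0, 6.0])

# loc always slices by label value: every label >= 2 is kept.
int_by_label = s_int.loc[2:]
float_by_label = s_float.loc[2:]

# iloc always slices by position: the first two rows are skipped.
int_by_position = s_int.iloc[2:]
float_by_position = s_float.iloc[2:]

print(int_by_label.tolist(), float_by_label.tolist())        # [2, 3, 4] [2, 3, 4]
print(int_by_position.tolist(), float_by_position.tolist())  # [3, 4] [3, 4]
```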
