Selecting Rows from a pandas DataFrame in Python


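This post focuses on selecting rows according to the value range of a column (a Series).
Selecting rows by index range requires knowing the index range in advance and is fairly basic, so it is covered only briefly.
Setup:
Environment: Python 3.7.1, IDLE Shell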

>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22], 'Gender': ['Male', 'Male', 'Female']})

Note: the shape of this example data comes from: https://www.python100.com/html/116332.html

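Selecting rows by index range

Example 1: extract the row whose index label is 1; the result is a row Series

>>> row = df.loc[1]  # select by index label; with the auto-generated index, labels and positions are the same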
>>> row
Name       Jim
Age         18
Gender    Male
Name: 1, dtype: object
>>> type(row)
<class 'pandas.core.series.Series'>

Note:

df.iloc[:]  # select by index position (integer offset)

Example 2:

>>> row = df.loc[0:1]
>>> row
  Name  Age Gender
0  Tom   20   Male
1  Jim   18   Male
>>> row = df.iloc[0:1]
>>> row
  Name  Age Gender
0  Tom   20   Male
>>> row = df.loc[0:0]
>>> row
  Name  Age Gender
0  Tom   20   Male
>>> type(row)
<class 'pandas.core.frame.DataFrame'>

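Note: the two kinds of slices cover different ranges. Slicing by position (iloc) is half-open, [0, 1), while slicing by label (loc) includes both endpoints.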

>>> row = df.iloc[0:0]
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
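With the default index, labels and positions coincide, which can hide the difference between the two slices. The following is a minimal sketch that uses a hypothetical non-default index (10, 20, 30, chosen only for illustration) to make the difference visible:

```python
import pandas as pd

# Hypothetical labels 10/20/30 so that labels and positions no longer coincide.
df = pd.DataFrame(
    {'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22]},
    index=[10, 20, 30],
)

by_label = df.loc[10:20]    # label slice: both endpoints are included
by_position = df.iloc[0:1]  # position slice: the end position is excluded

print(by_label['Name'].tolist())     # ['Tom', 'Jim']
print(by_position['Name'].tolist())  # ['Tom']
```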

Selecting rows by a column-value condition

>>> row = df.loc[df['Age'] > 20].iloc[0]['Name']
>>> row
'Lily'
>>> 

The statement above means: from the DataFrame df.loc[df['Age'] > 20], take the row Series at position 0 and read its 'Name' value.

(1) Basic column-value range expressions

Example 1:


>>> row = df.loc[df['Age'] > 18]
>>> row
   Name  Age  Gender
0   Tom   20    Male
2  Lily   22  Female
>>> 

Note: when no value satisfies the condition, no error is raised; an empty DataFrame is returned:

>>> row = df.loc[df['Age'] > 23]
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
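An empty result matters when a selection is chained with .iloc[0], as in the 'Lily' example earlier: calling .iloc[0] on an empty DataFrame raises IndexError. One possible guard, sketched here with the same sample data (the variable names matches and first_name are only illustrative), is to check .empty first:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

matches = df.loc[df['Age'] > 23]  # no row satisfies this, so the result is empty

# Only index into the result when it actually contains rows.
first_name = None if matches.empty else matches.iloc[0]['Name']
print(first_name)  # None
```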

(2) Combining multiple conditions

Example 2:

>>> row = df.loc[(df['Age'] >= 18)&(df['Name'] == 'Lily')]
>>> row
   Name  Age  Gender
2  Lily   22  Female
>>> 

If the combined condition is False for every row, the returned DataFrame is empty:

>>> row = df.loc[(df['Age'] >= 18)&(df['Name'] == 'tongzhi')]  # 'tongzhi' does not exist in the original DataFrame
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
>>> 

You can of course also use the '|' (or) operator:

>>> row = df.loc[(df['Age'] >= 18)|(df['Name'] == 'Jim')]
>>> row
   Name  Age  Gender
0   Tom   20    Male
1   Jim   18    Male
2  Lily   22  Female
>>> 

Note: the ~ (not) operator is also available.
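As a small illustration of negation with the same sample data, ~ inverts a boolean mask element by element. The comparison is wrapped in parentheses because ~ binds more tightly than ==:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Keep the rows where Gender is NOT 'Male'.
not_male = df.loc[~(df['Gender'] == 'Male')]
print(not_male['Name'].tolist())  # ['Lily']
```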

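(3) Function (callable) conditions

You can pass a lambda or a custom function to select rows. The callable receives the DataFrame and must return a boolean Series; rows where it is True are kept. For example:

>>> name = 'Jim'
>>> row = df.loc[lambda d: d['Name'] == name]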
>>> row
  Name  Age Gender
1  Jim   18   Male
>>> 
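The same idea works with a named function instead of a lambda. This is a minimal sketch; the helper name is_adult_male is hypothetical and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

def is_adult_male(frame):
    """Return a boolean Series marking rows with Age >= 18 and Gender == 'Male'."""
    return (frame['Age'] >= 18) & (frame['Gender'] == 'Male')

# .loc calls the function with df and uses the returned mask to filter rows.
row = df.loc[is_adult_male]
print(row['Name'].tolist())  # ['Tom', 'Jim']
```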

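Some filtering functions that a DataFrame supports

(1) The isin function:

df[df["column_name"].isin(li)]  # li = [20, 25, 27] or li = np.arange(20, 30)
Keeps the records whose value in the column equals one of the numbers or strings in the list passed to isin (li); the usage is similar to "IN" in SQL.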
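Applied to the sample DataFrame from this post, isin might be used as follows (the list values 18 and 22 are chosen only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

li = [18, 22]  # illustrative list of accepted ages
selected = df[df['Age'].isin(li)]
print(selected['Name'].tolist())  # ['Jim', 'Lily']
```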

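(2) The query function:

df.query("(column_name1 == 'str1') & (column_name2 == 'str2')")
Keeps the records that satisfy all of the conditions written in the query string, which may refer to several columns and values (str1, str2, and so on).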
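A short sketch of query on the sample data; column names are written directly inside the string, and string literals use the other kind of quote:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Both conditions must hold for a row to be kept.
selected = df.query("(Gender == 'Male') & (Age >= 20)")
print(selected['Name'].tolist())  # ['Tom']
```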

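(3) The contains function:

df[df["column_name"].str.contains("str")]
Keeps all records whose value contains (str); the usage is similar to LIKE '%str%' in SQL. Note that the pattern is treated as a regular expression by default.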
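On the sample data, str.contains could look like this. The match is case-sensitive by default; passing case=False relaxes that:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Names containing a lowercase 'i'.
with_i = df[df['Name'].str.contains('i')]
print(with_i['Name'].tolist())  # ['Jim', 'Lily']

# Case-insensitive match: 'l' also matches the capital 'L' in 'Lily'.
with_l = df[df['Name'].str.contains('l', case=False)]
print(with_l['Name'].tolist())  # ['Lily']
```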

The above draws on: https://blog.csdn.net/weixin_45914452/article/details/120585861

Incorrect condition formats:

Example 1:

>>> row = df.loc[(18<=df['Age'] <= 22)]
Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    row = df.loc[(18<=df['Age'] <= 22)]
  File "D:\Program Files\Python371\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> 

Example 2:

>>> row = df.loc[df['Age'] >= 18 & df['Age'] <= 22]
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    row = df.loc[df['Age'] >= 18 & df['Age'] <= 22]
  File "D:\Program Files\Python371\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
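Both errors have the same cause. In Example 1, Python expands the chained comparison into (18 <= df['Age']) and (df['Age'] <= 22), and "and" asks for the truth value of a whole Series. In Example 2, & binds more tightly than >= and <=, so the expression becomes a chained comparison around 18 & df['Age'] and fails in the same way. A working form, sketched with the same data, parenthesizes each comparison or uses Series.between (inclusive on both ends by default):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Parenthesize each comparison so that & combines two boolean Series.
in_range = df.loc[(df['Age'] >= 18) & (df['Age'] <= 22)]

# Equivalent range test; both endpoints are included by default.
in_range_between = df.loc[df['Age'].between(18, 22)]

print(in_range['Name'].tolist())          # ['Tom', 'Jim', 'Lily']
print(in_range_between['Name'].tolist())  # ['Tom', 'Jim', 'Lily']
```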