python提取第一行数据_根据条件获取Python Pandas中的第一行数据框

1586010002-jmsa.png

Let's say that I have a dataframe like this one

import pandas as pd

df = pd.DataFrame([[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]], columns=['A', 'B', 'C'])

>> df

A B C

0 1 2 1

1 1 3 2

2 4 6 3

3 4 3 4

4 5 4 5

The original table is more complicated with more columns and rows.

I want to get the first row that fulfil some criteria. Examples:

Get first row where A > 3 (returns row 2)

Get first row where A > 4 AND B > 3 (returns row 4)

Get first row where A > 3 AND (B > 3 OR C > 2) (returns row 2)

But, if there isn't any row that fulfil the specific criteria, then I want to get the first one after I just sort it descending by A (or other cases by B, C etc)

Get first row where A > 6 (returns row 4 by ordering it by A desc and get the first one)

I was able to do it by iterating on the dataframe (I know that craps :P). So, I prefer a more pythonic way to solve it.

解决方案

This tutorial is a very good one for pandas slicing. Make sure you check it out. Onto some snippets... To slice a dataframe with a condition, you use this format:

>>> df[condition]

This will return a slice of your dataframe which you can index using iloc. Here are your examples:

Get first row where A > 3 (returns row 2)

>>> df[df.A > 3].iloc[0]

A 4

B 6

C 3

Name: 2, dtype: int64

If what you actually want is the row number, rather than using iloc, it would be df[df.A > 3].index[0].

Get first row where A > 4 AND B > 3:

>>> df[(df.A > 4) & (df.B > 3)].iloc[0]

A 5

B 4

C 5

Name: 4, dtype: int64

Get first row where A > 3 AND (B > 3 OR C > 2) (returns row 2)

>>> df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].iloc[0]

A 4

B 6

C 3

Name: 2, dtype: int64

Now, with your last case we can write a function that handles the default case of returning the descending-sorted frame:

>>> def series_or_default(X, condition, default_col, ascending=False):

... sliced = X[condition]

... if sliced.shape[0] == 0:

... return X.sort_values(default_col, ascending=ascending).iloc[0]

... return sliced.iloc[0]

>>>

>>> series_or_default(df, df.A > 6, 'A')

A 5

B 4

C 5

Name: 4, dtype: int64

As expected, it returns row 4.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值