『Python』Excel文件的读取以及DataFrame的相关操作 (4)—— 常用查询语句

数据准备:
从泰坦尼克号数据集内拿10行数据用作演示

df = pd.read_csv(r'C:\show\train.csv')
print(df)
    """
            age          workclass  fnlwgt  ... hours-per-week  native-country  income
    0    39          State-gov   77516  ...             40   United-States   <=50K
    1    50   Self-emp-not-inc   83311  ...             13   United-States   <=50K
    2    38            Private  215646  ...             40   United-States   <=50K
    3    53            Private  234721  ...             40   United-States   <=50K
    4    28            Private  338409  ...             40            Cuba   <=50K
    5    37            Private  284582  ...             40   United-States   <=50K
    6    49            Private  160187  ...             16         Jamaica   <=50K
    7    52   Self-emp-not-inc  209642  ...             45   United-States    >50K
    8    31            Private   45781  ...             50   United-States    >50K
    9    42            Private  159449  ...             40   United-States    >50K
    """
1.单列查询,获取存在于列表的所有行

search_df = df[df['Search_column].isin(Search_value_list)]

# 筛选age列等于39、28、49的所有行
search_age = [39, 28, 49]
age_df = df[df['age'].isin(search_age)]
print(age_df)
"""
           age   workclass  fnlwgt  ... hours-per-week  native-country  income
    0   39   State-gov   77516  ...             40   United-States   <=50K
    4   28     Private  338409  ...             40            Cuba   <=50K
    6   49     Private  160187  ...             16         Jamaica   <=50K
"""
2.多列查询,获取同时满足n个条件的行

search_df = df[(df['Col_1'] == val_1) & (df['Col_2'] == val_2) & ...]

new_df = df[(df['hours-per-week'] == 40) & 
            (df['age'] == 28) & 
            (df['native-country'] == 'Cuba')]
print(new_df)
"""
           age workclass  fnlwgt  ... hours-per-week  native-country  income
    4   28   Private  338409  ...             40            Cuba   <=50K
"""
3.单列查询,模糊匹配

首先使用方法df['column'].str.extract 配合正则表达式(RE) 拿到相应的索引,然后根据索引拿到想要的DataFrame

# 例如,查找workclass列内包含了'-'的所有行
needed_idx = df['workclass'].str.extract('(.*-.*)').dropna().index
new_df = df.loc[needed_idx]
print(new_df)
"""
   age          workclass  fnlwgt  ... hours-per-week  native-country  income
0   39          State-gov   77516  ...             40   United-States   <=50K
1   50   Self-emp-not-inc   83311  ...             13   United-States   <=50K
7   52   Self-emp-not-inc  209642  ...             45   United-States    >50K
"""
4.逆查询

常配合方法1使用new_df = df[~df['column'].isin(no_need_value_list)],当我们知道 a a a b b b 的并集为全集,且手上只有 a a a 的信息(或者 a a a更容易获取)时,该方法能方便地拿到符合 b b b 条件的所有行

no_need_value = [' Cuba', ' Jamaica']
new_df = df[~df['native-country'].isin(no_need_value)]
print(new_df)
"""
       age          workclass  fnlwgt  ... hours-per-week  native-country  income
0   39          State-gov   77516  ...             40   United-States   <=50K
1   50   Self-emp-not-inc   83311  ...             13   United-States   <=50K
2   38            Private  215646  ...             40   United-States   <=50K
3   53            Private  234721  ...             40   United-States   <=50K
5   37            Private  284582  ...             40   United-States   <=50K
7   52   Self-emp-not-inc  209642  ...             45   United-States    >50K
8   31            Private   45781  ...             50   United-States    >50K
9   42            Private  159449  ...             40   United-States    >50K
"""
  • 0
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值