数据准备:
从泰坦尼克号数据集内拿10行数据用作演示
df = pd.read_csv(r'C:\show\train.csv')
print(df)
"""
age workclass fnlwgt ... hours-per-week native-country income
0 39 State-gov 77516 ... 40 United-States <=50K
1 50 Self-emp-not-inc 83311 ... 13 United-States <=50K
2 38 Private 215646 ... 40 United-States <=50K
3 53 Private 234721 ... 40 United-States <=50K
4 28 Private 338409 ... 40 Cuba <=50K
5 37 Private 284582 ... 40 United-States <=50K
6 49 Private 160187 ... 16 Jamaica <=50K
7 52 Self-emp-not-inc 209642 ... 45 United-States >50K
8 31 Private 45781 ... 50 United-States >50K
9 42 Private 159449 ... 40 United-States >50K
"""
1.单列查询,获取存在于列表的所有行
search_df = df[df['Search_column].isin(Search_value_list)]
# 筛选age列等于39、28、49的所有行
search_age = [39, 28, 49]
age_df = df[df['age'].isin(search_age)]
print(age_df)
"""
age workclass fnlwgt ... hours-per-week native-country income
0 39 State-gov 77516 ... 40 United-States <=50K
4 28 Private 338409 ... 40 Cuba <=50K
6 49 Private 160187 ... 16 Jamaica <=50K
"""
2.多列查询,获取同时满足n个条件的行
search_df = df[(df['Col_1'] == val_1) & (df['Col_2'] == val_2) & ...]
new_df = df[(df['hours-per-week'] == 40) &
(df['age'] == 28) &
(df['native-country'] == 'Cuba')]
print(new_df)
"""
age workclass fnlwgt ... hours-per-week native-country income
4 28 Private 338409 ... 40 Cuba <=50K
"""
3.单列查询,模糊匹配
首先使用方法df['column'].str.extract 配合正则表达式(RE) 拿到相应的索引,然后根据索引拿到想要的DataFrame
# 例如,查找workclass列内包含了'-'的所有行
needed_idx = df['workclass'].str.extract('(.*-.*)').dropna().index
new_df = df.loc[needed_idx]
print(new_df)
"""
age workclass fnlwgt ... hours-per-week native-country income
0 39 State-gov 77516 ... 40 United-States <=50K
1 50 Self-emp-not-inc 83311 ... 13 United-States <=50K
7 52 Self-emp-not-inc 209642 ... 45 United-States >50K
"""
4.逆查询
常配合方法1使用new_df = df[~df['column'].isin(no_need_value_list)]
,当我们知道
a
a
a 与
b
b
b 的并集为全集,且手上只有
a
a
a 的信息(或者
a
a
a更容易获取)时,该方法能方便地拿到符合
b
b
b 条件的所有行
no_need_value = [' Cuba', ' Jamaica']
new_df = df[~df['native-country'].isin(no_need_value)]
print(new_df)
"""
age workclass fnlwgt ... hours-per-week native-country income
0 39 State-gov 77516 ... 40 United-States <=50K
1 50 Self-emp-not-inc 83311 ... 13 United-States <=50K
2 38 Private 215646 ... 40 United-States <=50K
3 53 Private 234721 ... 40 United-States <=50K
5 37 Private 284582 ... 40 United-States <=50K
7 52 Self-emp-not-inc 209642 ... 45 United-States >50K
8 31 Private 45781 ... 50 United-States >50K
9 42 Private 159449 ... 40 United-States >50K
"""