4.5 Selecting Data in a Pandas DataFrame (Part 1) (Python)

ibun.song

Last modified: 2023-02-10 14:54:02

First published: 2023-02-08 16:57:13

Article link: https://blog.csdn.net/qq_40805441/article/details/128933947

I. Creating the DataFrame

1. Define the data

data = {
    'name': ['NAME0', 'NAME1', 'NAME2', 'NAME3', 'NAME4', 'NAME5', 'NAME6', 'NAME7', 'NAME8', 'NAME9'],
    'age': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']
}

2. Define the index labels

indexs = ['index0', 'index1', 'index2', 'index3', 'index4', 'index5', 'index6', 'index7', 'index8', 'index9']

3. Create the DataFrame

df = pd.DataFrame(data, index=indexs)

Complete code:

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd

data = {
    'name': ['NAME0', 'NAME1', 'NAME2', 'NAME3', 'NAME4', 'NAME5', 'NAME6', 'NAME7', 'NAME8', 'NAME9'],

    'age': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],

    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],

    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']
}

indexs = ['index0', 'index1', 'index2', 'index3', 'index4', 'index5', 'index6', 'index7', 'index8', 'index9']

df = pd.DataFrame(data, index=indexs)

print(df)

Console output:

         name  age   weight is_single_dog
index0  NAME0    0  weight0           yes
index1  NAME1    1      101           yes
index2  NAME2    2      102            no
index3  NAME3    3      NaN           yes
index4  NAME4    4      NaN            no
index5  NAME5    5      105            no
index6  NAME6    6      NaN            no
index7  NAME7    7      107           yes
index8  NAME8    8      108            no
index9  NAME9    9      109            no
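The weight column mixes a string, integers, and NaN, which affects how later comparisons behave. The following is a minimal sketch for inspecting the column types. The comprehensions are only a compact way to rebuild the same labels as the literal lists above, and the variable names weight_dtype, age_is_int, and missing_count are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above, built compactly
data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
indexs = [f'index{i}' for i in range(10)]
df = pd.DataFrame(data, index=indexs)

# One dtype per column; a column holding mixed Python objects is stored as object
print(df.dtypes)

weight_dtype = df['weight'].dtype
age_is_int = pd.api.types.is_integer_dtype(df['age'])

# Count the missing (NaN) entries in the weight column
missing_count = int(df['weight'].isna().sum())
print(weight_dtype, age_is_int, missing_count)
```

Because weight is an object column, numeric comparisons on it are evaluated value by value rather than on a numeric array.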

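II. Selecting Rows from a DataFrame

Selecting data from a DataFrame falls broadly into three cases:

    1) Row or column selection (single-dimension selection): df[]. A single df[] expression selects either rows or columns, so a filter can be applied to only one dimension at a time.
    2) Region selection (multi-dimension selection): df.loc[] and df.iloc[]. These accept filters for rows and columns at the same time. (The df.ix[] indexer seen in older tutorials was deprecated in pandas 0.20 and removed in pandas 1.0.)
    3) Cell selection (point selection): df.at[] and df.iat[]. These address exactly one cell.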
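The three cases can be compared side by side. This is a minimal sketch on the same sample data; the result variable names are illustrative, and df.ix[] is left out because it no longer exists in current pandas releases.

```python
import numpy as np
import pandas as pd

data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
df = pd.DataFrame(data, index=[f'index{i}' for i in range(10)])

# 1) Single-dimension selection with df[]
rows = df[0:2]                # a slice selects rows
cols = df[['name', 'age']]    # a list of column labels selects columns

# 2) Region selection: rows and columns in one expression
region = df.loc['index1':'index2', ['name', 'age']]   # by label
region_pos = df.iloc[1:3, 0:2]                        # by integer position

# 3) Cell selection: exactly one value
cell = df.at['index1', 'age']     # by label
cell_pos = df.iat[1, 1]           # by integer position

print(rows.shape, cols.shape, region.shape, cell, cell_pos)
```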

1. Initial data definition

Continuing from the previous section:

data = {
    'name': ['NAME0', 'NAME1', 'NAME2', 'NAME3', 'NAME4', 'NAME5', 'NAME6', 'NAME7', 'NAME8', 'NAME9'],

    'age': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],

    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],

    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']
}

indexs = ['index0', 'index1', 'index2', 'index3', 'index4', 'index5', 'index6', 'index7', 'index8', 'index9']

df = pd.DataFrame(data, index=indexs)

2. Row selection (single-dimension selection)

① Integer-position slicing: half-open interval [start, stop)

# Select the first row of df
df_line_1 = df[0:1]

print(df_line_1)

Console output:

         name  age   weight is_single_dog
index0  NAME0    0  weight0           yes

# Select the third row of df
df_line_3 = df[2:3]

print(df_line_3)

Console output:

         name  age weight is_single_dog
index2  NAME2    2    102            no

# Select the second and third rows of df
df_line_2_to_3 = df[1:3]

print(df_line_2_to_3)

Console output:

         name  age weight is_single_dog
index1  NAME1    1    101           yes
index2  NAME2    2    102            no
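Integer slices inside df[] follow ordinary Python slice rules, so negative indices and steps also work, and df.iloc[] gives the same rows. This is a short sketch on the same sample data; the result variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
df = pd.DataFrame(data, index=[f'index{i}' for i in range(10)])

# Positions 1 and 2 are included; position 3 (the stop value) is excluded
by_brackets = df[1:3]
by_iloc = df.iloc[1:3]
same = by_brackets.equals(by_iloc)

# Standard slice features: negative start and step
last_two = df[-2:]
every_other = df[::2]

print(list(by_brackets.index), same)
print(list(last_two.index), len(every_other))
```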

② Label slicing: closed interval [start, stop]

# Select the first row of df
df_line_1 = df[:'index0']

print(df_line_1)

Console output:

         name  age   weight is_single_dog
index0  NAME0    0  weight0           yes

# Select the third row of df
df_line_3 = df['index2':'index2']

print(df_line_3)

Console output:

         name  age weight is_single_dog
index2  NAME2    2    102            no

# Select the second and third rows of df
df_line_2_to_3 = df['index1':'index2']

print(df_line_2_to_3)

Console output:

         name  age weight is_single_dog
index1  NAME1    1    101           yes
index2  NAME2    2    102            no
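The difference between the two interval conventions is easy to check by comparing a label slice with an integer slice that uses matching start and stop positions. This is a minimal sketch on the same sample data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
df = pd.DataFrame(data, index=[f'index{i}' for i in range(10)])

# Label slice: both endpoints are included
by_label = df['index1':'index2']
by_loc = df.loc['index1':'index2']   # same rows through .loc

# Integer slice from position 1 to position 2: the stop position is excluded
by_position = df[1:2]

print(len(by_label), len(by_position))
```

The label slice returns two rows while the integer slice returns one, even though index1 and index2 sit at positions 1 and 2.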

③ Boolean array

# Select the first row of df
df_line_1 = df[[True, False, False, False, False, False, False, False, False, False]]

print(df_line_1)

Console output:

         name  age   weight is_single_dog
index0  NAME0    0  weight0           yes

# Select the third row of df
df_line_3 = df[[False, False, True, False, False, False, False, False, False, False]]

print(df_line_3)

Console output:

         name  age weight is_single_dog
index2  NAME2    2    102            no

# Select the second and third rows of df
df_line_2_to_3 = df[[False, True, True, False, False, False, False, False, False, False]]

print(df_line_2_to_3)

Console output:

         name  age weight is_single_dog
index1  NAME1    1    101           yes
index2  NAME2    2    102            no
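Writing out ten booleans by hand is error-prone, so in practice the array is usually computed. The sketch below builds one with Index.isin and also shows that a boolean list must have one entry per row. The names mask, selected, and length_error are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
df = pd.DataFrame(data, index=[f'index{i}' for i in range(10)])

# isin returns a NumPy boolean array with one entry per row
mask = df.index.isin(['index1', 'index2'])
selected = df[mask]
print(selected)

# A boolean list whose length differs from the number of rows is rejected
try:
    df[[True, False]]
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)
```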

④ Single-condition selection

# Select the rows of df where age > 8
df_line_age_over_8 = df[df['age'] > 8]

print(df_line_age_over_8)

Console output:

         name  age weight is_single_dog
index9  NAME9    9    109            no
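A condition such as df['age'] > 8 is itself a boolean Series, so conditional selection is the boolean-array case with a computed mask. This is a minimal sketch on the same sample data that makes the intermediate step visible; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
df = pd.DataFrame(data, index=[f'index{i}' for i in range(10)])

# The comparison yields one True/False per row, aligned on the index
mask = df['age'] > 8
print(mask)

# Passing the mask to df[] or df.loc[] keeps only the rows marked True
selected = df[mask]
same_as_loc = df.loc[mask]
print(selected)
```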

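⑤ Multi-condition selection

Note: when combining conditions, wrap each condition in parentheses. The & and | operators bind more tightly than comparison operators such as > and ==, so an unparenthesized expression is grouped differently from what was intended.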
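The effect of operator precedence can be reproduced directly. In the sketch below, the unparenthesized form is parsed by Python as df['age'] > (5 & df['age']) < 9, a chained comparison that needs a single truth value for a whole Series and therefore fails. The names correct and precedence_error are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
df = pd.DataFrame(data, index=[f'index{i}' for i in range(10)])

# With parentheses: each comparison is evaluated first, then combined with &
correct = df[(df['age'] > 5) & (df['age'] < 9)]
print(correct)

# Without parentheses: & is applied before the comparisons, and the chained
# comparison that results cannot be reduced to a single True/False
try:
    df[df['age'] > 5 & df['age'] < 9]
    precedence_error = None
except ValueError as exc:
    precedence_error = exc
print(type(precedence_error).__name__)
```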

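"""
Select the rows of df that satisfy all of the following conditions:
 1. age > 5
 2. weight is not missing
 3. is_single_dog = no

"""

# notna() tests each value for NaN; df['weight'] is not None would be a single True for the whole Series
df_line_condition_more = df[(df['age'] > 5) & (df['weight'].notna()) & (df['is_single_dog'] == 'no')]

print(df_line_condition_more)

Console output:

         name  age weight is_single_dog
index8  NAME8    8    108            no
index9  NAME9    9    109            no

Note: a test written as df['weight'] is not None does not filter NaN, because it compares the Series object itself with None and yields one True value. With that test, the NaN row index6 would also be returned. NaN handling is covered in more detail later.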
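The difference between an identity test against None and an element-wise missing-value test can be checked directly. This is a minimal sketch on the same sample data; the names identity_test, elementwise, and selected are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
df = pd.DataFrame(data, index=[f'index{i}' for i in range(10)])

# Identity test: asks whether the Series object is None, giving one Python bool
identity_test = df['weight'] is not None
print(identity_test)

# Element-wise test: one boolean per row, False where the value is NaN
elementwise = df['weight'].notna()
print(elementwise)

# Using the element-wise mask drops the rows whose weight is missing
selected = df[(df['age'] > 5) & elementwise & (df['is_single_dog'] == 'no')]
print(selected)
```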

"""
Select the rows of df that satisfy any of the following conditions:
 1. name = NAME9
 2. age > 7 and age < 9
 3. weight = 107

"""

df_line_condition_more = df[(df['name'] == 'NAME9') | ((df['age'] > 7) & (df['age'] < 9)) | (df['weight'] == 107)]

print(df_line_condition_more)

Console output:

         name  age weight is_single_dog
index7  NAME7    7    107           yes
index8  NAME8    8    108            no
index9  NAME9    9    109            no
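Long one-line filters are easier to read and debug when each condition is stored in a named mask first. The sketch below expresses the same three conditions that way; the mask names are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

data = {
    'name': [f'NAME{i}' for i in range(10)],
    'age': list(range(10)),
    'weight': ["weight0", 101, 102, np.nan, np.nan, 105, np.nan, 107, 108, 109],
    'is_single_dog': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
df = pd.DataFrame(data, index=[f'index{i}' for i in range(10)])

# One boolean Series per condition
is_name9 = df['name'] == 'NAME9'
age_between = (df['age'] > 7) & (df['age'] < 9)
weight_is_107 = df['weight'] == 107

# A row is kept if at least one condition holds
any_match = is_name9 | age_between | weight_is_107
selected = df[any_match]
print(selected)
```

Each mask can be printed on its own to see which rows a single condition contributes.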