Replacing column values in a pandas DataFrame with values from a second DataFrame when specific conditions are met

qq^^614136809

Published 2024-07-25 17:02:25

Tags: pandas

Article link: https://blog.csdn.net/D0126_/article/details/140694783

The goal is to replace values in one column of a pandas DataFrame with values from a second DataFrame, but only in rows that meet certain conditions.

Specifically, in the 'mean' column of df1new, every row where 'test' == "ACT Composite" and 'mean' is missing (given as None, shown by pandas as NaN) should take its value from df2new.

Solution

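The problem can be solved in the following steps:
1. Define two conditions on df1new:
  - Condition 1: df1new['test'] == 'ACT Composite' checks whether the 'test' column equals "ACT Composite".
  - Condition 2: df1new['mean'].isnull() checks whether the 'mean' column is missing (None or NaN).
2. Combine the two conditions with the element-wise operator & into a boolean mask idx that is True only for rows satisfying both.
3. Look up the replacement in df2new as a single scalar, df2new.loc[df2new['test'] == 'ACT Composite', 'mean'].iloc[0], and assign it with df1new.loc[idx, 'mean'] = value. Two details matter here. First, the chained form df1new['mean'][idx] = ... triggers SettingWithCopyWarning and does not update df1new when copy-on-write is enabled, so a single .loc assignment is the safer choice. Second, the right-hand side should be a scalar rather than a Series: a Series selected from df2new keeps its own index label (1), which does not match the target row's label (8) in df1new, so index alignment would write NaN instead of the intended value.
Code example: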

import pandas as pd

# Raw records used to build the DataFrames df1new and df2new
df1 = [{'test': 'SAT Math', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': 404},
       {'test': 'SAT Verbal', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': 355},
       {'test': 'SAT Writing', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': 363},
       {'test': 'SAT Composite', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': 1122},
       {'test': 'ACT Math', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': None},
       {'test': 'ACT English', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': None},
       {'test': 'ACT Reading', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': None},
       {'test': 'ACT Science', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': None},
       {'test': 'ACT Composite', '25th_percentile': None, '75th_percentile': None, '50th_percentile': None, 'mean': None}]

df2 = [{'test': 'SAT Composite', 'mean': 1981},
       {'test': 'ACT Composite', 'mean': 29.6}]

df1new = pd.DataFrame(df1, columns=['test', '25th_percentile', 'mean', '50th_percentile','75th_percentile'])
df2new = pd.DataFrame(df2)

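# Boolean mask: rows where test is 'ACT Composite' and mean is missing
idx = (df1new['test'] == 'ACT Composite') & df1new['mean'].isnull()

# Take the replacement as a scalar so that no index alignment is involved
act_mean = df2new.loc[df2new['test'] == 'ACT Composite', 'mean'].iloc[0]

# Assign in a single .loc call instead of chained indexing
df1new.loc[idx, 'mean'] = act_mean

# Print the updated DataFrame
print(df1new)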

Output:

            test  25th_percentile    mean  50th_percentile  75th_percentile
0       SAT Math             None   404.0             None             None
1     SAT Verbal             None   355.0             None             None
2    SAT Writing             None   363.0             None             None
3  SAT Composite             None  1122.0             None             None
4       ACT Math             None     NaN             None             None
5    ACT English             None     NaN             None             None
6    ACT Reading             None     NaN             None             None
7    ACT Science             None     NaN             None             None
8  ACT Composite             None    29.6             None             None
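
If more than one test needs to be filled in from the second DataFrame, the same idea can be generalized without writing one mask per test name. The following is a minimal sketch rather than part of the original solution. It assumes that test names are unique in df2new (Series.map needs a unique index for the lookup), and it uses shortened toy data with the same column names as above; the variable name lookup is introduced here for illustration only.

```python
import pandas as pd

# Toy frames mirroring the article's data (only the columns needed here).
df1new = pd.DataFrame({
    "test": ["SAT Composite", "ACT Math", "ACT Composite"],
    "mean": [1122, None, None],
})
df2new = pd.DataFrame({
    "test": ["SAT Composite", "ACT Composite"],
    "mean": [1981, 29.6],
})

# Lookup table: test name -> replacement mean taken from df2new.
# Assumption: each test name appears only once in df2new.
lookup = df2new.set_index("test")["mean"]

# Map every row's test name to its candidate replacement (NaN when df2new
# has no entry for that test), then use it only where 'mean' is missing.
df1new["mean"] = df1new["mean"].fillna(df1new["test"].map(lookup))

print(df1new)
```

In this sketch the existing value for "SAT Composite" is kept because fillna only touches missing entries, "ACT Composite" receives 29.6 from df2new, and "ACT Math" stays NaN because df2new has no row for it.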