[Pandas] 相同分组条件下的特定列除第一行以外的值为空

c4de267483f643d998f9debc6516b2a4.jpeg
 美图欣赏2022/07/16

在最近的需求开发中,针对原有逻辑进行开发,导出的Excel数据如下所示:

92c731f9cd484bfa983e424d365692c5.png

提示: 以上数据仅为Demo数据,其中site和material分组后,如果两者数值分为同一组别,字段'Balance_from_Q2'数值是相同的,不存在数值不相等的情况

问题: 客户对字段'Balance_from_Q2'数据显示提出要求,根据字段'site'和'material'分组,只对分组后第一行显示数据,其余的行设置为空(NaN),Excel呈现需求结果图如下所示:

e2e93cc8b15141a9a6f978b5518d7d1d.png

数据源 

import pandas as pd

df = pd.DataFrame({'site':['FOC', 'FOC', 'FOC', 'FOC', 'FOC','FOC','FOC','FOC','FOC'],
                   'material':['01-0319-00','01-0319-00','03-0060-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00'],
                   'Balance_from_Q2':[437,437,0,123,123,123,123,123,123]})

df 

de7b772dcde4449eae74b3eaf48a4694.png

方法1

import pandas as pd
import numpy as np

df = pd.DataFrame({'site':['FOC', 'FOC', 'FOC', 'FOC', 'FOC','FOC','FOC','FOC','FOC'],
                   'material':['01-0319-00','01-0319-00','03-0060-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00'],
                   'Balance_from_Q2':[437,437,0,123,123,123,123,123,123]})
# 方法1
data = df.groupby(['site','material']).head(1)
df['Balance_from_Q2'] = np.nan
df['Balance_from_Q2'] = data.loc[data.index,'Balance_from_Q2']

思路

1.根据字段'site'和'material'进行分组,获取相同分组数值的第一行记录

data = df.groupby(['site','material']).head(1)

4a6382a93a4a4cde8a72cd261035b49a.png

2.将df表中字段'Balance_from_Q2'设置为空

import numpy as np
df['Balance_from_Q2'] = np.nan

b0711149a97645bdaacb2489eaab6fd4.png

3.利用按轴标签.loc将data表中记录数据中的'Balance_from_Q2'数值重新赋值到df表中

方法2 

import pandas as pd
import numpy as np

df = pd.DataFrame({'site':['FOC', 'FOC', 'FOC', 'FOC', 'FOC','FOC','FOC','FOC','FOC'],
                   'material':['01-0319-00','01-0319-00','03-0060-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00'],
                   'Balance_from_Q2':[437,437,0,123,123,123,123,123,123]})

# 方法2
# 新增ranking字段
df['ranking'] = df.groupby(['site','material'])['Balance_from_Q2'].rank(ascending=1,method='first').astype(int)

def fn(row):
    if row['ranking'] == 1:
        return row['Balance_from_Q2']
    else:
        return np.nan
    
df['Balance_from_Q2'] = df.apply(lambda row:fn(row), axis=1)
# 计算结束,删除'ranking'字段
df = df.drop(columns = ['ranking'])

思路

1.根据字段'site'和'material'进行分组进行排序,生成'ranking'字段(类似于Pandas实现SQL窗口函数中的ROW_NUMBER()功能)

df['ranking'] = df.groupby(['site','material'])['Balance_from_Q2'].rank(ascending=1,method='first').astype(int)

9ec8437aa6fa491c95a2ce60285f1cda.png

2.只保留排序为1的'Balance_from_Q2'数值,其余用空值替代

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值