在最近的需求开发中,针对原有逻辑进行开发,导出的Excel数据如下所示:
提示: 以上数据仅为Demo数据,其中site和material分组后,如果两者数值分为同一组别,字段'Balance_from_Q2'数值是相同的,不存在数值不相等的情况
问题: 客户对字段'Balance_from_Q2'数据显示提出要求,根据字段'site'和'material'分组,只对分组后第一行显示数据,其余的行设置为空(NaN),Excel呈现需求结果图如下所示:
数据源
import pandas as pd
df = pd.DataFrame({'site':['FOC', 'FOC', 'FOC', 'FOC', 'FOC','FOC','FOC','FOC','FOC'],
'material':['01-0319-00','01-0319-00','03-0060-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00'],
'Balance_from_Q2':[437,437,0,123,123,123,123,123,123]})
df
方法1
import pandas as pd
import numpy as np
df = pd.DataFrame({'site':['FOC', 'FOC', 'FOC', 'FOC', 'FOC','FOC','FOC','FOC','FOC'],
'material':['01-0319-00','01-0319-00','03-0060-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00'],
'Balance_from_Q2':[437,437,0,123,123,123,123,123,123]})
# 方法1
data = df.groupby(['site','material']).head(1)
df['Balance_from_Q2'] = np.nan
df['Balance_from_Q2'] = data.loc[data.index,'Balance_from_Q2']
思路
1.根据字段'site'和'material'进行分组,获取相同分组数值的第一行记录
data = df.groupby(['site','material']).head(1)
2.将df表中字段'Balance_from_Q2'设置为空
import numpy as np df['Balance_from_Q2'] = np.nan
3.利用按轴标签.loc将data表中记录数据中的'Balance_from_Q2'数值重新赋值到df表中
方法2
import pandas as pd
import numpy as np
df = pd.DataFrame({'site':['FOC', 'FOC', 'FOC', 'FOC', 'FOC','FOC','FOC','FOC','FOC'],
'material':['01-0319-00','01-0319-00','03-0060-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00','09-0334-00'],
'Balance_from_Q2':[437,437,0,123,123,123,123,123,123]})
# 方法2
# 新增ranking字段
df['ranking'] = df.groupby(['site','material'])['Balance_from_Q2'].rank(ascending=1,method='first').astype(int)
def fn(row):
if row['ranking'] == 1:
return row['Balance_from_Q2']
else:
return np.nan
df['Balance_from_Q2'] = df.apply(lambda row:fn(row), axis=1)
# 计算结束,删除'ranking'字段
df = df.drop(columns = ['ranking'])
思路
1.根据字段'site'和'material'进行分组进行排序,生成'ranking'字段(类似于Pandas实现SQL窗口函数中的ROW_NUMBER()功能)
df['ranking'] = df.groupby(['site','material'])['Balance_from_Q2'].rank(ascending=1,method='first').astype(int)
2.只保留排序为1的'Balance_from_Q2'数值,其余用空值替代