[Pandas] 多行合并成一行

7b3984bdb037493097b9439e063bdbac.png
 美图欣赏2022/09/16

在最近的需求开发中,有如下需求需要进行修改,数据源demo如下所示

251907c927734d1b941fc2d4e386ee0e.png

根据字段'material'进行分组,对字段'site'进行合并,内容之间用逗号(,)分隔,再进行去重处理;对字段'LT'取最大值,最终呈现结果如下所示

1ce115b0909c44eab34246e1b8c65714.png

具体实现代码如下所示

import pandas as pd

df = pd.DataFrame([['FJZ','A123',123],
                   ['FOC','A123',456],
                   ['FJZ','B456',112],
                   ['FJZ','B456',245],
                   ['FJZ','B456',110],
                   ['FOC','C789',202],
                   ['FOC','C789',205]
                  ],columns=['site','material','LT'])

# 筛选字段'material'并进行去重处理
merge_data = df[['material']]
merge_data = merge_data.drop_duplicates(subset = ['material'])

# 对'site'和'LT'字段进行处理
df = df.groupby(['material']).agg({'site':[','.join],'LT':max})
# 更换字段栏位名称
new_column = ['site','LT']
df.columns = new_column

# 对字段'site'中的值进行去重处理
def data_deduplication(row):
    data_list = row['site'].split(',')
    res = ','.join(set(data_list))
    return res

df['site'] = df.apply(lambda row:data_deduplication(row), axis=1)

merge_data = pd.merge(merge_data, df, how='left', on=['material'])
# 调整字段顺序
order = ['site','material','LT']
merge_data = merge_data[order]

df(未进行合并处理) 

b9e6599dadad49cea82560fbdd394d5a.png

merge_data 

b96efd480abf493b9cd35eb6ddd8152f.png

将上述核心关键代码进行拆分讲解

import pandas as pd

df = pd.DataFrame([['FJZ','A123',123],
                   ['FOC','A123',456],
                   ['FJZ','B456',112],
                   ['FJZ','B456',245],
                   ['FJZ','B456',110],
                   ['FOC','C789',202],
                   ['FOC','C789',205]
                  ],columns=['site','material','LT'])

b9e6599dadad49cea82560fbdd394d5a.png

# 对'site'和'LT'字段进行处理
df = df.groupby(['material']).agg({'site':[','.join],'LT':max})

该段代码可将多行数据合并成一行数据 

0ae3d2a267c847a090fbcb9007977e58.png

从上图可以看出df中的字段名发生了变化,我们需要对此进行更换字段名操作,便于后续的理解以及数据处理

# 更换字段栏位名称
new_column = ['site','LT']
df.columns = new_column

1c4c29fc4f7d4cfa89dbba58fc1d73da.png

另一种解决方案(更优)

import pandas as pd

df = pd.DataFrame([['FJZ','A123',123],
                   ['FOC','A123',456],
                   ['FJZ','B456',112],
                   ['FJZ','B456',245],
                   ['FJZ','B456',110],
                   ['FOC','C789',202],
                   ['FOC','C789',205]
                  ],columns=['site','material','LT'])

df_copy = df.copy()
# 筛选字段'material'并进行去重处理
merge_data = df_copy[['material']]
merge_data = merge_data.drop_duplicates(subset = ['material'])

# 定义拼接函数,并对字段进行去重
def concat_func(row):
    return pd.Series({
        'site':','.join(map(str,row['site'].unique()))
    })

# 对'site'和'LT'字段进行处理
df_copy = df_copy.groupby(df_copy['material']).apply(lambda row:concat_func(row))
merge_data = pd.merge(merge_data, df_copy, how='left', on=['material'])

df_copy = df.copy()
df_copy = df_copy.groupby(df['material']).agg({'LT':max})
merge_data = pd.merge(merge_data, df_copy, how='left', on=['material'])
# 调整字段顺序
order = ['site','material','LT']
merge_data = merge_data[order]

扩展补充多行合并成一行操作案例

案例1

问题:根据字段'id'进行分组,对字段'project'进行合并,内容之间用逗号(,)分隔

import pandas as pd

data = pd.DataFrame({'id':[1,1,2,2],'project':['A','B','C','D']})
 
#合并数据
merge_data = data.groupby(['id'])['project'].apply(list).to_frame()
merge_data['project'] = merge_data['project'].apply(lambda x:str(x).replace('[','').replace(']',''))

合并前

50c44e44e8c044fa992d378ce722a011.png

合并后

8e7b12e42d7b41298577de89bbbb91af.png

案例2

问题:把多行数据按“姓名”合并,并保留所有信息 

数据源

1ac1eed7fac34d8e94b9a0071970a9fd.png

经过合并后的数据

0fc9066148eb42649765cfe3d26aa444.png

具体代码如下所示 

import pandas as pd

df = pd.DataFrame([['Odin','电影','男'],
                   ['Odin','旅游','男'],
                   ['Odin','音乐','男'],
                   ['Lee','篮球','女'],
                   ['Lee','插花','女'],
                   ['Lee','瑜伽','女'],
                   ['Andy','足球','男'],
                   ['Andy','乒乓球','男'],
                   ['Summer','阅读','女'],
                   ['Summer','音乐','女'],
                  ],columns=['name','hobby','gender'])

# 定义拼接函数,并对字段进行去重
def concat_func(row):
    return pd.Series({
        'hobby':','.join(row['hobby'].unique()),
        'gender':','.join(row['gender'].unique())
    })

result = df.groupby(df['name']).apply(lambda row:concat_func(row)).reset_index()

案例3

问题: 把多行数据按“material”合并,并保留所有信息,其中'site'字段进行去重处理,'usages'字段不进行去重处理

数据源

dc91d396cc784ebea5f97aa51b0e49e9.png

经过合并后的数据

2ae4a9b807244fc1a50f4e78921150e3.png

import pandas as pd
 
df = pd.DataFrame([['FJZ','A123',1],
                   ['FOC','A123',1],
                   ['FJZ','B456',1],
                   ['FJZ','B456',2],
                   ['FJZ','B456',2],
                   ['FOC','C789',3],
                   ['FOC','C789',5]
                  ],columns=['site','material','usages'])
 
order = ['site','material','usages']
data_new = df[order]

# 定义拼接函数,并对字段进行去重
def concat_func(row):
    return pd.Series({
        'site':','.join(map(str,row['site'].unique())),
        'usages':','.join(map(str,row['usages']))
    })

data_new = data_new.groupby(data_new['material']).apply(lambda row:concat_func(row)).reset_index()

# 调整字段栏位顺序
order = ['site','material','usages']
data_new = data_new[order]
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值