pandas: how to merge a list of DataFrames with alternating columns

Latest recommended article published on 2023-08-10 12:02:05

遮弧酒邪


Tags: how to merge a list in pandas

Article link: https://blog.csdn.net/weixin_42565865/article/details/112879226


I have the following code:

import pandas as pd

# DataFrame.from_items was deprecated in pandas 0.23 and removed in 1.0;
# building each frame from row tuples with an explicit column list still
# allows the duplicated column names used here.
rep1 = pd.DataFrame([('x', 'foo', 1.00, 'A'), ('y', 'bar', 23.22, 'B'), ('z', 'qux', 11.12, 'C')], columns=['Probe', 'Gene', 'RP1', 'RP1'])

rep2 = pd.DataFrame([('x', 'foo', 3.33, 'G'), ('y', 'bar', 77.22, 'I'), ('z', 'qux', 18.12, 'K')], columns=['Probe', 'Gene', 'RP2', 'RP2'])

rep3 = pd.DataFrame([('x', 'foo', 99.99, 'M'), ('y', 'bar', 98.29, 'P'), ('k', 'kux', 8.10, 'J')], columns=['Probe', 'Gene', 'RP3', 'RP2'])

# Collect the three frames in a list
tmp = [rep1, rep2, rep3]

This produces the following list of DataFrames:

In [56]: tmp
Out[56]:
[  Probe Gene    RP1 RP1
0     x  foo   1.00   A
1     y  bar  23.22   B
2     z  qux  11.12   C,   Probe Gene    RP2 RP2
0     x  foo   3.33   G
1     y  bar  77.22   I
2     z  qux  18.12   K,   Probe Gene    RP3 RP2
0     x  foo  99.99   M
1     y  bar  98.29   P
2     k  kux   8.10   J]

Each of the DataFrames above has the following characteristics:

- It always contains 4 columns.
- The third and last columns have identical names.
- The first two columns are always named Probe and Gene.
- The contents of Probe and Gene are always consistent, i.e. 'x' always goes with 'foo'.
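The last characteristic matters because, once the data columns no longer clash, pd.merge matches rows on Probe and Gene together; a probe paired with two different genes would end up as two separate rows. A minimal sketch for checking that assumption, using hypothetical frames that hold only the two key columns from the question:

```python
import pandas as pd

# Hypothetical frames containing only the key columns from the question.
frames = [
    pd.DataFrame({'Probe': ['x', 'y', 'z'], 'Gene': ['foo', 'bar', 'qux']}),
    pd.DataFrame({'Probe': ['x', 'y', 'z'], 'Gene': ['foo', 'bar', 'qux']}),
    pd.DataFrame({'Probe': ['x', 'y', 'k'], 'Gene': ['foo', 'bar', 'kux']}),
]

# Stack every (Probe, Gene) pair from all frames into one table.
pairs = pd.concat([f[['Probe', 'Gene']] for f in frames], ignore_index=True)

# Count distinct genes per probe; consistent data has exactly one each.
genes_per_probe = pairs.groupby('Probe')['Gene'].nunique()
consistent = bool((genes_per_probe == 1).all())
print(consistent)  # True
```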

I'm trying to merge the DataFrames in the list so that the result looks like this:

  Probe Gene    RP1    RP2    RP3  RP1  RP2  RP3
0     x  foo   1.00   3.33  99.99    A    G    M
1     y  bar  23.22  77.22  98.29    B    I    P
2     z  qux  11.12  18.12     NA    C    K   NA
3     k  kux     NA     NA   8.10   NA   NA    J

I tried this code, but it failed:

In [67]: reduce(pd.merge,tmp)

MergeError: Left data columns not unique: Index([u'Probe', u'Gene', u'RP1', u'RP1'], dtype='object')
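The error points at the duplicated label: rep1 carries 'RP1' twice. Index.is_unique and Index.duplicated make such frames easy to spot before merging. A minimal sketch, rebuilding rep1 from row tuples (an assumption, since DataFrame.from_items no longer exists in pandas 1.0 and later):

```python
import pandas as pd

# rep1 rebuilt from row tuples; the explicit column list repeats 'RP1'.
rep1 = pd.DataFrame(
    [('x', 'foo', 1.00, 'A'), ('y', 'bar', 23.22, 'B'), ('z', 'qux', 11.12, 'C')],
    columns=['Probe', 'Gene', 'RP1', 'RP1'],
)

# is_unique is False as soon as any label repeats.
print(rep1.columns.is_unique)  # False

# duplicated() flags every repeat after the first occurrence of a label.
dupes = list(rep1.columns[rep1.columns.duplicated()])
print(dupes)  # ['RP1']
```

Note also that pd.merge without on= joins on every column name the two frames share. rep2 and rep3 both contain a column called 'RP2', so they would be matched on it as well; renaming the data columns, or passing on=['Probe', 'Gene'], avoids that.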

What's the right way to do it?

Solution

You could deduplicate the column names. Here's a somewhat hacky way (on Python 3, reduce is no longer a built-in, so run "from functools import reduce" first):

In [11]: list(rep1.columns[0:2]) + [rep1.columns[2] + "_value"] + [rep1.columns[2] + "_letter"]

Out[11]: ['Probe', 'Gene', 'RP1_value', 'RP1_letter']

In [12]: for rep in tmp:
   ....:     rep.columns = list(rep.columns[0:2]) + [rep.columns[2] + "_value"] + [rep.columns[2] + "_letter"]
   ....:
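The loop above overwrites the column labels of the frames in tmp in place. If the originals should stay untouched, the same renaming can be applied to copies instead. A sketch with a hypothetical helper, dedupe_columns, keyed on the third column exactly as the loop is; note that rep3's letter column, originally labelled 'RP2', becomes 'RP3_letter':

```python
import pandas as pd


def dedupe_columns(df):
    # Hypothetical helper: same renaming as the loop, but on a copy,
    # so the frame passed in keeps its original column labels.
    name = df.columns[2]
    out = df.copy()
    out.columns = list(df.columns[:2]) + [name + '_value', name + '_letter']
    return out


rep3 = pd.DataFrame(
    [('x', 'foo', 99.99, 'M'), ('y', 'bar', 98.29, 'P'), ('k', 'kux', 8.10, 'J')],
    columns=['Probe', 'Gene', 'RP3', 'RP2'],
)
renamed = dedupe_columns(rep3)
print(list(renamed.columns))  # ['Probe', 'Gene', 'RP3_value', 'RP3_letter']
print(list(rep3.columns))     # ['Probe', 'Gene', 'RP3', 'RP2']
```

Applied to the whole list, this would be tmp = [dedupe_columns(rep) for rep in tmp].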

In [13]: reduce(pd.merge, tmp)
Out[13]:
  Probe Gene  RP1_value RP1_letter  RP2_value RP2_letter  RP3_value RP3_letter
0     x  foo       1.00          A       3.33          G      99.99          M
1     y  bar      23.22          B      77.22          I      98.29          P
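Rows z and k are missing because pd.merge defaults to how='inner', which keeps only the (Probe, Gene) pairs present in every frame. A sketch that makes the join keys explicit, with the renamed frames rebuilt inline from the question's data (the rows dictionary is just a compact way to recreate them):

```python
from functools import reduce

import pandas as pd

# The question's data, already renamed to <name>_value / <name>_letter.
rows = {
    'RP1': [('x', 'foo', 1.00, 'A'), ('y', 'bar', 23.22, 'B'), ('z', 'qux', 11.12, 'C')],
    'RP2': [('x', 'foo', 3.33, 'G'), ('y', 'bar', 77.22, 'I'), ('z', 'qux', 18.12, 'K')],
    'RP3': [('x', 'foo', 99.99, 'M'), ('y', 'bar', 98.29, 'P'), ('k', 'kux', 8.10, 'J')],
}
frames = [
    pd.DataFrame(data, columns=['Probe', 'Gene', name + '_value', name + '_letter'])
    for name, data in rows.items()
]

# Default how='inner': only key pairs present in every frame survive.
inner = reduce(lambda left, right: pd.merge(left, right, on=['Probe', 'Gene']), frames)
print(sorted(inner['Probe']))  # ['x', 'y']
```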

To keep the rows that are missing from some of the frames (filled with NaN), you also need an outer merge:

In [21]: reduce(lambda x, y: pd.merge(x, y, how='outer'), tmp)
Out[21]:
  Probe Gene  RP1_value RP1_letter  RP2_value RP2_letter  RP3_value RP3_letter
0     x  foo       1.00          A       3.33          G      99.99          M
1     y  bar      23.22          B      77.22          I      98.29          P
2     z  qux      11.12          C      18.12          K        NaN        NaN
3     k  kux        NaN        NaN        NaN        NaN       8.10          J
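The question's target layout lists all value columns before all letter columns, whereas the merged result interleaves them. A sketch that performs the outer merge with explicit keys and then reorders the columns by suffix (the '_value'/'_letter' suffixes come from the renaming above; the rows dictionary is an inline rebuild of the renamed frames):

```python
from functools import reduce

import pandas as pd

# The question's data, already renamed to <name>_value / <name>_letter.
rows = {
    'RP1': [('x', 'foo', 1.00, 'A'), ('y', 'bar', 23.22, 'B'), ('z', 'qux', 11.12, 'C')],
    'RP2': [('x', 'foo', 3.33, 'G'), ('y', 'bar', 77.22, 'I'), ('z', 'qux', 18.12, 'K')],
    'RP3': [('x', 'foo', 99.99, 'M'), ('y', 'bar', 98.29, 'P'), ('k', 'kux', 8.10, 'J')],
}
frames = [
    pd.DataFrame(data, columns=['Probe', 'Gene', name + '_value', name + '_letter'])
    for name, data in rows.items()
]

# Outer merge keeps every (Probe, Gene) pair; gaps become NaN.
merged = reduce(
    lambda left, right: pd.merge(left, right, on=['Probe', 'Gene'], how='outer'),
    frames,
)

# Put all value columns first, then all letter columns, as in the question.
value_cols = [c for c in merged.columns if c.endswith('_value')]
letter_cols = [c for c in merged.columns if c.endswith('_letter')]
merged = merged[['Probe', 'Gene'] + value_cols + letter_cols]
print(merged)
```

Row order is worth checking: recent pandas releases sort the join keys of an outer merge, so row 'k' may appear first rather than last. Sort or reindex explicitly if the order matters.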
