python pandas. join and merge dataframe

 

import modules

import pandas as pd
from IPython.display import display
from IPython.display import Image

Create a dataframe

raw_data = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
df_a = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_a
 subject_idfirst_namelast_name
01AlexAnderson
12AmyAckerman
23AllenAli
34AliceAoni
45AyoungAtiches

Create a second dataframe

raw_data = {
        'subject_id': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
df_b = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_b
 subject_idfirst_namelast_name
04BillyBonder
15BrianBlack
26BranBalwner
37BryceBrice
48BettyBtisan

Create a third dataframe

raw_data = {
        'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
df_n = pd.DataFrame(raw_data, columns = ['subject_id','test_id'])
df_n
 subject_idtest_id
0151
1215
2315
3461
4516
5714
6815
791
81061
91116

Join the two dataframes along rows

df_new = pd.concat([df_a, df_b])
df_new
 subject_idfirst_namelast_name
01AlexAnderson
12AmyAckerman
23AllenAli
34AliceAoni
45AyoungAtiches
04BillyBonder
15BrianBlack
26BranBalwner
37BryceBrice
48BettyBtisan

Join the two dataframes along columns

pd.concat([df_a, df_b], axis=1)
 subject_idfirst_namelast_namesubject_idfirst_namelast_name
01AlexAnderson4BillyBonder
12AmyAckerman5BrianBlack
23AllenAli6BranBalwner
34AliceAoni7BryceBrice
45AyoungAtiches8BettyBtisan

Merge two dataframes along the subject_id value

pd.merge(df_new, df_n, on='subject_id')
 subject_idfirst_namelast_nametest_id
01AlexAnderson51
12AmyAckerman15
23AllenAli15
34AliceAoni61
44BillyBonder61
55AyoungAtiches16
65BrianBlack16
77BryceBrice14
88BettyBtisan15

Merge two dataframes with both the left and right dataframes using the subject_id key

pd.merge(df_new, df_n, left_on='subject_id', right_on='subject_id')
 subject_idfirst_namelast_nametest_id
01AlexAnderson51
12AmyAckerman15
23AllenAli15
34AliceAoni61
44BillyBonder61
55AyoungAtiches16
65BrianBlack16
77BryceBrice14
88BettyBtisan15

Merge with outer join

“Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.” - source

pd.merge(df_a, df_b, on='subject_id', how='outer')
 subject_idfirst_name_xlast_name_xfirst_name_ylast_name_y
01AlexAndersonNaNNaN
12AmyAckermanNaNNaN
23AllenAliNaNNaN
34AliceAoniBillyBonder
45AyoungAtichesBrianBlack
56NaNNaNBranBalwner
67NaNNaNBryceBrice
78NaNNaNBettyBtisan

Merge with inner join

“Inner join produces only the set of records that match in both Table A and Table B.” - source

pd.merge(df_a, df_b, on='subject_id', how='inner')
 subject_idfirst_name_xlast_name_xfirst_name_ylast_name_y
04AliceAoniBillyBonder
15AyoungAtichesBrianBlack

Merge with right join

pd.merge(df_a, df_b, on='subject_id', how='right')
 subject_idfirst_name_xlast_name_xfirst_name_ylast_name_y
04AliceAoniBillyBonder
15AyoungAtichesBrianBlack
26NaNNaNBranBalwner
37NaNNaNBryceBrice
48NaNNaNBettyBtisan

Merge with left join

“Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.” - source

pd.merge(df_a, df_b, on='subject_id', how='left')
 subject_idfirst_name_xlast_name_xfirst_name_ylast_name_y
01AlexAndersonNaNNaN
12AmyAckermanNaNNaN
23AllenAliNaNNaN
34AliceAoniBillyBonder
45AyoungAtichesBrianBlack

Merge while adding a suffix to duplicate column names

pd.merge(df_a, df_b, on='subject_id', how='left', suffixes=('_left', '_right'))
 subject_idfirst_name_leftlast_name_leftfirst_name_rightlast_name_right
01AlexAndersonNaNNaN
12AmyAckermanNaNNaN
23AllenAliNaNNaN
34AliceAoniBillyBonder
45AyoungAtichesBrianBlack

Merge based on indexes

pd.merge(df_a, df_b, right_index=True, left_index=True)
 subject_id_xfirst_name_xlast_name_xsubject_id_yfirst_name_ylast_name_y
01AlexAnderson4BillyBonder
12AmyAckerman5BrianBlack
23AllenAli6BranBalwner
34AliceAoni7BryceBrice
45AyoungAtiches8BettyBtisan

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值