pandas merge:默认是inner join

http://liao.cpython.org/pandas27/
https://mp.weixin.qq.com/s?__biz=MzU5Mjg2OTQ1MA==&mid=2247484160&idx=1&sn=c1ed435f441c2b53751fec3558e7edee&chksm=fe186225c96feb330e129a47ff979301f6dcdc042ce24fa7b23f61e21d6c13a30e25d00f469d&scene=21#wechat_redirect

merge(right, how=‘inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=(’_x’, ‘_y’), copy=True, indicator=False, validate=None)

merge函数默认拼接数据是inner join即内连接。

先构造两个df, 含有相同的1列列名:

col1 = 'class_name class_id class_lecturer'.split()
col2 = 'class_id stu_id'.split()

val1 = [['IT',100,'Wangli'],['cs',101,'WangMa'],['CAD',102,'Liping']]
val2 = [[100,20202222],[101,30205139],[104,20213414],[101,20338888]]

course = pd.DataFrame(val1,columns = col1)
choose = pd.DataFrame(val2,columns = col2)

course
Out[9]: 
  class_name  class_id class_lecturer
0         IT       100         Wangli
1         cs       101         WangMa
2        CAD       102         Liping

choose
Out[15]: 
   class_id    stu_id
0       100  20202222
1       101  30205139
2       104  20213414
3       101  20338888

默认的左右merge 结果,会选择一个列相同的column进行merge:

course.merge(choose)
Out[16]: 
  class_name  class_id class_lecturer    stu_id
0         IT       100         Wangli  20202222
1         cs       101         WangMa  30205139
2         cs       101         WangMa  20338888
choose.merge(course)
Out[17]: 
   class_id    stu_id class_name class_lecturer
0       100  20202222         IT         Wangli
1       101  30205139         cs         WangMa
2       101  20338888         cs         WangMa

merge 加入how = 'outer’后,结果是两个DataFrame均输出,未匹配上的用NaN填充。行数会相应增加的。

course.merge(choose,how = 'outer')
Out[6]: 
  class_name  class_id class_lecturer      stu_id
0         IT       100         Wangli  20202222.0
1         cs       101         WangMa  30205139.0
2         cs       101         WangMa  20338888.0
3        CAD       102         Liping         NaN
4        NaN       104            NaN  20213414.0

choose.merge(course,how = 'outer')
Out[7]: 
   class_id      stu_id class_name class_lecturer
0       100  20202222.0         IT         Wangli
1       101  30205139.0         cs         WangMa
2       101  20338888.0         cs         WangMa
3       104  20213414.0        NaN            NaN
4       102         NaN        CAD         Liping

how = ‘left’

course.merge(choose,how = 'left')
Out[10]: 
  class_name  class_id class_lecturer      stu_id
0         IT       100         Wangli  20202222.0
1         cs       101         WangMa  30205139.0
2         cs       101         WangMa  20338888.0
3        CAD       102         Liping         NaN

choose.merge(course, how = 'left')
Out[12]: 
   class_id    stu_id class_name class_lecturer
0       100  20202222         IT         Wangli
1       101  30205139         cs         WangMa
2       104  20213414        NaN            NaN
3       101  20338888         cs         WangMa

how = ‘right’

course.merge(choose,how = 'right')
Out[11]: 
  class_name  class_id class_lecturer    stu_id
0         IT       100         Wangli  20202222
1         cs       101         WangMa  30205139
2         cs       101         WangMa  20338888
3        NaN       104            NaN  20213414

choose.merge(course, how = 'right')
Out[13]: 
   class_id      stu_id class_name class_lecturer
0       100  20202222.0         IT         Wangli
1       101  30205139.0         cs         WangMa
2       101  20338888.0         cs         WangMa
3       102         NaN        CAD         Liping

两个相同的列merge:

course
Out[20]: 
  class_name  class_id class_lecturer school
0         IT       100         Wangli      A
1         cs       101         WangMa      B
2        CAD       102         Liping      C
choose
Out[21]: 
   class_id    stu_id school
0       100  20202222      A
1       101  30205139      A
2       104  20213414      C
3       101  20338888      B

比对两列相同列名后进行merge:

course.merge(choose)
Out[22]: 
  class_name  class_id class_lecturer school    stu_id
0         IT       100         Wangli      A  20202222
1         cs       101         WangMa      B  20338888

choose.merge(course)
Out[23]: 
   class_id    stu_id school class_name class_lecturer
0       100  20202222      A         IT         Wangli
1       101  20338888      B         cs         WangMa

两列都一致的会变成1行,不一致的还是分行,用NaN补空:

course.merge(choose, how = 'outer')
Out[24]: 
  class_name  class_id class_lecturer school      stu_id
0         IT       100         Wangli      A  20202222.0
1         cs       101         WangMa      B  20338888.0
2        CAD       102         Liping      C         NaN
3        NaN       101            NaN      A  30205139.0
4        NaN       104            NaN      C  20213414.0

https://mp.weixin.qq.com/s?__biz=MzU5Mjg2OTQ1MA==&mid=2247484160&idx=1&sn=c1ed435f441c2b53751fec3558e7edee&chksm=fe186225c96feb330e129a47ff979301f6dcdc042ce24fa7b23f61e21d6c13a30e25d00f469d&scene=21#wechat_redirect

我们来详解一下merge的参数,

left和rgiht分别对应着需要连接的左表和右表,

left_index与right_index是当我们用索引(这两个表的名字在索引中)连接时指定的参数,设置为on表示用该表的索引作为连接的条件(或者说桥梁)。

how是指定连接方式,这里用的inner,表示我们基于姓名索引来匹配,只返回两个表中共同(同时出现)姓名的数据。下面详解一下inner还涉及到的其他参数——left,right,outer。

merge(right, how=‘inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=(’_x’, ‘_y’), copy=True, indicator=False, validate=None)

pd.merge(left = df1, right = df2, left_index = True, right_index = True, how = 'inner')
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值