http://liao.cpython.org/pandas27/
https://mp.weixin.qq.com/s?__biz=MzU5Mjg2OTQ1MA==&mid=2247484160&idx=1&sn=c1ed435f441c2b53751fec3558e7edee&chksm=fe186225c96feb330e129a47ff979301f6dcdc042ce24fa7b23f61e21d6c13a30e25d00f469d&scene=21#wechat_redirect
merge(right, how=‘inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=(’_x’, ‘_y’), copy=True, indicator=False, validate=None)
merge函数默认拼接数据是inner join即内连接。
先构造两个df, 含有相同的1列列名:
col1 = 'class_name class_id class_lecturer'.split()
col2 = 'class_id stu_id'.split()
val1 = [['IT',100,'Wangli'],['cs',101,'WangMa'],['CAD',102,'Liping']]
val2 = [[100,20202222],[101,30205139],[104,20213414],[101,20338888]]
course = pd.DataFrame(val1,columns = col1)
choose = pd.DataFrame(val2,columns = col2)
course
Out[9]:
class_name class_id class_lecturer
0 IT 100 Wangli
1 cs 101 WangMa
2 CAD 102 Liping
choose
Out[15]:
class_id stu_id
0 100 20202222
1 101 30205139
2 104 20213414
3 101 20338888
默认的左右merge 结果,会选择一个列相同的column进行merge:
course.merge(choose)
Out[16]:
class_name class_id class_lecturer stu_id
0 IT 100 Wangli 20202222
1 cs 101 WangMa 30205139
2 cs 101 WangMa 20338888
choose.merge(course)
Out[17]:
class_id stu_id class_name class_lecturer
0 100 20202222 IT Wangli
1 101 30205139 cs WangMa
2 101 20338888 cs WangMa
merge 加入how = 'outer’后,结果是两个DataFrame均输出,未匹配上的用NaN填充。行数会相应增加的。
course.merge(choose,how = 'outer')
Out[6]:
class_name class_id class_lecturer stu_id
0 IT 100 Wangli 20202222.0
1 cs 101 WangMa 30205139.0
2 cs 101 WangMa 20338888.0
3 CAD 102 Liping NaN
4 NaN 104 NaN 20213414.0
choose.merge(course,how = 'outer')
Out[7]:
class_id stu_id class_name class_lecturer
0 100 20202222.0 IT Wangli
1 101 30205139.0 cs WangMa
2 101 20338888.0 cs WangMa
3 104 20213414.0 NaN NaN
4 102 NaN CAD Liping
how = ‘left’
course.merge(choose,how = 'left')
Out[10]:
class_name class_id class_lecturer stu_id
0 IT 100 Wangli 20202222.0
1 cs 101 WangMa 30205139.0
2 cs 101 WangMa 20338888.0
3 CAD 102 Liping NaN
choose.merge(course, how = 'left')
Out[12]:
class_id stu_id class_name class_lecturer
0 100 20202222 IT Wangli
1 101 30205139 cs WangMa
2 104 20213414 NaN NaN
3 101 20338888 cs WangMa
how = ‘right’
course.merge(choose,how = 'right')
Out[11]:
class_name class_id class_lecturer stu_id
0 IT 100 Wangli 20202222
1 cs 101 WangMa 30205139
2 cs 101 WangMa 20338888
3 NaN 104 NaN 20213414
choose.merge(course, how = 'right')
Out[13]:
class_id stu_id class_name class_lecturer
0 100 20202222.0 IT Wangli
1 101 30205139.0 cs WangMa
2 101 20338888.0 cs WangMa
3 102 NaN CAD Liping
两个相同的列merge:
course
Out[20]:
class_name class_id class_lecturer school
0 IT 100 Wangli A
1 cs 101 WangMa B
2 CAD 102 Liping C
choose
Out[21]:
class_id stu_id school
0 100 20202222 A
1 101 30205139 A
2 104 20213414 C
3 101 20338888 B
比对两列相同列名后进行merge:
course.merge(choose)
Out[22]:
class_name class_id class_lecturer school stu_id
0 IT 100 Wangli A 20202222
1 cs 101 WangMa B 20338888
choose.merge(course)
Out[23]:
class_id stu_id school class_name class_lecturer
0 100 20202222 A IT Wangli
1 101 20338888 B cs WangMa
两列都一致的会变成1行,不一致的还是分行,用NaN补空:
course.merge(choose, how = 'outer')
Out[24]:
class_name class_id class_lecturer school stu_id
0 IT 100 Wangli A 20202222.0
1 cs 101 WangMa B 20338888.0
2 CAD 102 Liping C NaN
3 NaN 101 NaN A 30205139.0
4 NaN 104 NaN C 20213414.0
https://mp.weixin.qq.com/s?__biz=MzU5Mjg2OTQ1MA==&mid=2247484160&idx=1&sn=c1ed435f441c2b53751fec3558e7edee&chksm=fe186225c96feb330e129a47ff979301f6dcdc042ce24fa7b23f61e21d6c13a30e25d00f469d&scene=21#wechat_redirect
我们来详解一下merge的参数,
left和rgiht分别对应着需要连接的左表和右表,
left_index与right_index是当我们用索引(这两个表的名字在索引中)连接时指定的参数,设置为on表示用该表的索引作为连接的条件(或者说桥梁)。
how是指定连接方式,这里用的inner,表示我们基于姓名索引来匹配,只返回两个表中共同(同时出现)姓名的数据。下面详解一下inner还涉及到的其他参数——left,right,outer。
merge(right, how=‘inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=(’_x’, ‘_y’), copy=True, indicator=False, validate=None)
pd.merge(left = df1, right = df2, left_index = True, right_index = True, how = 'inner')