left = pd.DataFrame({'sno':[11,12,13,14],'name':['name_a','name_b','name_c','name_d']})
left
sno
name
0
11
name_a
1
12
name_b
2
13
name_c
3
14
name_d
right = pd.DataFrame({'sno':[11,12,13,14],'age':['21','22','23','24']})
right
sno
age
0
11
21
1
12
22
2
13
23
3
14
24
# 一对一关系,结果中有4条
pd.merge(left, right, on='sno')
sno
name
age
0
11
name_a
21
1
12
name_b
22
2
13
name_c
23
3
14
name_d
24
2.2 one-to-many 一对多关系的merge
注意:数据会被复制
left = pd.DataFrame({'sno':[11,12,13,14],'name':['name_a','name_b','name_c','name_d']})
left
sno
name
0
11
name_a
1
12
name_b
2
13
name_c
3
14
name_d
right = pd.DataFrame({'sno':[11,11,11,12,12,13],'grade':['语文88','数学90','英语75','语文66','数学55','英语29']})
right
sno
grade
0
11
语文88
1
11
数学90
2
11
英语75
3
12
语文66
4
12
数学55
5
13
英语29
# 数目以多的一边为准
pd.merge(left, right, on='sno')
sno
name
grade
0
11
name_a
语文88
1
11
name_a
数学90
2
11
name_a
英语75
3
12
name_b
语文66
4
12
name_b
数学55
5
13
name_c
英语29
2.3 many-to-many 多对多关系的merge
注意:结果数量会出现乘法
left = pd.DataFrame({'sno':[11,11,12,12,12],'爱好':['篮球','羽毛球','乒乓球','篮球',"足球"]})
left
sno
爱好
0
11
篮球
1
11
羽毛球
2
12
乒乓球
3
12
篮球
4
12
足球
right = pd.DataFrame({'sno':[11,11,11,12,12,13],'grade':['语文88','数学90','英语75','语文66','数学55','英语29']})
right
sno
grade
0
11
语文88
1
11
数学90
2
11
英语75
3
12
语文66
4
12
数学55
5
13
英语29
pd.merge(left, right, on='sno')
sno
爱好
grade
0
11
篮球
语文88
1
11
篮球
数学90
2
11
篮球
英语75
3
11
羽毛球
语文88
4
11
羽毛球
数学90
5
11
羽毛球
英语75
6
12
乒乓球
语文66
7
12
乒乓球
数学55
8
12
篮球
语文66
9
12
篮球
数学55
10
12
足球
语文66
11
12
足球
数学55
3、理解left join、right join、inner join、outer join的区别
left = pd.DataFrame({'key':['K0','K1','K2','K3'],'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3']})
right = pd.DataFrame({'key':['K0','K1','K4','K5'],'C':['C0','C1','C4','C5'],'D':['D0','D1','D4','D5']})
left
key
A
B
0
K0
A0
B0
1
K1
A1
B1
2
K2
A2
B2
3
K3
A3
B3
right
key
C
D
0
K0
C0
D0
1
K1
C1
D1
2
K4
C4
D4
3
K5
C5
D5
3.1 inner join,默认
左边和右边的key都有,才会出现在结果里
pd.merge(left, right, how='inner')
key
A
B
C
D
0
K0
A0
B0
C0
D0
1
K1
A1
B1
C1
D1
3.2 left join
左边的都会出现在结果里,右边的如果无法匹配则为Null
pd.merge(left, right, how='left')
key
A
B
C
D
0
K0
A0
B0
C0
D0
1
K1
A1
B1
C1
D1
2
K2
A2
B2
NaN
NaN
3
K3
A3
B3
NaN
NaN
3.3 right join
右边的都会出现在结果里,左边的如果无法匹配则为Null
pd.merge(left, right, how='right')
key
A
B
C
D
0
K0
A0
B0
C0
D0
1
K1
A1
B1
C1
D1
2
K4
NaN
NaN
C4
D4
3
K5
NaN
NaN
C5
D5
3.4 outer join
左边、右边的都会出现在结果里,如果无法匹配则为Null
pd.merge(left, right, how='outer')
key
A
B
C
D
0
K0
A0
B0
C0
D0
1
K1
A1
B1
C1
D1
2
K2
A2
B2
NaN
NaN
3
K3
A3
B3
NaN
NaN
4
K4
NaN
NaN
C4
D4
5
K5
NaN
NaN
C5
D5
4、如果出现非Key的字段重名怎么办
left = pd.DataFrame({'key':['K0','K1','K2','K3'],'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3']})
right = pd.DataFrame({'key':['K0','K1','K4','K5'],'A':['A10','A11','A12','A13'],'D':['D0','D1','D4','D5']})