merge的使用
例:
left = pd.DataFrame({'key':['k0','k1','k2','k3'],
'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3']})
right = pd.DataFrame({'key':['k0','k1','k2','k3'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3']})
print(left)
print(right)
res = pd.merge(left,right,on='key')
print(res)
输出:
key A B
0 k0 A0 B0
1 k1 A1 B1
2 k2 A2 B2
3 k3 A3 B3
key C D
0 k0 C0 D0
1 k1 C1 D1
2 k2 C2 D2
3 k3 C3 D3
key A B C D
0 k0 A0 B0 C0 D0
1 k1 A1 B1 C1 D1
2 k2 A2 B2 C2 D2
3 k3 A3 B3 C3 D3
两个key的合并
例:
left = pd.DataFrame({'key1':['k0','k0','k1','k2'],
'key2':['k0','k1','k0','k1'],
'A':['A0','A1','A2','A3'],
'B':['B0','B1','B2','B3']})
right = pd.DataFrame({'key1':['k0','k1','k1','k2'],
'key2':['k0','k0','k0','k0'],
'C':['C0','C1','C2','C3'],
'D':['D0','D1','D2','D3']})
print(left)
print(right)
res = pd.merge(left,right,on=['key1','key2'])
print(res)
输出:
key1 key2 A B
0 k0 k0 A0 B0
1 k0 k1 A1 B1
2 k1 k0 A2 B2
3 k2 k1 A3 B3
key1 key2 C D
0 k0 k0 C0 D0
1 k1 k0 C1 D1
2 k1 k0 C2 D2
3 k2 k0 C3 D3
key1 key2 A B C D
0 k0 k0 A0 B0 C0 D0
1 k1 k0 A2 B2 C1 D1
2 k1 k0 A2 B2 C2 D2
默认为内连接
可使用左、右、内、外连接
indicator
例:
df1 = pd.DataFrame({'col1':[0,1],'col_left':['a','b']})
df2 = pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
print(df1)
print(df2)
res = pd.merge(df1,df2,on='col1',how='outer',indicator=True)
print(res)
输出:
col1 col_left
0 0 a
1 1 b
col1 col_right
0 1 2
1 2 2
2 2 2
col1 col_left col_right _merge
0 0 a NaN left_only
1 1 b 2.0 both
2 2 NaN 2.0 right_only
3 2 NaN 2.0 right_only
其他类似的属性还有index
suffixes:如果和表合并的过程中遇到有一列两个表都同名,但是值不同,合并的时候又都想保留下来,就可以用suffixes给每个表的重复列名增加后缀。