根据行索引合并数据join()
join() 方法能够通过列或指定列来连接DataFrame,其语法格式如下:
join(other, on=None, how='left', lsuffix=' ', rsuffix=' ', sort=False)
on:用于连接列名
how:可以从 left, right, outer, inner中任选一个,默认使用 left 方式
lsuffix:接收字符串,用于在左侧重叠的列名之后添加后缀名
rsuffix:接收字符串,用于在右侧重叠的列名之后添加后缀名
sort:接收布尔值,根据链接键对合并的数据进行排序,默认为False
下面是外链接合并示意图
left = pd.DataFrame({'A':['A0','A1'],
'B':['B0','B1']},
index=['a','b'])
right = pd.DataFrame({'C':['C0','C1'],
'D':['D0','D1']},
index=['c','d'])
print(left)
print("------------------------")
print(right)
print("------------------------")
left.join(right,how='outer')
--------------------------------------------下面是输出结果----------------------------------------------
A B
A A0 B0
b A1 B1
------------------------
C D
c C0 D0
d C1 D1
------------------------
A B C D
a A0 B0 NaN NaN
b A1 B1 NaN NaN
c NaN NaN C0 D0
d NaN NaN C1 D1
假设两个表中行索引和列索引重叠,那么当使用join()方法进行合并时,使用参数on指定重叠的列名即可
下面是合并示意图
left = pd.DataFrame({'A':['A0','A1','A2'],
'B':['B0','B1','B2'],
'key':['K0','K1','K2']})
right = pd.DataFrame({'C':['C0','C1','C2'],
'D':['D0','D1','D2']},
index = ['K0','K1','K2'])
print(left)
print('--------------------------------')
print(right)
print('--------------------------------')
left.join(right,how='left',on='key') # 这里outer,inner,left结果是一样的。
-------------------------------------下面是输出结果-----------------------------------------
A B key
0 A0 B0 K0
1 A1 B1 K1
2 A2 B2 K2
--------------------------------
C D
K0 C0 D0
K1 C1 D1
K2 C2 D2
--------------------------------
A B key C D
0 A0 B0 K0 C0 D0
1 A1 B1 K1 C1 D1
2 A2 B2 K2 C2 D2
合并重叠数据combine_first(other)
上述方法只有一个参数other,该参数用于接收填充缺失值的DataFrame对象。(根据索引位置的对应关系来进行填充)
下面是合并重叠数据图(样例图)
left = pd.DataFrame({'A':[np.nan,'A0','A1','A2'],
'B':[np.nan,'B1',np.nan,'B3'],
'key':['K0','K1','K2','K3']})
right = pd.DataFrame({'A':['E0','E1','E2'],
'C':['C0','C1','C2'],
'B':['D0','D1','D2']},
index = [1,0,2])
print(left)
print('--------------------------------')
print(right)
print('--------------------------------')
# 用右边的来填充左边的
left.combine_first(right)
-------------------------------------------下面是输出结果-----------------------------------------
A B key
0 NaN NaN K0
1 A0 B1 K1
2 A1 NaN K2
3 A2 B3 K3
--------------------------------
A C B
1 E0 C0 D0
0 E1 C1 D1
2 E2 C2 D2
--------------------------------
A B C key
0 E1 D1 C1 K0
1 A0 B1 C0 K1
2 A1 D2 C2 K2
3 A2 B3 NaN K3