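Merging data
join() is a DataFrame instance method.
pd.merge() is a top-level pandas function.
By default join() uses the index as the key. The on parameter lets you use a column of the calling DataFrame as the key instead; that column is matched against the other DataFrame's index.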
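To illustrate the method-vs-function distinction, here is a minimal sketch. It uses two small hypothetical frames, `left` and `right`, and shows that a default `join()` gives the same result as calling `pd.merge()` with `left_index=True`, `right_index=True` and `how="left"`.

```python
import pandas as pd

# Hypothetical sample frames: index labels 0-2 exist in both, 3 only in `left`
left = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"B": ["B0", "B1", "B2"]})

# join() is called on a DataFrame instance and matches index to index (how="left" by default)
via_join = left.join(right)

# pd.merge() is a module-level function; index-to-index matching must be requested explicitly
via_merge = pd.merge(left, right, how="left", left_index=True, right_index=True)

print(via_join)
print(via_join.equals(via_merge))  # True
```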
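Parameter | Type | Description |
other | DataFrame or list of DataFrame | The frame(s) to join; several DataFrames can be joined in one call, e.g. ldf.join([rdf, rdf1, rdf2]) |
on | str or list of str | Column name(s) in the calling frame to use as the key instead of its index, e.g. ldf.join(rdf, on='key1') matches ldf's key1 column against rdf's index |
how | 'left', 'right', 'inner', 'outer' | left (default): keep every row of ldf; rdf columns with no matching key are filled with NaN (the join adds no NaN to ldf's own columns). right: keep every row of rdf; ldf columns with no matching key are filled with NaN (the join adds no NaN to rdf's own columns). inner: keep only keys present in both DataFrames (intersection). outer: keep keys present in either DataFrame (union) |
sort | bool | Default False. Order the result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword) |
lsuffix | str | Suffix appended to the left frame's column names that also occur in the right frame |
rsuffix | str | Suffix appended to the right frame's column names that also occur in the left frame |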
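The `how` and `other` rows can be checked with a small sketch. The frames below are hypothetical and indexed by key. The code records which index labels survive each join type, then joins a list of two frames in one call.

```python
import pandas as pd

# Hypothetical frames indexed by key; K1/K3 exist only on the left, K4 only on the right
ldf = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]}, index=["K0", "K1", "K2", "K3"])
rdf = pd.DataFrame({"B": ["B0", "B2", "B4"]}, index=["K0", "K2", "K4"])

# Record which index labels survive each join type
index_by_how = {how: list(ldf.join(rdf, how=how).index)
                for how in ["left", "right", "inner", "outer"]}
for how, labels in index_by_how.items():
    print(how, labels)
# left ['K0', 'K1', 'K2', 'K3']
# right ['K0', 'K2', 'K4']
# inner ['K0', 'K2']
# outer ['K0', 'K1', 'K2', 'K3', 'K4']

# `other` may also be a list; this hypothetical third frame adds column C
rdf2 = pd.DataFrame({"C": ["C1", "C3"]}, index=["K1", "K3"])
multi = ldf.join([rdf, rdf2])
print(multi.columns.tolist())  # ['A', 'B', 'C']
```

When a list is passed, pandas does not accept `on`, `lsuffix` or `rsuffix`. The keys therefore have to live in the index of every frame.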
ldf
A key
0 A0 K0
1 A1 K1
2 A2 K2
3 A3 K3
4 A4 K4
5 A5 K5
rdf
B key
0 B0 K0
1 B1 K1
2 B2 K2
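The notes only show the printed frames. This sketch rebuilds `ldf` and `rdf`, assuming the cells are plain strings and the column order is the one printed (A before key, B before key).

```python
import pandas as pd

# Rebuild the sample frames printed above (values assumed to be plain strings)
ldf = pd.DataFrame({
    "A": ["A0", "A1", "A2", "A3", "A4", "A5"],
    "key": ["K0", "K1", "K2", "K3", "K4", "K5"],
})
rdf = pd.DataFrame({
    "B": ["B0", "B1", "B2"],
    "key": ["K0", "K1", "K2"],
})

print(ldf)
print(rdf)
```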
<<Join on the default index>>
ldf.join(rdf, lsuffix='_ldf', rsuffix='_rdf')
A key_ldf B key_rdf
0 A0 K0 B0 K0
1 A1 K1 B1 K1
2 A2 K2 B2 K2
3 A3 K3 NaN NaN
4 A4 K4 NaN NaN
5 A5 K5 NaN NaN
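The suffixes in the example above are required, not cosmetic. Both frames contain a `key` column, and pandas rejects an overlapping join that has no suffix. This sketch rebuilds the same sample frames (string values assumed) and shows both cases.

```python
import pandas as pd

ldf = pd.DataFrame({"A": [f"A{i}" for i in range(6)], "key": [f"K{i}" for i in range(6)]})
rdf = pd.DataFrame({"B": [f"B{i}" for i in range(3)], "key": [f"K{i}" for i in range(3)]})

# Both frames have a "key" column, so an index join without suffixes is rejected
raised = False
try:
    ldf.join(rdf)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With suffixes, only the overlapping column is renamed; A and B keep their names
joined = ldf.join(rdf, lsuffix="_ldf", rsuffix="_rdf")
print(joined.columns.tolist())  # ['A', 'key_ldf', 'B', 'key_rdf']
```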
----------------------------------------------------------------------------
<<Set a new index (returns a new object)>>
ldf.set_index('key')
A
key
K0 A0
K1 A1
K2 A2
K3 A3
K4 A4
K5 A5
<<Set a new index (returns a new object)>>
rdf.set_index('key')
B
key
K0 B0
K1 B1
K2 B2
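"Returns a new object" means the original frame is left untouched. This sketch uses a rebuilt `ldf` with string values assumed. It contrasts the default behaviour with `inplace=True`, which modifies the frame itself and returns `None`.

```python
import pandas as pd

ldf = pd.DataFrame({"A": [f"A{i}" for i in range(6)], "key": [f"K{i}" for i in range(6)]})

# set_index() returns a new DataFrame; ldf itself keeps its RangeIndex and "key" column
indexed = ldf.set_index("key")
print(ldf.columns.tolist())      # ['A', 'key']
print(indexed.columns.tolist())  # ['A']
print(indexed.index.name)        # key

# inplace=True modifies the frame itself and returns None instead of a new object
tmp = ldf.copy()
ret = tmp.set_index("key", inplace=True)
print(ret, tmp.index.name)       # None key
```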
<<Join on the new index>>
ldf.set_index('key').join(rdf.set_index('key'))
A B
key
K0 A0 B0
K1 A1 B1
K2 A2 B2
K3 A3 NaN
K4 A4 NaN
K5 A5 NaN
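For comparison, the same index-on-index result can be produced with `pd.merge()`. This sketch matches on the `key` column and then moves that column into the index. The sample frames are rebuilt with string values assumed.

```python
import pandas as pd

ldf = pd.DataFrame({"A": [f"A{i}" for i in range(6)], "key": [f"K{i}" for i in range(6)]})
rdf = pd.DataFrame({"B": [f"B{i}" for i in range(3)], "key": [f"K{i}" for i in range(3)]})

# Index-on-index join, as in the example above
via_join = ldf.set_index("key").join(rdf.set_index("key"))

# Same result with pd.merge(): match on the "key" column, then move it into the index
via_merge = pd.merge(ldf, rdf, on="key", how="left").set_index("key")

print(via_join.equals(via_merge))  # True
```

`join()` is the shorter spelling when the keys are already in the index. `merge()` avoids the two `set_index()` calls when the keys are ordinary columns.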
----------------------------------------------------------------------------
ldf
A key
0 A0 K0
1 A1 K1
2 A2 K2
3 A3 K3
4 A4 K4
5 A5 K5
<<Set a new index (returns a new object)>>
rdf.set_index('key')
B
key
K0 B0
K1 B1
K2 B2
<<Use a regular column of ldf (not its index) as the join key>>
ldf.join(rdf.set_index('key'), on='key')
A key B
0 A0 K0 B0
1 A1 K1 B1
2 A2 K2 B2
3 A3 K3 NaN
4 A4 K4 NaN
5 A5 K5 NaN
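With `on="key"`, the caller's `key` column is matched against the right frame's index, and the caller's column order is kept. This sketch compares that with a column-on-column `pd.merge()`, ignoring the row index, and then shows an inner join with the same key. The frames are rebuilt with string values assumed.

```python
import pandas as pd

ldf = pd.DataFrame({"A": [f"A{i}" for i in range(6)], "key": [f"K{i}" for i in range(6)]})
rdf = pd.DataFrame({"B": [f"B{i}" for i in range(3)], "key": [f"K{i}" for i in range(3)]})

# on="key": ldf's "key" column is matched against the index of the right frame
via_join = ldf.join(rdf.set_index("key"), on="key")
print(via_join.columns.tolist())  # ['A', 'key', 'B']

# Column-on-column merge; compare values and columns, ignoring the row index
via_merge = pd.merge(ldf, rdf, on="key", how="left")
same = via_join.reset_index(drop=True).equals(via_merge.reset_index(drop=True))
print(same)  # True

# how="inner" keeps only the keys present in both frames (K0, K1, K2)
inner = ldf.join(rdf.set_index("key"), on="key", how="inner")
print(len(inner))  # 3
```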