Introduction to join(), the pandas function for joining DataFrames

Latest recommended article published on 2025-04-10 15:29:35

会飞的哼哧

Category: The pandas library for Python | Tags: join(), joining DataFrames

Article link: https://blog.csdn.net/qq_38233659/article/details/94666134

pandas.DataFrame.join

Based on the original at https://www.jianshu.com/p/2358d4013067
Joins two DataFrames on their index or on a specified column.
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

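Parameters

other: [DataFrame, named Series, or list of DataFrames] If a Series is passed, its name attribute must be set; that name is used as the column name in the resulting DataFrame.
on: [column name, list/tuple of column names, or array-like] Column(s) of the calling DataFrame to join on; they are matched against the index of other. By default the join is index-on-index.
how: [{'left', 'right', 'outer', 'inner'}, default 'left'] The type of join; a left join by default.
lsuffix: [string] Suffix added to overlapping column names from the left DataFrame.
rsuffix: [string] Suffix added to overlapping column names from the right DataFrame.
sort: [boolean, default False] Sort the result lexicographically by the join key. If False, the order of the join key depends on the join type (the how keyword).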
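
The requirement on a Series passed as other can be illustrated with a small sketch. The names prices, attrs and unnamed below are made up for this illustration and are not part of the original example.

```python
import pandas as pd

# Illustrative data (not from the original article): prices indexed by item id.
prices = pd.DataFrame({"item_price": [1, 2, 3]}, index=["a", "b", "c"])

# A Series can be joined only if it has a name; the name becomes the column label.
attrs = pd.Series(["k1", "k2"], index=["a", "b"], name="item_atr")
joined = prices.join(attrs)  # default: left join on the index
print(joined)

# A Series without a name is rejected with a ValueError.
unnamed = pd.Series(["k1", "k2"], index=["a", "b"])
raised = False
try:
    prices.join(unnamed)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

If the Series has no name yet, calling rename("some_name") on it before the join is one way to supply one.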

Examples

Suppose we have two DataFrame objects, first and other:

import pandas as pd
first=pd.DataFrame({'item_id':['a','b','c','b','d'],'item_price':[1,2,3,2,4]})
other=pd.DataFrame({'item_id':['a','b','f'],'item_atr':['k1','k2','k3']})
print(first)
print(other)

Output:

  item_id  item_price
0       a           1
1       b           2
2       c           3
3       b           2
4       d           4
  item_id item_atr
0       a       k1
1       b       k2
2       f       k3

Joining DataFrames on the index

print(first.join(other, lsuffix='_left', rsuffix='_right'))

Output:

  item_id_left  item_price item_id_right item_atr
0            a           1             a       k1
1            b           2             b       k2
2            c           3             f       k3
3            b           2           NaN      NaN
4            d           4           NaN      NaN
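
The suffixes matter here because both frames contain a column named item_id. As a small sketch of what happens otherwise: joining without any suffix raises a ValueError, and supplying only one of the two suffixes is already enough to resolve the clash. The variable overlap_error is just a helper for this illustration.

```python
import pandas as pd

first = pd.DataFrame({"item_id": ["a", "b", "c", "b", "d"], "item_price": [1, 2, 3, 2, 4]})
other = pd.DataFrame({"item_id": ["a", "b", "f"], "item_atr": ["k1", "k2", "k3"]})

# Both frames have an 'item_id' column, so a join without suffixes fails.
overlap_error = None
try:
    first.join(other)
except ValueError as exc:
    overlap_error = str(exc)
print(overlap_error)

# Giving only rsuffix renames the right-hand column and leaves the left one as is.
joined = first.join(other, rsuffix="_right")
print(list(joined.columns))
```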

Joining DataFrames on a specified column

print(first.set_index('item_id').join(other.set_index('item_id')))

Output:

         item_price item_atr
item_id                     
a                 1       k1
b                 2       k2
b                 2       k2
c                 3      NaN
d                 4      NaN
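
With this approach item_id ends up as the index of the result. If it is needed as an ordinary column again, one option is to call reset_index() afterwards, as in the sketch below. The name flat is illustrative, and the row order of the result may differ from the listing above depending on the pandas version.

```python
import pandas as pd

first = pd.DataFrame({"item_id": ["a", "b", "c", "b", "d"], "item_price": [1, 2, 3, 2, 4]})
other = pd.DataFrame({"item_id": ["a", "b", "f"], "item_atr": ["k1", "k2", "k3"]})

# Join index-on-index, then turn the 'item_id' index back into a column.
joined = first.set_index("item_id").join(other.set_index("item_id"))
flat = joined.reset_index()
print(flat)
```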

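Specifying the join column with the on parameter. DataFrame.join always matches against the index of other, so we can set the desired column as the index of other and use on to name the join column in first. This way the index of the result stays the same as that of first.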

print(first.join(other.set_index('item_id'),on='item_id'))

Output:

  item_id  item_price item_atr
0       a           1       k1
1       b           2       k2
2       c           3      NaN
3       b           2       k2
4       d           4      NaN
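
For comparison, the same left join can be written with merge, which matches column against column and therefore needs no set_index call. This is a sketch for orientation, not something taken from the original article; via_join and via_merge are illustrative names.

```python
import pandas as pd

first = pd.DataFrame({"item_id": ["a", "b", "c", "b", "d"], "item_price": [1, 2, 3, 2, 4]})
other = pd.DataFrame({"item_id": ["a", "b", "f"], "item_atr": ["k1", "k2", "k3"]})

# join: column of first matched against the index of other.
via_join = first.join(other.set_index("item_id"), on="item_id")

# merge: column of first matched against the column of other.
via_merge = first.merge(other, on="item_id", how="left")

print(via_join)
print(via_merge)
```

Both calls keep the rows of first in their original order and fill item_atr with NaN where other has no matching item_id.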

Right join

print(first.join(other,how='right',lsuffix='_left',rsuffix='_right'))

Output:

  item_id_left  item_price item_id_right item_atr
0            a           1             a       k1
1            b           2             b       k2
2            c           3             f       k3

Left join
This gives the same result as the index join shown earlier, because how='left' is the default.

print(first.join(other,how='left',lsuffix='_left',rsuffix='_right'))

Output:

  item_id_left  item_price item_id_right item_atr
0            a           1             a       k1
1            b           2             b       k2
2            c           3             f       k3
3            b           2           NaN      NaN
4            d           4           NaN      NaN

Outer join

print(first.join(other.set_index('item_id'),on='item_id',how='outer'))

Output:

  item_id  item_price item_atr
0       a         1.0       k1
1       b         2.0       k2
3       b         2.0       k2
2       c         3.0      NaN
4       d         4.0      NaN
4       f         NaN       k3
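
In an outer join it is not always obvious which side each row came from. One way to check, sketched here with merge rather than join, is the indicator=True option, which adds a _merge column with the values left_only, right_only or both. The name origin_counts is illustrative.

```python
import pandas as pd

first = pd.DataFrame({"item_id": ["a", "b", "c", "b", "d"], "item_price": [1, 2, 3, 2, 4]})
other = pd.DataFrame({"item_id": ["a", "b", "f"], "item_atr": ["k1", "k2", "k3"]})

# Outer merge on the shared column; '_merge' records the origin of each row.
merged = pd.merge(first, other, on="item_id", how="outer", indicator=True)
print(merged)

# Count how many rows matched on both sides or only on one side.
origin_counts = merged["_merge"].astype(str).value_counts().to_dict()
print(origin_counts)
```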

Inner join

print(first.join(other.set_index('item_id'),on='item_id',how='inner'))

Output:

  item_id  item_price item_atr
0       a           1       k1
1       b           2       k2
3       b           2       k2
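
To compare the four join types side by side, the sketch below runs the same on='item_id' join with each value of how and counts the resulting rows. The names right and row_counts are illustrative.

```python
import pandas as pd

first = pd.DataFrame({"item_id": ["a", "b", "c", "b", "d"], "item_price": [1, 2, 3, 2, 4]})
other = pd.DataFrame({"item_id": ["a", "b", "f"], "item_atr": ["k1", "k2", "k3"]})

# Index other by the join key once, then vary only the join type.
right = other.set_index("item_id")
row_counts = {
    how: len(first.join(right, on="item_id", how=how))
    for how in ("left", "right", "inner", "outer")
}
print(row_counts)
```

The left join keeps all 5 rows of first; the right join yields 4 rows (a, b twice, and f); the inner join keeps the 3 rows whose key appears on both sides; the outer join yields 6 rows.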