pandas Data Merging: join

GISer_流浪

Tags: pandas

Article link: https://blog.csdn.net/weixin_44346222/article/details/87924989

Overview

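DataFrame also has a convenient join instance method that makes merging by index easier. It can also combine several DataFrame objects that have the same or similar indexes. The frames should not have overlapping column names; if they do, you must specify a suffix with lsuffix and/or rsuffix, otherwise join raises an error.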
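The overlapping-column rule can be illustrated with a minimal sketch. The frames `left` and `right` and the suffix strings below are hypothetical and chosen only for illustration; they do not come from the original article.

```python
import pandas as pd

# Hypothetical frames that both contain a column named "value"
left = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"value": [10, 20]}, index=["a", "b"])

# Without suffixes, the overlapping column name makes join raise ValueError
try:
    left.join(right)
except ValueError as exc:
    print("join failed:", exc)

# With suffixes, the two "value" columns are renamed and kept side by side
joined = left.join(right, lsuffix="_left", rsuffix="_right")
print(joined)
```

Because the default is a left join, row `c` stays in the result and its right-hand column is filled with NaN.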

API

data.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Docstring:
Join columns with other DataFrame either on index or on a key
column. Efficiently Join multiple DataFrame objects by index at once by
passing a list.

Parameters
----------
other : DataFrame, Series with name field set, or list of DataFrame
    Index should be similar to one of the columns in this one. If a
    Series is passed, its name attribute must be set, and that will be
    used as the column name in the resulting joined DataFrame
on : name, tuple/list of names, or array-like
    Column or index level name(s) in the caller to join on the index
    in `other`, otherwise joins index-on-index. If multiple
    values given, the `other` DataFrame must have a MultiIndex. Can
    pass an array as the join key if it is not already contained in
    the calling DataFrame. Like an Excel VLOOKUP operation
how : {'left', 'right', 'outer', 'inner'}, default: 'left'
    How to handle the operation of the two objects.

    * left: use calling frame's index (or column if on is specified)
    * right: use other frame's index
    * outer: form union of calling frame's index (or column if on is
      specified) with other frame's index, and sort it
      lexicographically
    * inner: form intersection of calling frame's index (or column if
      on is specified) with other frame's index, preserving the order
      of the calling's one
lsuffix : string
    Suffix to use from left frame's overlapping columns
rsuffix : string
    Suffix to use from right frame's overlapping columns
sort : boolean, default False
    Order result DataFrame lexicographically by the join key. If False,
    the order of the join key depends on the join type (how keyword)

Notes
-----
on, lsuffix, and rsuffix options are not supported when passing a list
of DataFrame objects

Support for specifying index levels as the `on` parameter was added
in version 0.23.0
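As a rough illustration of the `how` options described in the docstring, the sketch below joins two frames index-on-index and collects only the resulting row index for each join type. It reuses the `left2` and `right2` frames from the example in this article; the `result_index` dictionary is an illustrative name, not part of pandas.

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'],
                     columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13., 14.]],
                      index=['b', 'c', 'd', 'e'],
                      columns=['Missouri', 'Alabama'])

# Collect the row index produced by each join type
result_index = {how: list(left2.join(right2, how=how).index)
                for how in ('left', 'right', 'inner', 'outer')}

for how, idx in result_index.items():
    print(how, idx)
```

The left join keeps the caller's index, the right join keeps the index of `other`, the inner join keeps the intersection, and the outer join keeps the sorted union.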

Example

In [68]: left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
   ....:                      index=['a', 'c', 'e'],
   ....:                      columns=['Ohio', 'Nevada'])

In [69]: right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]],
   ....:                       index=['b', 'c', 'd', 'e'],
   ....:                       columns=['Missouri', 'Alabama'])

In [73]: left2.join(right2, how='outer')
Out[73]: 
   Ohio  Nevada  Missouri  Alabama
a   1.0     2.0       NaN      NaN
b   NaN     NaN       7.0      8.0
c   3.0     4.0       9.0     10.0
d   NaN     NaN      11.0     12.0
e   5.0     6.0      13.0     14.0

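For legacy reasons, DataFrame's join method performs a left join by default, preserving the row index of the left (calling) frame. It also supports joining the index of the passed DataFrame against one of the columns of the calling DataFrame: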

In [74]: left1.join(right1, on='key')
Out[74]: 
  key  value  group_val
0   a      0        3.5
1   b      1        7.0
2   a      2        3.5
3   a      3        3.5
4   b      4        7.0
5   c      5        NaN
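The article does not show how `left1` and `right1` were created. The sketch below reconstructs them from the output above, so the definitions are an assumption rather than the author's original code. It also compares the result with `pd.merge` using `left_on` and `right_index`, which expresses the same column-to-index join.

```python
import pandas as pd

# Assumed definitions, inferred from the printed result above
left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                      'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7.0]}, index=['a', 'b'])

# Match left1's "key" column against right1's index (left join by default)
joined = left1.join(right1, on='key')
print(joined)

# The same operation written with pd.merge
merged = pd.merge(left1, right1, left_on='key', right_index=True, how='left')
```

Key `c` has no match in the index of `right1`, so its `group_val` is NaN.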

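Finally, for simple index-on-index merges, you can pass a list of DataFrames to join. The more general concat function, introduced in the next section, can do the same thing: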

In [75]: another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]],
   ....:                        index=['a', 'c', 'e', 'f'],
   ....:                        columns=['New York', 'Oregon'])

In [76]: another
Out[76]: 
   New York  Oregon
a       7.0     8.0
c       9.0    10.0
e      11.0    12.0
f      16.0    17.0

In [77]: left2.join([right2, another])
Out[77]: 
   Ohio  Nevada  Missouri  Alabama  New York  Oregon
a   1.0     2.0       NaN      NaN       7.0     8.0
c   3.0     4.0       9.0     10.0       9.0    10.0
e   5.0     6.0      13.0     14.0      11.0    12.0

In [78]: left2.join([right2, another], how='outer')
Out[78]: 
   Ohio  Nevada  Missouri  Alabama  New York  Oregon
a   1.0     2.0       NaN      NaN       7.0     8.0
b   NaN     NaN       7.0      8.0       NaN     NaN
c   3.0     4.0       9.0     10.0       9.0    10.0
d   NaN     NaN      11.0     12.0       NaN     NaN
e   5.0     6.0      13.0     14.0      11.0    12.0
f   NaN     NaN       NaN      NaN      16.0    17.0
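To make the connection with concat concrete, here is a minimal sketch that reproduces the outer join of the three frames with `pd.concat` along the column axis. Both results are sorted by index before comparison, since the row order of the concatenated result is not assumed here.

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'],
                     columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13., 14.]],
                      index=['b', 'c', 'd', 'e'],
                      columns=['Missouri', 'Alabama'])
another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]],
                       index=['a', 'c', 'e', 'f'],
                       columns=['New York', 'Oregon'])

# Outer join of several frames on their indexes via join
joined = left2.join([right2, another], how='outer').sort_index()

# The same result via concat along the column axis (outer join by default)
concatenated = pd.concat([left2, right2, another], axis=1).sort_index()

print(concatenated)
```

Reindexing `concatenated` to `left2.index` would give the rows of the default left join shown earlier.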
