熊猫merge（）–合并两个DataFrame对象

最新推荐文章于 2023-03-15 21:00:00 发布

cunchi4221

最新推荐文章于 2023-03-15 21:00:00 发布

阅读量989

点赞数

文章标签：数据库 python java php sql

原文链接：https://www.journaldev.com/33516/pandas-merge-two-dataframe

版权

Pandas的DataFrame merge()函数用于执行数据库风格的合并，可以进行内部、左、右、外部联接。此操作基于列或索引，返回新的DataFrame，原始数据不变。如何参数定义了合并类型，如SQL中的join操作。

摘要由CSDN通过智能技术生成

Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes.

Pandas DataFrame merge（）函数用于通过数据库样式的合并操作合并两个DataFrame对象。连接是对列或索引执行的。

If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.

如果对列进行了连接，则索引将被忽略。此函数返回一个新的DataFrame，并且源DataFrame对象不变。

Pandas DataFrame merge（）函数语法 (Pandas DataFrame merge() Function Syntax)

The merge() function syntax is:

merge（）函数的语法为：

def merge(
    self,
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)

right: The other DataFrame to merge with the source DataFrame.
right ：另一个与源DataFrame合并的DataFrame。
how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’. This is the most important parameter to define the merge operation type. These are similar to SQL left outer join, right outer join, full outer join, and inner join.
方式：{'左'，'右'，'外'，'内'}，默认为'内'。这是定义合并操作类型的最重要参数。这些类似于SQL的左外部联接，右外部联接，完全外部联接和内部联接。
on: Column or index level names to join on. These columns must be present in both the DataFrames. If not provided, the intersection of the columns in both DataFrames are used.
on ：要加入的列或索引级别名称。这些列必须同时存在于两个DataFrame中。如果未提供，则使用两个DataFrame中列的交集。
left_on: Column or index level names to join on in the left DataFrame.
left_on ：要在左侧DataFrame中加入的列或索引级别名称。
right_on: Column or index level names to join on in the right DataFrame.
right_on ：要在右侧DataFrame中加入的列或索引级别名称。
left_index: Use the index from the left DataFrame as the join key(s).
left_index ：将左侧DataFrame的索引用作连接键。
right_index: Use the index from the right DataFrame as the join key.
right_index ：将右侧DataFrame中的索引用作连接键。
sort: Sort the join keys lexicographically in the result DataFrame.
sort ：按字典顺序对结果DataFrame中的联接键进行排序。
suffixes: Suffix to apply to overlapping column names in the left and right side, respectively.
后缀：后缀分别应用于左侧和右侧的重叠列名称。
indicator: If True, adds a column to output DataFrame called “_merge” with information on the source of each row.
indicator ：如果为True，则在输出数据帧中添加一列，称为“ _merge”，其中包含有关每一行源的信息。
validate: used to validate the merge process. The valid values are {“one_to_one” or “1:1”, “one_to_many” or “1:m”, “many_to_one” or “m:1”, “many_to_many” or “m:m”}.
validate ：用于验证合并过程。有效值为{“ one_to_one”或“ 1：1”，“ one_to_many”或“ 1：m”，“ many_to_one”或“ m：1”，“ many_to_many”或“ m：m”}。

熊猫DataFrame merge（）示例 (Pandas DataFrame merge() Examples)

Let’s look at some examples of merging two DataFrame objects.

让我们看一下合并两个DataFrame对象的一些示例。

1.默认合并–内部联接 (1. Default Merging – inner join)

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}

df1 = pd.DataFrame(d1)

print('DataFrame 1:\n', df1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print('DataFrame 2:\n', df2)

df_merged = df1.merge(df2)
print('Result:\n', df_merged)

Output:

输出：

DataFrame 1:
      Name Country Role
0  Pankaj   India  CEO
1  Meghna   India  CTO
2    Lisa     USA  CTO
DataFrame 2:
    ID    Name
0   1  Pankaj
1   2  Anupam
2   3    Amit
Result:
      Name Country Role  ID
0  Pankaj   India  CEO   1

2.使用左，右和外部联接合并数据框 (2. Merging DataFrames with Left, Right, and Outer Join)

print('Result Left Join:\n', df1.merge(df2, how='left'))
print('Result Right Join:\n', df1.merge(df2, how='right'))
print('Result Outer Join:\n', df1.merge(df2, how='outer'))

Output:

输出：

Result Left Join:
      Name Country Role   ID
0  Pankaj   India  CEO  1.0
1  Meghna   India  CTO  NaN
2    Lisa     USA  CTO  NaN
Result Right Join:
      Name Country Role  ID
0  Pankaj   India  CEO   1
1  Anupam     NaN  NaN   2
2    Amit     NaN  NaN   3
Result Outer Join:
      Name Country Role   ID
0  Pankaj   India  CEO  1.0
1  Meghna   India  CTO  NaN
2    Lisa     USA  CTO  NaN
3  Anupam     NaN  NaN  2.0
4    Amit     NaN  NaN  3.0

3.在特定列上合并DataFrame (3. Merging DataFrame on Specific Columns)

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
      'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

print(df1.merge(df2, on='ID'))
print(df1.merge(df2, on='Name'))

Output:

输出：

Name_x  ID Country Role  Name_y
0  Pankaj   1   India  CEO  Pankaj
1  Meghna   2   India  CTO  Anupam
2    Lisa   3     USA  CTO    Amit

     Name  ID_x Country Role  ID_y
0  Pankaj     1   India  CEO     1

4.指定用于合并DataFrame对象的左列和右列 (4. Specify Left and Right Columns for Merging DataFrame Objects)

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID1': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
      'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID2': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

print(df1.merge(df2))

print(df1.merge(df2, left_on='ID1', right_on='ID2'))

Output;

输出；

Name  ID1 Country Role  ID2
0  Pankaj    1   India  CEO    1

   Name_x  ID1 Country Role  ID2  Name_y
0  Pankaj    1   India  CEO    1  Pankaj
1  Meghna    2   India  CTO    2  Anupam
2    Lisa    3     USA  CTO    3    Amit

5.使用索引作为合并数据帧的联接键 (5. Using Index as the Join Keys for Merging DataFrames)

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

df_merged = df1.merge(df2)
print('Result Default Merge:\n', df_merged)

df_merged = df1.merge(df2, left_index=True, right_index=True)
print('\nResult Index Merge:\n', df_merged)

Output:

输出：

Result Default Merge:
      Name Country Role  ID
0  Pankaj   India  CEO   1

Result Index Merge:
    Name_x Country Role  ID  Name_y
0  Pankaj   India  CEO   1  Pankaj
1  Meghna   India  CTO   2  Anupam
2    Lisa     USA  CTO   3    Amit