Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes.
Pandas DataFrame merge()函数用于通过数据库样式的合并操作合并两个DataFrame对象。 连接是对列或索引执行的。
If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.
如果对列进行了连接,则索引将被忽略。 此函数返回一个新的DataFrame,并且源DataFrame对象不变。
Pandas DataFrame merge()函数语法 (Pandas DataFrame merge() Function Syntax)
The merge() function syntax is:
merge()函数的语法为:
def merge(
self,
right,
how="inner",
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes=("_x", "_y"),
copy=True,
indicator=False,
validate=None,
)
- right: The other DataFrame to merge with the source DataFrame. right :另一个与源DataFrame合并的DataFrame。
- how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’. This is the most important parameter to define the merge operation type. These are similar to SQL left outer join, right outer join, full outer join, and inner join. 方式 :{'左','右','外','内'},默认为'内'。 这是定义合并操作类型的最重要参数。 这些类似于SQL的左外部联接,右外部联接,完全外部联接和内部联接。
- on: Column or index level names to join on. These columns must be present in both the DataFrames. If not provided, the intersection of the columns in both DataFrames are used. on :要加入的列或索引级别名称。 这些列必须同时存在于两个DataFrame中。 如果未提供,则使用两个DataFrame中列的交集。
- left_on: Column or index level names to join on in the left DataFrame. left_on :要在左侧DataFrame中加入的列或索引级别名称。
- right_on: Column or index level names to join on in the right DataFrame. right_on :要在右侧DataFrame中加入的列或索引级别名称。
- left_index: Use the index from the left DataFrame as the join key(s). left_index :将左侧DataFrame的索引用作连接键。
- right_index: Use the index from the right DataFrame as the join key. right_index :将右侧DataFrame中的索引用作连接键。
- sort: Sort the join keys lexicographically in the result DataFrame. sort :按字典顺序对结果DataFrame中的联接键进行排序。
- suffixes: Suffix to apply to overlapping column names in the left and right side, respectively. 后缀 :后缀分别应用于左侧和右侧的重叠列名称。
- indicator: If True, adds a column to output DataFrame called “_merge” with information on the source of each row. indicator :如果为True,则在输出数据帧中添加一列,称为“ _merge”,其中包含有关每一行源的信息。
- validate: used to validate the merge process. The valid values are {“one_to_one” or “1:1”, “one_to_many” or “1:m”, “many_to_one” or “m:1”, “many_to_many” or “m:m”}. validate :用于验证合并过程。 有效值为{“ one_to_one”或“ 1:1”,“ one_to_many”或“ 1:m”,“ many_to_one”或“ m:1”,“ many_to_many”或“ m:m”}。
熊猫DataFrame merge()示例 (Pandas DataFrame merge() Examples)
Let’s look at some examples of merging two DataFrame objects.
让我们看一下合并两个DataFrame对象的一些示例。
1.默认合并–内部联接 (1. Default Merging – inner join)
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
print('DataFrame 1:\n', df1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print('DataFrame 2:\n', df2)
df_merged = df1.merge(df2)
print('Result:\n', df_merged)
Output:
输出:
DataFrame 1:
Name Country Role
0 Pankaj India CEO
1 Meghna India CTO
2 Lisa USA CTO
DataFrame 2:
ID Name
0 1 Pankaj
1 2 Anupam
2 3 Amit
Result:
Name Country Role ID
0 Pankaj India CEO 1
2.使用左,右和外部联接合并数据框 (2. Merging DataFrames with Left, Right, and Outer Join)
print('Result Left Join:\n', df1.merge(df2, how='left'))
print('Result Right Join:\n', df1.merge(df2, how='right'))
print('Result Outer Join:\n', df1.merge(df2, how='outer'))
Output:
输出:
Result Left Join:
Name Country Role ID
0 Pankaj India CEO 1.0
1 Meghna India CTO NaN
2 Lisa USA CTO NaN
Result Right Join:
Name Country Role ID
0 Pankaj India CEO 1
1 Anupam NaN NaN 2
2 Amit NaN NaN 3
Result Outer Join:
Name Country Role ID
0 Pankaj India CEO 1.0
1 Meghna India CTO NaN
2 Lisa USA CTO NaN
3 Anupam NaN NaN 2.0
4 Amit NaN NaN 3.0
3.在特定列上合并DataFrame (3. Merging DataFrame on Specific Columns)
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print(df1.merge(df2, on='ID'))
print(df1.merge(df2, on='Name'))
Output:
输出:
Name_x ID Country Role Name_y
0 Pankaj 1 India CEO Pankaj
1 Meghna 2 India CTO Anupam
2 Lisa 3 USA CTO Amit
Name ID_x Country Role ID_y
0 Pankaj 1 India CEO 1
4.指定用于合并DataFrame对象的左列和右列 (4. Specify Left and Right Columns for Merging DataFrame Objects)
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID1': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID2': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print(df1.merge(df2))
print(df1.merge(df2, left_on='ID1', right_on='ID2'))
Output;
输出;
Name ID1 Country Role ID2
0 Pankaj 1 India CEO 1
Name_x ID1 Country Role ID2 Name_y
0 Pankaj 1 India CEO 1 Pankaj
1 Meghna 2 India CTO 2 Anupam
2 Lisa 3 USA CTO 3 Amit
5.使用索引作为合并数据帧的联接键 (5. Using Index as the Join Keys for Merging DataFrames)
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
df_merged = df1.merge(df2)
print('Result Default Merge:\n', df_merged)
df_merged = df1.merge(df2, left_index=True, right_index=True)
print('\nResult Index Merge:\n', df_merged)
Output:
输出:
Result Default Merge:
Name Country Role ID
0 Pankaj India CEO 1
Result Index Merge:
Name_x Country Role ID Name_y
0 Pankaj India CEO 1 Pankaj
1 Meghna India CTO 2 Anupam
2 Lisa USA CTO 3 Amit
参考资料 (References)
翻译自: https://www.journaldev.com/33516/pandas-merge-two-dataframe