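If the two DataFrames use different column names for the keys you want to merge on, specify the left_on and right_on parameters explicitly. left_on takes a column name, or a list of column names, from the left DataFrame; right_on takes the corresponding name or list from the right DataFrame. The two lists must have the same length, and the columns are paired by position.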
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 22, 21],
    'hobby': ['swim', 'shopping', 'sleep']
})
df2 = pd.DataFrame({
    'last_name': ['Alice', 'Bob', 'Charlie'],  # values are assumed to match df1's 'name' column
    'how_old': [20, 34, 21],
    'food': ['beef', 'cake', 'milk']
})
# Merge on key columns whose names differ between the two DataFrames
merged_df = pd.merge(df1, df2, left_on=['name', 'age'], right_on=['last_name', 'how_old'])
print(merged_df)
Output:
name age hobby last_name how_old food
0 Alice 20 swim Alice 20 beef
1 Charlie 21 sleep Charlie 21 milk
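In the inner-join result above, last_name and how_old simply repeat name and age. If you do not need both copies, one option is to drop the right-hand key columns after merging. This is a minimal sketch; the variable name deduped_df is my own and not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 22, 21],
    'hobby': ['swim', 'shopping', 'sleep'],
})
df2 = pd.DataFrame({
    'last_name': ['Alice', 'Bob', 'Charlie'],
    'how_old': [20, 34, 21],
    'food': ['beef', 'cake', 'milk'],
})

merged_df = pd.merge(df1, df2, left_on=['name', 'age'], right_on=['last_name', 'how_old'])

# After an inner join the right-hand key columns duplicate the left-hand ones,
# so they can be dropped without losing information.
deduped_df = merged_df.drop(columns=['last_name', 'how_old'])
print(deduped_df)
```

This only makes sense for an inner join. With an outer join the two sets of key columns can differ, so dropping one of them would lose data.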
P.S. With an outer join (how='outer'), rows that have no match on the other side are kept as well:
merged_df = pd.merge(df1, df2, how='outer', left_on=['name', 'age'], right_on=['last_name', 'how_old'])
Output:
name age hobby last_name how_old food
0 Alice 20.0 swim Alice 20.0 beef
1 Bob 22.0 shopping NaN NaN NaN
2 NaN NaN NaN Bob 34.0 cake
3 Charlie 21.0 sleep Charlie 21.0 milk
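To see which side each row of the outer join came from, pd.merge accepts indicator=True, which adds a _merge column with the values left_only, right_only, or both. The original example does not use this option; the sketch below is just one way to inspect the result.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 22, 21],
    'hobby': ['swim', 'shopping', 'sleep'],
})
df2 = pd.DataFrame({
    'last_name': ['Alice', 'Bob', 'Charlie'],
    'how_old': [20, 34, 21],
    'food': ['beef', 'cake', 'milk'],
})

# indicator=True adds a categorical '_merge' column describing each row's origin
merged_df = pd.merge(
    df1, df2,
    how='outer',
    left_on=['name', 'age'],
    right_on=['last_name', 'how_old'],
    indicator=True,
)
print(merged_df[['name', 'last_name', '_merge']])
```

Here the two Bob rows should show up as left_only and right_only, because the ages (22 and 34) do not match.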
To fill the missing values with -- instead of NaN:
merged_df.fillna('--', inplace=True)
# This emits a FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '--' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
Output:
name age hobby last_name how_old food
0 Alice 20.0 swim Alice 20.0 beef
1 Bob 22.0 shopping -- -- --
2 -- -- -- Bob 34.0 cake
3 Charlie 21.0 sleep Charlie 21.0 milk
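The warning appears because age and how_old were upcast to float64 when the outer join introduced NaN, and the string '--' does not fit in a float column. One way to follow the warning's advice is to cast to object before filling. This is a sketch under that assumption, not the only possible fix; filled_df is a name I introduced.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 22, 21],
    'hobby': ['swim', 'shopping', 'sleep'],
})
df2 = pd.DataFrame({
    'last_name': ['Alice', 'Bob', 'Charlie'],
    'how_old': [20, 34, 21],
    'food': ['beef', 'cake', 'milk'],
})

merged_df = pd.merge(
    df1, df2,
    how='outer',
    left_on=['name', 'age'],
    right_on=['last_name', 'how_old'],
)

# Cast every column to object first so that a string placeholder is a
# compatible value, then fill into a new DataFrame instead of in place.
filled_df = merged_df.astype(object).fillna('--')
print(filled_df)
```

Keep in mind that the filled columns now mix numbers and strings, so this is best done as a last step before display or export rather than before further numeric work.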