Getting started with pandas: merging data with the merge function

Merging data with merge

  • Create the datasets
# Import pandas and numpy
import pandas as pd
import numpy as np

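# Create two DataFrames: df_left holds 1.0 and df_right holds 2.0; they share columns e and f and row labels k3 to k5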
df_left = pd.DataFrame(data=np.ones((5,6)),columns=["a","b","c","d","e","f"],index=["k1","k2","k3","k4","k5"])
df_right = pd.DataFrame(data=np.ones((5,6))*2,columns=["e","f","g","h","j","k"],index=["k3","k4","k5","k6","k7"])

# Add the key columns; merge will match rows on the (key1, key2) pair
df_left["key1"] = ["k1","k0","k0","k1","k1"]
df_left["key2"] = ["k0","k0","k1","k1","k0"]

df_right["key1"] = ["k1","k0","k0","k0","k1"]
df_right["key2"] = ["k0","k1","k1","k1","k0"]

print(df_right)
print(df_left)

      e    f    g    h    j    k  key1  key2
k3  2.0  2.0  2.0  2.0  2.0  2.0    k1    k0
k4  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k5  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k6  2.0  2.0  2.0  2.0  2.0  2.0    k0    k1
k7  2.0  2.0  2.0  2.0  2.0  2.0    k1    k0

      a    b    c    d    e    f  key1  key2
k1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0
k2  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0
k3  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1
k4  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1
k5  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0
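Before merging, it can help to predict two things: which columns will be suffixed and which key pairs can match. The sketch below rebuilds the two frames from above and uses plain Python set operations. The names `keys`, `suffixed`, `left_pairs` and `right_pairs` are illustrative and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Rebuild df_left and df_right as in the dataset step above
df_left = pd.DataFrame(np.ones((5, 6)), columns=["a", "b", "c", "d", "e", "f"],
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=["e", "f", "g", "h", "j", "k"],
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]
keys = ["key1", "key2"]  # hypothetical name for the join keys

# Non-key column names present in both frames; merge will suffix these
suffixed = [c for c in df_left.columns if c in df_right.columns and c not in keys]

# (key1, key2) combinations present on each side
left_pairs = set(zip(df_left["key1"], df_left["key2"]))
right_pairs = set(zip(df_right["key1"], df_right["key2"]))

print(suffixed)                  # ['e', 'f']
print(left_pairs & right_pairs)  # pairs an inner join can match
print(left_pairs - right_pairs)  # pairs only df_left has
print(right_pairs - left_pairs)  # pairs only df_right has (empty here)
```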
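  • merge performs an inner join by default (how="inner"): only rows whose (key1, key2) pair exists in both DataFrames are kept
    Every matching left row is paired with every matching right row. The two (k1, k0) rows in df_left and the two in df_right give 2 × 2 = 4 rows. The single (k0, k1) row in df_left matches three rows in df_right and adds 3 more, for 7 in total.
    Non-key columns that appear in both frames (e and f) receive the default suffixes _x and _y.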
print(pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="inner"))


     a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k
0  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
2  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
3  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0
4  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
5  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
6  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0
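The 7-row result follows from counting key combinations on each side. The following is a minimal sketch that assumes the frames built above; they are rebuilt here so the snippet runs on its own. It derives the expected inner-join row count with `groupby(...).size()` and compares it with the length of the actual merge. `left_counts`, `right_counts` and `expected_rows` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild df_left and df_right as in the dataset step above
df_left = pd.DataFrame(np.ones((5, 6)), columns=["a", "b", "c", "d", "e", "f"],
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=["e", "f", "g", "h", "j", "k"],
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]
keys = ["key1", "key2"]

# Number of rows carrying each (key1, key2) combination on each side
left_counts = df_left.groupby(keys).size()
right_counts = df_right.groupby(keys).size()

# An inner join emits left_count * right_count rows per shared key;
# keys missing from one side are filled with 0 and contribute nothing
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

inner = pd.merge(df_left, df_right, on=keys, how="inner")
print(expected_rows, len(inner))  # 7 7
```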
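  • Outer join (how="outer") with indicator=True, which adds a _merge column recording where each row came from: left_only, right_only or both
    The pairs (k0, k0) and (k1, k1) exist only in df_left, so those rows are marked left_only and their df_right columns are NaN. No row is right_only because every key pair in df_right also appears in df_left.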
pd.merge(left=df_left,right=df_right,on=["key1","key2"],how="outer",indicator=True)

     a    b    c    d  e_x  f_x  key1  key2  e_y  f_y    g    h    j    k     _merge
0  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
1  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
2  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
3  1.0  1.0  1.0  1.0  1.0  1.0    k1    k0  2.0  2.0  2.0  2.0  2.0  2.0       both
4  1.0  1.0  1.0  1.0  1.0  1.0    k0    k0  NaN  NaN  NaN  NaN  NaN  NaN  left_only
5  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
6  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
7  1.0  1.0  1.0  1.0  1.0  1.0    k0    k1  2.0  2.0  2.0  2.0  2.0  2.0       both
8  1.0  1.0  1.0  1.0  1.0  1.0    k1    k1  NaN  NaN  NaN  NaN  NaN  NaN  left_only
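The `_merge` column is also useful after the merge. You can count rows by origin, or keep only the rows that found no partner, a pattern usually called an anti-join. The following is a minimal sketch using the same frames. `counts` and `unmatched` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild df_left and df_right as in the dataset step above
df_left = pd.DataFrame(np.ones((5, 6)), columns=["a", "b", "c", "d", "e", "f"],
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=["e", "f", "g", "h", "j", "k"],
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]
keys = ["key1", "key2"]

outer = pd.merge(df_left, df_right, on=keys, how="outer", indicator=True)

# Number of rows per origin (both / left_only / right_only)
counts = outer["_merge"].value_counts()
print(counts)

# Anti-join: keep only the rows whose key pair has no partner in df_right
unmatched = outer.loc[outer["_merge"] == "left_only", keys]
print(sorted(zip(unmatched["key1"], unmatched["key2"])))  # [('k0', 'k0'), ('k1', 'k1')]
```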
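  • Left join on the row index (how="left", left_index=True, right_index=True)
    The on argument cannot be combined with left_index/right_index (pandas raises a MergeError), so only the index flags are passed. Because key1 and key2 are now ordinary columns in both frames, they are suffixed too. Rows k1 and k2 have no matching label in df_right and are marked left_only.
pd.merge(left=df_left,right=df_right,how="left",left_index=True,right_index=True,indicator=True)

      a    b    c    d  e_x  f_x  key1_x  key2_x  e_y  f_y    g    h    j    k  key1_y  key2_y     _merge
k1  1.0  1.0  1.0  1.0  1.0  1.0      k1      k0  NaN  NaN  NaN  NaN  NaN  NaN     NaN     NaN  left_only
k2  1.0  1.0  1.0  1.0  1.0  1.0      k0      k0  NaN  NaN  NaN  NaN  NaN  NaN     NaN     NaN  left_only
k3  1.0  1.0  1.0  1.0  1.0  1.0      k0      k1  2.0  2.0  2.0  2.0  2.0  2.0      k1      k0       both
k4  1.0  1.0  1.0  1.0  1.0  1.0      k1      k1  2.0  2.0  2.0  2.0  2.0  2.0      k0      k1       both
k5  1.0  1.0  1.0  1.0  1.0  1.0      k1      k0  2.0  2.0  2.0  2.0  2.0  2.0      k0      k1       both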
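For index-on-index joins, `DataFrame.join` is a shorthand. It aligns on the row index and defaults to how="left". The following is a minimal sketch, assuming it reproduces the index-based left merge when given the same suffixes. `joined` and `merged` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild df_left and df_right as in the dataset step above
df_left = pd.DataFrame(np.ones((5, 6)), columns=["a", "b", "c", "d", "e", "f"],
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=["e", "f", "g", "h", "j", "k"],
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

# join aligns on the index and defaults to how="left"; lsuffix/rsuffix are
# needed because e, f, key1 and key2 exist in both frames
joined = df_left.join(df_right, lsuffix="_x", rsuffix="_y")

# The equivalent merge call (merge defaults to suffixes=("_x", "_y"))
merged = pd.merge(df_left, df_right, how="left", left_index=True, right_index=True)

print(joined.equals(merged))
print(list(joined.index))  # ['k1', 'k2', 'k3', 'k4', 'k5']
```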
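  • Right join on the row index, with custom suffixes for the overlapping column names (suffixes=("_left","_right"))
    The on argument is left out because it cannot be combined with the index flags. Rows k6 and k7 exist only in df_right, so their df_left columns are NaN and _merge is right_only.
pd.merge(left=df_left,right=df_right,how="right",left_index=True,right_index=True,indicator=True,suffixes=("_left","_right"))

      a    b    c    d  e_left  f_left  key1_left  key2_left  e_right  f_right    g    h    j    k  key1_right  key2_right      _merge
k3  1.0  1.0  1.0  1.0     1.0     1.0         k0         k1      2.0      2.0  2.0  2.0  2.0  2.0          k1          k0        both
k4  1.0  1.0  1.0  1.0     1.0     1.0         k1         k1      2.0      2.0  2.0  2.0  2.0  2.0          k0          k1        both
k5  1.0  1.0  1.0  1.0     1.0     1.0         k1         k0      2.0      2.0  2.0  2.0  2.0  2.0          k0          k1        both
k6  NaN  NaN  NaN  NaN     NaN     NaN        NaN        NaN      2.0      2.0  2.0  2.0  2.0  2.0          k0          k1  right_only
k7  NaN  NaN  NaN  NaN     NaN     NaN        NaN        NaN      2.0      2.0  2.0  2.0  2.0  2.0          k1          k0  right_only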
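The following sketch checks two points. First, it confirms that on cannot be mixed with the index flags: it tries the combination and catches `pandas.errors.MergeError`, which recent pandas releases raise here (older versions may behave differently). Second, it runs the index-based right join and lists the `_merge` values. `raised` and `right` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild df_left and df_right as in the dataset step above
df_left = pd.DataFrame(np.ones((5, 6)), columns=["a", "b", "c", "d", "e", "f"],
                       index=["k1", "k2", "k3", "k4", "k5"])
df_right = pd.DataFrame(np.ones((5, 6)) * 2, columns=["e", "f", "g", "h", "j", "k"],
                        index=["k3", "k4", "k5", "k6", "k7"])
df_left["key1"] = ["k1", "k0", "k0", "k1", "k1"]
df_left["key2"] = ["k0", "k0", "k1", "k1", "k0"]
df_right["key1"] = ["k1", "k0", "k0", "k0", "k1"]
df_right["key2"] = ["k0", "k1", "k1", "k1", "k0"]

# Mixing on= with left_index/right_index is rejected
try:
    pd.merge(df_left, df_right, on=["key1", "key2"], how="right",
             left_index=True, right_index=True)
    raised = False
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)

# Index-based right join: row labels and their order come from df_right
right = pd.merge(df_left, df_right, how="right", left_index=True, right_index=True,
                 indicator=True, suffixes=("_left", "_right"))
print(right["_merge"].tolist())
# ['both', 'both', 'both', 'right_only', 'right_only']
```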