What's the difference between pd.merge and df.merge?

In my voluntary role providing online technical support for www.dataquest.io, I come across numerous questions that let me dive deeper into interesting topics I would usually skim over.

Today, the question is: What’s the difference between left_df.merge(right_df) and pd.merge(left_df, right_df)?

The short answer is that left_df.merge() calls pd.merge().

The former is used because it allows method chaining, analogous to the %>% pipe operator in R, which lets you write and read data processing code from left to right, such as left_df.merge(right_df).merge(right_df2). With pd.merge() you get a wrapping style instead of a chaining style, and you end up with an ugly pd.merge(pd.merge(left_df, right_df), right_df2), if you see where this is going.
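
To make the two styles concrete, here is a minimal sketch. The three small DataFrames and the shared "key" column are made up for illustration; only merge itself comes from the discussion above.

```python
import pandas as pd

# Three tiny, made-up frames sharing a "key" column (illustrative data only).
left_df = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right_df = pd.DataFrame({"key": [1, 2, 3], "b": [10, 20, 30]})
right_df2 = pd.DataFrame({"key": [1, 2, 3], "c": [0.1, 0.2, 0.3]})

# Chaining style: reads left to right.
chained = left_df.merge(right_df, on="key").merge(right_df2, on="key")

# Wrapping style: the first merge is nested inside the second.
wrapped = pd.merge(pd.merge(left_df, right_df, on="key"), right_df2, on="key")

# Check whether both styles produce the same DataFrame.
print(chained.equals(wrapped))
```

The chained version keeps growing to the right as you add steps, while the wrapped version keeps growing outward in nested parentheses.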

Now let’s go down the rabbit hole to see what’s going on.

First, when you see pd.merge, it actually means pandas.merge, which means you have done import pandas as pd. When you import something, the __init__.py file of that module name (pandas in this question) is run.

The main purpose of all these __init__.py files is to organize the API and to let the user write shorter import code by importing the middle packages for you, so you can write pandas.merge() once you import pandas, rather than having to run from pandas.core.reshape.merge import merge before you can use the merge() function.
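
A quick way to check this yourself is to compare the short name with the deep import. This is only a sketch: the path pandas.core.reshape.merge is the one from the v0.25.1 source discussed in this article, it is an internal module, and I am assuming it has not moved in the pandas version you have installed.

```python
import pandas
from pandas.core.reshape.merge import merge as deep_merge

# pandas/__init__.py re-exports merge, so the short name and the deep import
# are expected to refer to the very same function object.
# (Assumption: the internal module path from v0.25.1 still exists.)
print(pandas.merge is deep_merge)
```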

Now let's see what I mean by “import the middle packages for you”. If you open https://github.com/pandas-dev/pandas/blob/v0.25.1/pandas/__init__.py#L129-L143, you will see that it imports many things. One of those lines is from pandas.core.reshape.api (Figure 1), and merge is imported in that block.

Figure 1

This is what allows you to call pd.merge directly, but let's get to the bottom of this. Going into pandas.core.reshape.api at https://github.com/pandas-dev/pandas/blob/v0.25.1/pandas/core/reshape/api.py, you see from pandas.core.reshape.merge import merge (Figure 2).

Figure 2

Now you see where the merge imported from pandas.core.reshape.api in the previous step came from.
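
The same chain of re-exports can be reproduced with a throwaway package. This is a sketch under stated assumptions: toypkg and its toy merge function are hypothetical names I made up to mirror the pandas layout, and the files are written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the layout described above.
# "toypkg" and everything inside it are hypothetical, not pandas code.
root = Path(tempfile.mkdtemp())
reshape = root / "toypkg" / "core" / "reshape"
reshape.mkdir(parents=True)

# Lowest level: the module that actually defines merge.
(reshape / "merge.py").write_text(
    "def merge(left, right):\n"
    "    return f'merged {left} with {right}'\n"
)
# Middle level: api.py collects the public names of the subpackage.
(reshape / "api.py").write_text("from toypkg.core.reshape.merge import merge\n")
# Top level: the package __init__.py re-exports from the api module.
(root / "toypkg" / "__init__.py").write_text(
    "from toypkg.core.reshape.api import merge\n"
)
# Empty __init__.py files mark the intermediate directories as packages.
(root / "toypkg" / "core" / "__init__.py").write_text("")
(reshape / "__init__.py").write_text("")

# Make the temporary directory importable, then import both ways.
sys.path.insert(0, str(root))
toypkg = importlib.import_module("toypkg")
deep = importlib.import_module("toypkg.core.reshape.merge")

short_result = toypkg.merge("left_df", "right_df")
same_object = toypkg.merge is deep.merge
print(short_result, same_object)
```

Importing toypkg runs its __init__.py, which pulls merge up through api.py, so toypkg.merge works without the caller ever typing the deep path.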

Finally, let’s get to the source. Going into pandas.core.reshape.merge at https://github.com/pandas-dev/pandas/blob/v0.25.1/pandas/core/reshape/merge.py#L53, you see def merge (Figure 4).

Figure 4

Now let's see what the chaining style of coding, left_df.merge, is doing. From https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html, click source to go to https://github.com/pandas-dev/pandas/blob/v0.25.1/pandas/core/frame.py#L7304-L7335 and see def merge(self (Figure 5). The self tells you this is a method of a class (DataFrame in this case). The method body then runs from pandas.core.reshape.merge import merge and passes all your parameters on to that merge, with the only difference being that it now automatically passes self as the left parameter for you.
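
The delegation pattern described here can be sketched without pandas at all. ToyFrame and the toy merge function below are hypothetical stand-ins I wrote for illustration; they only model how the method hands self over as left, not anything about real joins.

```python
def merge(left, right, how="inner"):
    """Module-level function, standing in for pandas.core.reshape.merge.merge."""
    return f"{how} merge of {left.name} and {right.name}"


class ToyFrame:
    """Hypothetical stand-in for DataFrame; only the delegation is modeled."""

    def __init__(self, name):
        self.name = name

    def merge(self, right, how="inner"):
        # The method forwards its arguments to the function and supplies
        # self as the left argument. Inside a method body, the bare name
        # merge resolves to the module-level function, not to this method.
        return merge(self, right, how=how)


left_df = ToyFrame("left_df")
right_df = ToyFrame("right_df")

method_result = left_df.merge(right_df, how="left")
function_result = merge(left_df, right_df, how="left")
print(method_result == function_result)
```

Calling the method and calling the function with left_df as the first argument go through the same code, which is the whole point of the short answer at the top.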

Figure 5

You can compare the two function signatures of def merge, the one from left_df.merge here and the one from the earlier pd.merge discussion, to see that they take the same parameters, with self standing in for left.
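
If you would rather not compare the signatures by eye, inspect.signature from the standard library can list the parameter names for you. This is a rough sketch: the comparison is printed rather than assumed, because I have only read the v0.25.1 source and other pandas versions may differ.

```python
import inspect

import pandas as pd

# Parameter names of the top-level function and of the DataFrame method.
function_params = list(inspect.signature(pd.merge).parameters)
method_params = list(inspect.signature(pd.DataFrame.merge).parameters)

# The first parameter is where the two differ: left versus self.
print(function_params[0], method_params[0])

# Everything after the first parameter; printed rather than assumed equal,
# since this depends on the installed pandas version.
print(function_params[1:] == method_params[1:])
```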

How did I even know to start searching from the keyword merge? Actually, I began the search from the source code of left_df.merge, but I felt it was better to explain the lowest-level code first and then introduce the idea of self substituting for the left parameter, so that more complex ideas build on simpler ones.

I hope this article has inspired others not to fear the source, and has encouraged curiosity about how things really work beneath the hood and how the API is designed, which might in turn lead to contributing to pandas in the future.

Translated from: https://towardsdatascience.com/whats-the-difference-between-pd-merge-and-df-merge-ab387bc20a2e
