Get to Know Some Data Set Joins

Wherever you go, data is everywhere. Look at your car's dashboard: it is a dashboard of data, showing your speed, mileage and fuel level in a few simple visuals. Go to a cash register, and it tells you how much you have to pay. Go to work, and you may be confronted with spreadsheets. There are also folks who want data to back up a bias they already hold (sigh, yes, that happens). In any case, data is everywhere, and to appreciate and work with it you need to understand joins. If, like me, you have worked in this industry, the scene below may look familiar to you. :)

Image: Person A asks whether a data set can be joined to B; nothing seems to work. Which join should be used? (Images by author; free stock images, Creative Commons.)

Data comes from various sources, and combining it into something useful requires a clear understanding of the various “joins” between data sets.

For simplicity's sake, joins can be thought of as ways of collecting the common and the different information between data sets.

A “connection” is the common key between the data sets. The vast majority of data sets need such a connection before you can get anything insightful out of combining them. I hope the example below helps explain this concept a bit further. The “Name” column acts as the connection between these two data sets.

Image by author

I won't get into the technicalities here, but connecting data sets essentially requires you to understand joins and connections. There are a few things to watch out for when connecting data sets, but I'll write about those in another article.

Let's get into some join examples.

I have two data sets: one lists the names and job titles of employees in 2007, and the other lists the employees in 2020. Each has its own rows.

Image by author
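Since the picture may not survive every repost, here is a rough sketch of the two data sets in plain Python. It assumes each data set can be modelled as a dict keyed by the “Name” column; the variable names are made up for illustration, and any job title the text does not mention is marked "(not stated)" rather than guessed.

```python
# Each data set is modelled as {name: job_title}; "Name" is the connection key.
# Titles marked "(not stated)" are not given in the article text.
employees_2007 = {
    "Peter": "Manager",
    "Jennifer": "Team Lead",
    "Alan": "(not stated)",
    "Tom": "(not stated)",
    "Lisa": "(not stated)",
}
employees_2020 = {
    "Peter": "CEO",
    "Jennifer": "Divisional Lead",
    "Alan": "Intern",
    "Sarah": "(not stated)",
    "Abraham": "(not stated)",
}

# The connection only works for names that appear in both data sets.
shared_names = sorted(employees_2007.keys() & employees_2020.keys())
print(shared_names)  # ['Alan', 'Jennifer', 'Peter']
```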

I want to answer three questions based on these data sets.

  1. Who is still working at this company, and what jobs are they doing?

  2. Who is no longer with the company?

  3. Who is new to the company?

Who is still working at this company, and what jobs are they doing?

An inner join can answer this question. It takes the employees from 2007 and matches them with the employees from 2020 using the employee name. The connection here is the “Name” column.

Here is a picture of an inner join.

Image by author

The common rows between the two data sets in this case are Peter, Jennifer and Alan.

Peter was a manager in 2007; now he is the CEO. Jennifer was a team lead; now she is a divisional lead. Alan is now an intern in a different sector.
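A minimal sketch of that inner join in plain Python, assuming each data set is a dict keyed by name. The `inner_join` helper and the variable names are hypothetical, and titles the text does not give are marked "(not stated)".

```python
# {name: job_title}; "(not stated)" marks titles the article text does not give.
employees_2007 = {"Peter": "Manager", "Jennifer": "Team Lead", "Alan": "(not stated)",
                  "Tom": "(not stated)", "Lisa": "(not stated)"}
employees_2020 = {"Peter": "CEO", "Jennifer": "Divisional Lead", "Alan": "Intern",
                  "Sarah": "(not stated)", "Abraham": "(not stated)"}


def inner_join(left, right):
    """Keep only names found in both data sets, pairing up their titles."""
    return {name: (left[name], right[name]) for name in left if name in right}


still_here = inner_join(employees_2007, employees_2020)
for name, (title_2007, title_2020) in still_here.items():
    print(f"{name}: {title_2007} -> {title_2020}")
```

Only Peter, Jennifer and Alan come back, each with their 2007 and 2020 titles side by side.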

Who is no longer with the company?

This is an example of a left anti join. It is called a “left” anti join because the data set we are interested in, the 2007 list, sits on the left-hand side. We want the employees from 2007 who do not appear in 2020. Here is an example.

Image by author

See those X's? We can't find Tom or Lisa in the 2020 list. It looks like Tom and Lisa are no longer with the company.
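The same idea as a small Python sketch, again assuming dicts keyed by name. The `left_anti_join` helper is a made-up name for illustration, and "(not stated)" marks titles the text does not give.

```python
# {name: job_title}; "(not stated)" marks titles the article text does not give.
employees_2007 = {"Peter": "Manager", "Jennifer": "Team Lead", "Alan": "(not stated)",
                  "Tom": "(not stated)", "Lisa": "(not stated)"}
employees_2020 = {"Peter": "CEO", "Jennifer": "Divisional Lead", "Alan": "Intern",
                  "Sarah": "(not stated)", "Abraham": "(not stated)"}


def left_anti_join(left, right):
    """Keep rows from the left data set whose name has no match on the right."""
    return {name: title for name, title in left.items() if name not in right}


no_longer_here = left_anti_join(employees_2007, employees_2020)
print(list(no_longer_here))  # ['Tom', 'Lisa']
```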

Who is new to the company?

This is an example of a right anti join. It is a “right” anti join because the data set we are interested in is on the right-hand side. It returns Sarah and Abraham, who did not work for the company in 2007. It looks like they are new to the company.
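Sketched in Python, a right anti join is the mirror image of the left one: the filter runs over the right-hand data set instead. The `right_anti_join` helper is hypothetical, the data sets are assumed to be dicts keyed by name, and "(not stated)" marks titles the text does not give.

```python
# {name: job_title}; "(not stated)" marks titles the article text does not give.
employees_2007 = {"Peter": "Manager", "Jennifer": "Team Lead", "Alan": "(not stated)",
                  "Tom": "(not stated)", "Lisa": "(not stated)"}
employees_2020 = {"Peter": "CEO", "Jennifer": "Divisional Lead", "Alan": "Intern",
                  "Sarah": "(not stated)", "Abraham": "(not stated)"}


def right_anti_join(left, right):
    """Keep rows from the right data set whose name has no match on the left."""
    return {name: title for name, title in right.items() if name not in left}


new_joiners = right_anti_join(employees_2007, employees_2020)
print(list(new_joiners))  # ['Sarah', 'Abraham']
```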

Simple enough, right? ☺

There is also the left join: it is the left anti join and the inner join combined, so it keeps every row from the left data set and adds the matching information from the right where it exists.
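As a rough sketch under the same assumptions (dicts keyed by name, a hypothetical `left_join` helper, "(not stated)" for titles the text does not give), a left join keeps every 2007 row and uses `None` where 2020 has no match.

```python
# {name: job_title}; "(not stated)" marks titles the article text does not give.
employees_2007 = {"Peter": "Manager", "Jennifer": "Team Lead", "Alan": "(not stated)",
                  "Tom": "(not stated)", "Lisa": "(not stated)"}
employees_2020 = {"Peter": "CEO", "Jennifer": "Divisional Lead", "Alan": "Intern",
                  "Sarah": "(not stated)", "Abraham": "(not stated)"}


def left_join(left, right):
    """Keep every left row; the right-hand title is None when there is no match."""
    return {name: (title, right.get(name)) for name, title in left.items()}


all_2007_staff = left_join(employees_2007, employees_2020)
for name, (title_2007, title_2020) in all_2007_staff.items():
    print(name, title_2007, title_2020)
```

Tom and Lisa show up with `None` on the 2020 side, which is exactly the part the left anti join isolates.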

How about the union/append/bind type of joins? They are just data sets stacked one on top of another.
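Stacking can be sketched as simple list concatenation. Here each data set is assumed to be a list of (name, title) rows, and the extra year value on each stacked row is my own addition so you can still tell where a row came from; "(not stated)" marks titles the text does not give.

```python
# Rows are (name, job_title); "(not stated)" marks titles the article text does not give.
rows_2007 = [("Peter", "Manager"), ("Jennifer", "Team Lead"), ("Alan", "(not stated)"),
             ("Tom", "(not stated)"), ("Lisa", "(not stated)")]
rows_2020 = [("Peter", "CEO"), ("Jennifer", "Divisional Lead"), ("Alan", "Intern"),
             ("Sarah", "(not stated)"), ("Abraham", "(not stated)")]

# No matching on "Name" happens here: the 2020 rows simply go underneath the 2007 rows.
stacked = ([(2007, name, title) for name, title in rows_2007]
           + [(2020, name, title) for name, title in rows_2020])

print(len(stacked))  # 10
```

Note that Peter, Jennifer and Alan each appear twice, once per year, because nothing is matched or merged.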

In summary:

Inner Join: the rows common to both data sets

Anti Join: the rows in one data set that have no match in the other

Left Join: the common rows plus the unmatched rows from one side only (the left), together in one view

Full Join: the common rows plus the unmatched rows from both the left and the right side, together in one view

Union Join: data sets stacked one on top of another
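The full join is the only one in this summary without a worked example above, so here is a minimal sketch of it. It assumes dicts keyed by name, a hypothetical `full_join` helper, and "(not stated)" for titles the text does not give; `None` marks a side with no matching row.

```python
# {name: job_title}; "(not stated)" marks titles the article text does not give.
employees_2007 = {"Peter": "Manager", "Jennifer": "Team Lead", "Alan": "(not stated)",
                  "Tom": "(not stated)", "Lisa": "(not stated)"}
employees_2020 = {"Peter": "CEO", "Jennifer": "Divisional Lead", "Alan": "Intern",
                  "Sarah": "(not stated)", "Abraham": "(not stated)"}


def full_join(left, right):
    """Keep every name from either side; a missing side shows up as None."""
    names = list(left) + [name for name in right if name not in left]
    return {name: (left.get(name), right.get(name)) for name in names}


everyone = full_join(employees_2007, employees_2020)
for name, (title_2007, title_2020) in everyone.items():
    print(name, title_2007, title_2020)
```

All seven people appear once: the three common rows, Tom and Lisa with nothing on the 2020 side, and Sarah and Abraham with nothing on the 2007 side.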

TL;DR? I put this together for you.

Image by author

In the next article, we will look at some simple ways of doing these joins in Power BI.

I hope you liked this article!

Translated from: https://medium.com/@The_Data_Kitchen/get-to-know-some-data-set-joins-42e2aa6f5785
