sql 窗口函数_您应该知道的5个最有用的sql窗口函数

sql 窗口函数

介绍(Introduction)

In my last article, I explained what SQL window functions are and how they can be used. In this article, I’m going to share what I think are the 5 MOST useful window (analytics) functions.

在上一篇文章中,我解释了什么是SQL窗口函数以及如何使用它们。 在本文中,我将分享我认为的5个MOST有用的窗口(分析)功能。

I highly recommend that you try to understand these if you work with data in your job (i.e. data analysts, product analysts, data scientists, data engineers, etc…). They will certainly come in handy!

如果您在工作中使用数据(例如,数据分析师,产品分析师,数据科学家,数据工程师等),我强烈建议您尝试理解这些内容。 他们一定会派上用场!

With that said, let’s dive right into it.

话虽如此,让我们深入研究它。

To help you understand how these 5 window functions work, I’m going to use the following table as an example:

为了帮助您了解这5个窗口函数的工作方式,我将使用下表作为示例:

Image for post

1.行号 (1. Row Number)

What does it do? ROW_NUMBER simply returns the number of each row, starting at 1 for the first record.

它有什么作用? ROW_NUMBER只是返回每行的编号,第一条记录从1开始。

When is it Useful?Row number is extremely useful when you want to get the first or last record of a specific table. For example, if you have a table of customer purchases and you want to get the first purchase for each customer, you can PARTITION BY customer (name/id) and ORDER BY purchase date. Then you can filter the table WHERE row number = 1.

什么时候有用? 当您要获取特定表的第一条或最后一条记录时,行号非常有用。 例如,如果您有一个客户购买表,并且想获得每个客户的第一次购买,则可以按客户PARTITION BY客户(名称/ ID)和ORDER BY购买日期。 然后,您可以在WHERE行号= 1的表中进行过滤。

Example:Using my example above, let’s say I want to apply a row number to each row, split by gender and from the highest GPA to lowest. Then I could perform the following query to get the following result:

例: 使用上面的示例,假设我要对每行应用行号,并按性别划分,从最高GPA到最低GPA。 然后,我可以执行以下查询以获得以下结果:

SELECT *,
ROW_NUMBER() OVER (PARTITION BY Gender ORDER BY GPA DESC) as rn
FROM
students
Image for post

Notice that because ORDERed BY GPA desc, the female and male with the highest GPA have a row number = 1.

请注意,由于ORDER BY BY GPA desc,所以具有最高GPA的女性和男性的行号= 1。

2.总和/计数 (2. Sum/Count)

What does it do?SUM() and COUNT() are aggregate functions that either SUMs the values in a column or COUNTs the values in a column that are not null.

它有什么作用? SUM()和COUNT()是聚合函数,用于对列中的值求和或对列中不为空的值进行计数。

When is it Useful?By using SUM() or COUNT() in a window function, you can calculate cumulative sums or cumulative counts. This is extremely useful if you want to create graphs that show the growth of a specific metric (i.e. number of orders, revenue, etc…) over time.

什么时候有用? 通过在窗口函数中使用SUM()或COUNT(),您可以计算累积总和或累积计数。 如果您要创建图表来显示特定指标(即订单数,收入等)随时间的增长,这将非常有用。

Example:While this example doesn’t necessarily do justice, it will give you an understanding of how it works.

例: 尽管此示例不一定是正义的,但它将使您对其工作原理有所了解。

SELECT *,
SUM(GPA) OVER (PARTITION BY Gender ORDER BY GPA DESC) as GPA_sum
FROM
students
Image for post

Notice how the column, GPA_sum, calculates the cumulative GPA (GPA for the given row plus the GPA for all previous rows).

请注意GPA_sum列如何计算累积GPA(给定行的GPA加上之前所有行的GPA)。

3.最小/最大 (3. Min/Max)

What does it do?Just as you would use MIN()/MAX() in an aggregate function, you can also use it in a window function.

它有什么作用? 就像在聚合函数中使用MIN()/ MAX()一样,您也可以在window函数中使用它。

When is it Useful?Min/Max is useful if you want to compare each row with the highest/lowest value.

什么时候有用? 如果要将每行与最高/最低值进行比较,则最小值/最大值很有用。

Example:In the example below, we can use this to compare each student’s GPA with the highest GPA per gender. You can take this a step further and create a new column that calculates the difference between the two.

例: 在下面的示例中,我们可以使用它来比较每个学生的GPA和每个性别的GPA最高。 您可以更进一步,并创建一个新列来计算两者之间的差异。

SELECT *,
MAX(GPA) OVER (PARTITION BY Gender ORDER BY GPA DESC) as Max_GPA
Image for post

4.落后/领先 (4. Lag/Lead)

What does it do? / When is it Useful?LEAD() and LAG() are useful when you want to compare the values of previous rows or later rows.

它有什么作用? /什么时候有用? 当您想比较前几行或后几行的值时,LEAD()和LAG()很有用。

Example:Let’s say I wanted to compare each student’s GPA with the next student’s GPA that is just below his/hers. I can use the LAG() function to achieve this.

例: 假设我想将每个学生的GPA与下一个学生的GPA进行比较。 我可以使用LAG()函数来实现这一点。

SELECT *,
LAG(GPA) OVER (ORDER BY GPA ASC) as Lower_GPA
FROM
students
Image for post

Notice how the first row is empty because there’s no previous row for it to refer to.

注意第一行是空的,因为没有上一行可供参考。

5.密集等级 (5. Dense Rank)

What does it do?DENSE_RANK is similar to ROW_NUMBER except that it returns the same rank (number) for equal values.

它有什么作用? DENSE_RANK与ROW_NUMBER类似,但它对于相同的值返回相同的等级(数字)。

When is it Useful?DENSE_RANK is useful when you want to rank your data based on a particular variable or variables.

什么时候有用? 当您要基于一个或多个特定变量对数据进行排名时,DENSE_RANK很有用。

Example:If we wanted to rank the students from the lowest GPA to the highest, we could use the following code to achieve this.

例: 如果我们想对学生的GPA从最低到最高进行排名,可以使用以下代码来实现。

SELECT *,
DENSE_RANK() OVER(ORDER BY GPA ASC) as Dense_rank
Image for post

Notice that Martin and Jason have the same GPA, and so, they have the same Dense_rank equal to 4.

请注意,Martin和Jason的GPA相同,因此,它们的Dense_rank等于4。

谢谢阅读! (Thanks for Reading!)

By reading this article, you should not only know what a window function is, but you should also have a good grasp of its versatility and functionality. There are so many more ways that you can use window functions, but these five have definitely been the most useful in my career!

通过阅读本文,您不仅应该了解窗口函数是什么,而且还应该掌握其多功能性和功能。 您可以使用更多的功能来使用窗口功能,但是这五个绝对是我职业中最有用的!

特伦斯·辛 (Terence Shin)

翻译自: https://towardsdatascience.com/5-most-useful-window-functions-in-sql-you-should-know-35ea50ecd49e

sql 窗口函数

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值