partition 子句_SQL PARTITION BY子句概述

最新推荐文章于 2024-07-24 22:36:05 发布

culuo4781

最新推荐文章于 2024-07-24 22:36:05 发布

阅读量4.4k

点赞数 1

文章标签：数据库 java mysql sql 大数据

原文链接：https://www.sqlshack.com/sql-partition-by-clause-overview/

版权

partition 子句

This article will cover the SQL PARTITION BY clause and, in particular, the difference with GROUP BY in a select statement. We will also explore various use case of SQL PARTITION BY.

本文将介绍SQL PARTITION BY子句，尤其是select语句中与GROUP BY的区别。我们还将探讨SQL PARTITION BY的各种用例。

We use SQL PARTITION BY to divide the result set into partitions and perform computation on each subset of partitioned data.

我们使用SQL PARTITION BY将结果集划分为多个分区，并对分区数据的每个子集执行计算。

准备样品数据 (Preparing Sample Data )

Let us create an Orders table in my sample database SQLShackDemo and insert records to write further queries.

让我们在示例数据库SQLShackDemo中创建一个Orders表，并插入记录以编写进一步的查询。

Use SQLShackDemo
Go
CREATE TABLE [dbo].[Orders]
(
    [orderid] INT,
    [Orderdate] DATE,
    [CustomerName] VARCHAR(100),
    [Customercity] VARCHAR(100), 
    [Orderamount] MONEY
)

I use ApexSQL Generate to insert sample data into this article. Right click on the Orders table and Generate test data.

我使用ApexSQL Generate将示例数据插入本文。右键单击“订单”表并生成测试数据 。

Generate Text data using ApexSQL Generate

It launches the ApexSQL Generate. I generated a script to insert data into the Orders table. Execute this script to insert 100 records in the Orders table.

它启动ApexSQL生成。我生成了一个脚本，用于将数据插入到Orders表中。执行此脚本以在Orders表中插入100条记录。

USE [SQLShackDemo]
GO
INSERT [dbo].[Orders]  VALUES (216090, CAST(N'1826-12-19' AS Date), N'Edward', N'Phoenix', 4713.8900)
GO
INSERT [dbo].[Orders]  VALUES (508220, CAST(N'1826-12-09' AS Date), N'Aria', N'San Francisco', 9832.7200)
GO
…

Once we execute insert statements, we can see the data in the Orders table in the following image.

执行插入语句后，我们可以在下图中的Orders表中看到数据。

We use SQL GROUP BY clause to group results by specified column and use aggregate functions such as Avg(), Min(), Max() to calculate required values.

我们使用SQL GROUP BY子句按指定的列对结果进行分组，并使用诸如Avg（），Min（），Max（）之类的聚合函数来计算所需的值。

按功能分组 (Group By function syntax)

SELECT expression, aggregate function ()
FROM tables
WHERE conditions
GROUP BY expression

Suppose we want to find the following values in the Orders table

假设我们要在“订单”表中找到以下值

Minimum order value in a city
一个城市的最小订单价值
Maximum order value in a city
城市中的最大订单价值
Average order value in a city
一个城市的平均订单价值

Execute the following query with GROUP BY clause to calculate these values.

使用GROUP BY子句执行以下查询以计算这些值。

SELECT Customercity, 
       AVG(Orderamount) AS AvgOrderAmount, 
       MIN(OrderAmount) AS MinOrderAmount, 
       SUM(Orderamount) TotalOrderAmount
FROM [dbo].[Orders]
GROUP BY Customercity;

In the following screenshot, we can see Average, Minimum and maximum values grouped by CustomerCity.

在以下屏幕截图中，我们可以看到按CustomerCity分组的平均值，最小值和最大值。

Now, we want to add CustomerName and OrderAmount column as well in the output. Let’s add these columns in the select statement and execute the following code.

现在，我们要在输出中也添加CustomerName和OrderAmount列。让我们将这些列添加到select语句中，并执行以下代码。

SELECT Customercity, CustomerName ,OrderAmount,
       AVG(Orderamount) AS AvgOrderAmount, 
       MIN(OrderAmount) AS MinOrderAmount, 
       SUM(Orderamount) TotalOrderAmount
FROM [dbo].[Orders]
GROUP BY Customercity;

Once we execute this query, we get an error message. In the SQL GROUP BY clause, we can use a column in the select statement if it is used in Group by clause as well. It does not allow any column in the select clause that is not part of GROUP BY clause.

一旦执行此查询，我们将收到一条错误消息。在SQL GROUP BY子句中，如果同时在Group by子句中使用它，则可以在select语句中使用一列。它不允许select子句中的任何列不属于GROUP BY子句。

We can use the SQL PARTITION BY clause to resolve this issue. Let us explore it further in the next section.

我们可以使用SQL PARTITION BY子句解决此问题。让我们在下一部分中进一步探讨它。

SQL分区依据 (SQL PARTITION BY)

We can use the SQL PARTITION BY clause with the OVER clause to specify the column on which we need to perform aggregation. In the previous example, we used Group By with CustomerCity column and calculated average, minimum and maximum values.

我们可以将SQL PARTITION BY子句与OVER子句一起使用，以指定需要对其进行聚合的列。在上一个示例中，我们将“分组依据”与“ CustomerCity”一起使用，并计算了平均值，最小值和最大值。

Let us rerun this scenario with the SQL PARTITION BY clause using the following query.

让我们使用以下查询，使用SQL PARTITION BY子句重新运行此方案。

SELECT Customercity, 
       AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount, 
       MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount, 
       SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];

In the output, we get aggregated values similar to a GROUP By clause. You might notice a difference in output of the SQL PARTITION BY and GROUP BY clause output.

在输出中，我们获得类似于GROUP BY子句的聚合值。您可能会注意到SQL PARTITION BY和GROUP BY子句输出的输出有所不同。

Group By

SQL PARTITION BY

We get a limited number of records using the Group By clause

We get all records in a table using the PARTITION BY clause.

It gives one row per group in result set. For example, we get a result for each group of CustomerCity in the GROUP BY clause.

It gives aggregated columns with each record in the specified table.

We have 15 records in the Orders table. In the query output of SQL PARTITION BY, we also get 15 rows along with Min, Max and average values.

通过...分组

SQL分区依据

我们使用Group By子句获得的记录数量有限

我们使用PARTITION BY子句获取表中的所有记录。

它在结果集中为每组一行。例如，我们在GROUP BY子句中为CustomerCity的每个组获取结果。

它为指定表中的每条记录提供汇总列。

我们在订单表中有15条记录。在SQL PARTITION BY的查询输出中，我们还获得15行以及Min，Max和平均值。

In the previous example, we get an error message if we try to add a column that is not a part of the GROUP BY clause.

在上一个示例中，如果尝试添加不属于GROUP BY子句的列，则会收到一条错误消息。

We can add required columns in a select statement with the SQL PARTITION BY clause. Let us add CustomerName and OrderAmount columns and execute the following query.

我们可以使用SQL PARTITION BY子句在select语句中添加必需的列。让我们添加CustomerName和OrderAmount列并执行以下查询。

SELECT Customercity, 
       CustomerName, 
       OrderAmount, 
       AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount, 
       MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount, 
       SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];

We get CustomerName and OrderAmount column along with the output of the aggregated function. We also get all rows available in the Orders table.

我们获得CustomerName和OrderAmount列以及聚合函数的输出。我们还将在Orders表中获得所有可用行。

In the following screenshot, you can for CustomerCity Chicago, it performs aggregations (Avg, Min and Max) and gives values in respective columns.

在以下屏幕截图中，您可以为CustomerCity Chicago进行聚合（平均，最小和最大）并在相应的列中提供值。

Similarly, we can use other aggregate functions such as count to find out total no of orders in a particular city with the SQL PARTITION BY clause.

同样，我们可以使用其他聚合函数（例如count）来通过SQL PARTITION BY子句找出特定城市的订单总数。

SELECT Customercity, 
       CustomerName, 
       OrderAmount, 
       COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders, 
       AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount, 
       MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount, 
       SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];

We can see order counts for a particular city. For example, we have two orders from Austin city therefore; it shows value 2 in CountofOrders column.

我们可以看到特定城市的订单计数。例如，因此，我们有两个来自奥斯丁市的订单；它在CountofOrders列中显示值2。

带有ROW_NUMBER（）的PARTITION BY子句 (PARTITION BY clause with ROW_NUMBER())

We can use the SQL PARTITION BY clause with ROW_NUMBER() function to have a row number of each row. We define the following parameters to use ROW_NUMBER with the SQL PARTITION BY clause.

我们可以将SQL PARTITION BY子句与ROW_NUMBER（）函数一起使用，以获取每行的行号。我们定义以下参数以将ROW_NUMBER与SQL PARTITION BY子句一起使用。

PARTITION BY column – In this example, we want to partition data on PARTITION BY列 –在此示例中，我们要对CustomerCity column CustomerCity列上的数据进行分区
Order By: In the ORDER BY column, we define a column or condition that defines row number. In this example, we want to sort data on the OrderAmount column
排序依据：在ORDER BY列中，我们定义一列或条件来定义行号。在此示例中，我们要对OrderAmount列上的数据进行排序

SELECT Customercity, 
       CustomerName, 
       ROW_NUMBER() OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC) AS "Row Number", 
       OrderAmount, 
       COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders, 
       AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount, 
       MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount, 
       SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];

In the following screenshot, we get see for CustomerCity Chicago, we have Row number 1 for order with highest amount 7577.90. it provides row number with descending OrderAmount.

在以下屏幕截图中，我们看到了CustomerCity Chicago ，我们的订单行1为最高金额7577.90。它为行号提供降序的OrderAmount。

具有累积总值的PARTITION BY子句 (PARTITION BY clause with Cumulative total value)

Suppose we want to get a cumulative total for the orders in a partition. Cumulative total should be of the current row and the following row in the partition.

假设我们要获得分区中订单的累计总数。累积总数应该是分区中当前行和下一行的总和。

For example, in the Chicago city, we have four orders.

例如，在芝加哥市，我们有四个订单。

CustomerCity	CustomerName	Rank	OrderAmount	Cumulative Total Rows	Cumulative Total
Chicago	Marvin	1	7577.9	Rank 1 +2	14777.51
Chicago	Lawrence	2	7199.61	Rank 2+3	14047.21
Chicago	Alex	3	6847.66	Rank 3+4	8691.49
Chicago	Jerome	4	1843.83	Rank 4	1843.83

客户城市	顾客姓名	秩	订单金额	累积总行	累计总数
芝加哥	Maven	1个	7577.9	等级1 +2	14777.51
芝加哥	劳伦斯	2	7199.61	等级2 + 3	14047.21
芝加哥	亚历克斯	3	6847.66	等级3 + 4	8691.49
芝加哥	杰罗姆	4	1843.83	等级4	1843.83

In the following query, we the specified ROWS clause to select the current row (using CURRENT ROW) and next row (using 1 FOLLOWING). It further calculates sum on those rows using sum(Orderamount) with a partition on CustomerCity ( using OVER(PARTITION BY Customercity ORDER BY OrderAmount DESC).

在下面的查询中，我们指定了ROWS子句以选择当前行（使用CURRENT ROW）和下一行（使用1 FOLLOWING）。它还使用sum（Orderamount）和CustomerCity上的分区（使用OVER（PARTITION BY Customercity或ORDER BY OrderAmount DESC））来计算这些行上的总和。

SELECT Customercity, 
       CustomerName, 
       OrderAmount, 
       ROW_NUMBER() OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC) AS "Row Number", 
       CONVERT(VARCHAR(20), SUM(orderamount) OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1) AS CumulativeTotal,

Similarly, we can calculate the cumulative average using the following query with the SQL PARTITION BY clause.

同样，我们可以使用带有SQL PARTITION BY子句的以下查询来计算累积平均值。

SELECT Customercity, 
       CustomerName, 
       OrderAmount, 
       ROW_NUMBER() OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC) AS "Row Number", 
       CONVERT(VARCHAR(20), AVG(orderamount) OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1) AS CumulativeAVG

用PARTITION BY子句限制行的行数 (ROWS UNBOUNDED PRECEDING with the PARTITION BY clause)

We can use ROWS UNBOUNDED PRECEDING with the SQL PARTITION BY clause to select a row in a partition before the current row and the highest value row after current row.

我们可以将ROWS UNBOUNDED PRECEDING与SQL PARTITION BY子句一起使用，以选择分区中当前行之前的行以及当前行之后的最大值行。

In the following table, we can see for row 1; it does not have any row with a high value in this partition. Therefore, Cumulative average value is the same as of row 1 OrderAmount.

在下表中，我们可以看到第1行；该分区中没有任何具有高值的行。因此，累积平均值与第1行OrderAmount相同。

For Row2, It looks for current row value (7199.61) and highest value row 1(7577.9). It calculates the average for these two amounts.

对于第2行，它将查找当前行值（7199.61）和最高值行1（7577.9）。它计算这两个数量的平均值。

For Row 3, it looks for current value (6847.66) and higher amount value than this value that is 7199.61 and 7577.90. It calculates the average of these and returns.

对于第3行，它将查找当前值（6847.66）和比该值更高的金额值7199.61和7577.90。它计算这些平均值并返回。

CustomerCity	CustomerName	Rank	OrderAmount	Cumulative Average Rows	Cumulative Average
Chicago	Marvin	1	7577.9	Rank 1	7577.90
Chicago	Lawrence	2	7199.61	Rank 1+2	7388.76
Chicago	Alex	3	6847.66	Rank 1+2+3	7208.39
Chicago	Jerome	4	1843.83	Rank 1+2+3+4	5867.25

客户城市	顾客姓名	秩	订单金额	累积平均行	累积平均值
芝加哥	Maven	1个	7577.9	等级1	7577.90
芝加哥	劳伦斯	2	7199.61	等级1 + 2	7388.76
芝加哥	亚历克斯	3	6847.66	等级1 + 2 + 3	7208.39
芝加哥	杰罗姆	4	1843.83	等级1 + 2 + 3 + 4	5867.25

Execute the following query to get this result with our sample data.

执行以下查询以通过我们的样本数据获得此结果。

SELECT Customercity, 
       CustomerName, 
       OrderAmount, 
       ROW_NUMBER() OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC) AS "Row Number", 
       CONVERT(VARCHAR(20), AVG(orderamount) OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC ROWS UNBOUNDED PRECEDING), 1) AS CumulativeAvg
FROM [dbo].[Orders];

结论 (Conclusion)

In this article, we explored the SQL PARTIION BY clause and its comparison with GROUP BY clause. We also learned its usage with a few examples. I hope you find this article useful and feel free to ask any questions in the comments below

在本文中，我们探讨了SQL PARTIION BY子句及其与GROUP BY子句的比较。我们还通过一些示例了解了它的用法。希望本文对您有所帮助，并随时在下面的评论中提问