abp 通用sql_SQL最大功能之一的行通用性

abp 通用sql

论数据工程 (On Data Engineering)

ROW_NUMBER is one of the most valuable and versatile functions in SQL. It can be leveraged for different use cases, from ranking items, identifying data quality gaps, doing some minimization, handling preference queries, or helping with sessionization etc.

ROW_NUMBER是SQL中最有价值和最通用的功能之一。 它可以用于不同的用例,例如对项目进行排名,识别数据质量差距,进行最小化,处理首选项查询或帮助进行会话化等。

The ROW_NUMBER function isn’t, however, a traditional function. It is a window function. Window functions are an advanced kind of function, with specific properties. This article aims to go over how window functions, and more specifically, how the ROW_NUMBERfunction work, and to go over some of the use cases for the ROW_NUMBER function.

ROW_NUMBER函数不是传统函数。 这是一个窗口功能。 窗口函数是具有特定属性的高级函数。 本文的目的是检查如何窗口的功能,更具体地说,是怎么ROW_NUMBER功能工作,去了一些使用案例ROW_NUMBER功能。

窗函数剖析 (Anatomy of a Window function)

To understand how a window function work, it is essential first to understand, what type of arguments it can take. The following is the syntax for providing an argument using the window function.

要了解窗口函数的工作原理,首先必须了解其可以采用的参数类型。 以下是使用window函数提供参数的语法。

ROW_NUMBER() OVER (window)

ROW_NUMBER()个(窗口)

As you can see, the row number doesn’t take a direct argument. We don’t have a ROW_NUMBER(a.columna) , for instance, but takes arguments in the OVER clause. The argument it takes is called a window. The window defines a subset of the dataset to use for the computation. Different arguments can be used to define this window, partitions, orders, rows between.

如您所见,行号没有直接的参数。 例如,我们没有ROW_NUMBER(a.columna) ,但是在OVER子句中使用了参数。 它采用的参数称为窗口。 该窗口定义了要用于计算的数据集的子集。 可以使用不同的参数来定义此窗口,分区,顺序,之间的行。

Other window functions may also include direct arguments like traditional functions, such as the SUM window function, e.g.: SUM(amount) OVER (window) , in which case we would be summing the amount over a subset of the data as defined by the window.

其他窗口函数也可能包括直接参数,例如传统函数,例如SUM窗口函数,例如: SUM(amount) OVER (window) ,在这种情况下,我们将对窗口定义的数据子集的数量求和。

PARTITION BY

分区依据

The PARTITION BY argument allows us to split the dataset. The typical way to uses it is to specify the list of columns on which we would like to start a new count on:

PARTITION BY参数允许我们分割数据集。 典型的使用方式是指定要在其上开始新计数的列的列表:

SELECT
a.client_id,
ROW_NUMBER() OVER(PARTITION BY a.country, a.city) AS rn_city

The above statement would, for instance, gives us, for each client, a row number from 1 to n (number of client in the city).

例如,上面的语句为每个客户提供了从1到n(城市中客户的数量)的行号。

The partition by clause can, however, accept more complicated expressions. This is better shown using a SUM window function rather than a ROW_NUMBER function.

但是,partition by子句可以接受更复杂的表达式。 使用SUM窗口函数而不是ROW_NUMBER函数可以更好地显示这一点。

-- SQLite Syntax
WITH
x AS (
SELECT 1 AS t
UNION ALL
SELECT 2 AS t
UNION ALL
SELECT 2 AS t
UNION ALL
SELECT 3 AS t)SELECT
t, SUM(t) OVER (PARTITION BY CASE WHEN t = 1 THEN t ELSE NULL END ) AS s
FROM x;

The result of the query is the following:

查询结果如下:

t|s|
2|7
2|7
3|7
1|1

What the query does is handling the SUM with a partition set for t=1, and another for the rest of the query (NULL). The split between the dataset happens after the evaluation from the case statement query.

该查询所执行的操作是处理SUM,并将其分区设置为t = 1,其余分区设置为查询的其余部分(NULL)。 数据集之间的拆分发生在案例语句查询的评估之后。

Since for t=1, we have one partition equal to t (which can have only one value), we have a first group (sub-dataset) to do computation on. For t != 1, there is only one option is to be part of the group with an output as a NULLvalue. Since this group is composed of 2 records with t=2 and one record with t=3, the sum for the group is equal to 7.

因为对于t = 1,我们有一个等于t的分区(只能有一个值),所以我们有第一组(子数据集)进行计算。 对于t!= 1,只有一个选项是成为组的一部分,并且输出为NULL值。 由于该组由t = 2的2条记录和t = 3的1条记录组成,因此该组的总和等于7。

If we replaced the window function with the following:

如果我们将窗口函数替换为以下内容:

PARTITION BY CASE WHEN t <= 2 THEN ELSE null END

We would generate three groups to split the data into t=1, t=2, and t>2. The respective sums would be 1,4 and 3.

我们将生成三组以将数据分为t = 1,t = 2和t> 2。 分别为1,4和3。

The same type of operations can also be performed to compute the row numbers.

还可以执行相同类型的操作来计算行号。

订购 (ORDER BY)

The order by argument will define, for the purpose of this specific function, how the dataset will be sorted. A simple ROW_NUMBER query such as the following will only be providing a sorted dataset by value with the associate row_number as if it was a full dataset:

为了实现此特定功能,按参数排序将定义如何对数据集进行排序。 像下面这样的简单ROW_NUMBER查询将仅提供按值排序的数据集以及相关联的row_number,就好像它是完整数据集一样:

SELECT
*,
ROW_NUMBER() OVE
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值