上一篇:【用于数据分析的 SQL(五)】【SQL 数据清理】
一、 前言
窗口功能将一行和另一行进行比较,无需执行任何加入。
二、 窗口函数
1.在没有日期截断的订单时间内创建(在表中)的运行总数。最终表应有两个列:一个列,每个新行添加的金额,另一个与运行总数。
SELECT standard_amt_usd,
SUM(standard_amt_usd) OVER (ORDER BY occurred_at) AS running_total
FROM orders
2.请修改上一个测验中的查询,以包括分区。在订单时间内仍然创建运行总数(在表中),但这次,日期按年截断,并按同年截断的变量进行分区。最终表应有三个列:一个列,每个行添加的金额,一个用于截断日期,最后一列包含每年内的运行总数。
SELECT standard_amt_usd,
DATE_TRUNC('year', occurred_at) as year,
SUM(standard_amt_usd) OVER (PARTITION BY DATE_TRUNC('year', occurred_at) ORDER BY occurred_at) AS running_total
FROM orders
三、 ROW_NUMBER和RANK()
从表中选择"和"变量",然后创建一个名为"使用分区"对每个帐户订购的纸张总量(从高到低)进行排名的列。最终表应具有这四个列。id account_id total orders total_rank
SELECT id,
account_id,
total,
RANK() OVER (PARTITION BY account_id ORDER BY total DESC) AS total_rank
FROM orders
四、多个窗口功能的别名
SELECT id,
account_id,
DATE_TRUNC('year',occurred_at) AS year,
DENSE_RANK() OVER (PARTITION BY account_id ORDER BY DATE_TRUNC('year',occurred_at)) AS dense_rank,
total_amt_usd,
SUM(total_amt_usd) OVER (PARTITION BY account_id ORDER BY DATE_TRUNC('year',occurred_at)) AS sum_total_amt_usd,
COUNT(total_amt_usd) OVER (PARTITION BY account_id ORDER BY DATE_TRUNC('year',occurred_at)) AS count_total_amt_usd,
AVG(total_amt_usd) OVER (PARTITION BY account_id ORDER BY DATE_TRUNC('year',occurred_at)) AS avg_total_amt_usd,
MIN(total_amt_usd) OVER (PARTITION BY account_id ORDER BY DATE_TRUNC('year',occurred_at)) AS min_total_amt_usd,
MAX(total_amt_usd) OVER (PARTITION BY account_id ORDER BY DATE_TRUNC('year',occurred_at)) AS max_total_amt_usd
FROM orders
添加别名
SELECT id,
account_id,
DATE_TRUNC('year',occurred_at) AS year,
DENSE_RANK() OVER account_year_window AS dense_rank,
total_amt_usd,
SUM(total_amt_usd) OVER account_year_window AS sum_total_amt_usd,
COUNT(total_amt_usd) OVER account_year_window AS count_total_amt_usd,
AVG(total_amt_usd) OVER account_year_window AS avg_total_amt_usd,
MIN(total_amt_usd) OVER account_year_window AS min_total_amt_usd,
MAX(total_amt_usd) OVER account_year_window AS max_total_amt_usd
FROM orders
WINDOW account_year_window AS (PARTITION BY account_id ORDER BY DATE_TRUNC('year',occurred_at))
五、 百分位数
使用该功能将帐户按订单金额分为 4 个级别。您生成的表格应具有每个订单的时间、购买的纸张总量以及列中的四个级别之一。
SELECT
account_id,
occurred_at,
standard_qty,
NTILE(4) OVER (PARTITION BY account_id ORDER BY standard_qty) AS standard_quartile
FROM orders
ORDER BY account_id DESC