SQL 子查询和临时表格

首个子查询解决方案

  1. 首先,我们需要按照日期和渠道分组。然后按事件数(第三列)排序,这样可以快速得出第一个问题的答案。
    SELECT DATE_TRUNC('day',occurred_at) AS day,
    channel, COUNT(*) as events
    FROM web_events
    GROUP BY 1,2
    ORDER BY 3 DESC;
    
  2. 可以看出,要获得这一结果,提供了整个原始表格。查询的附加部分包括 *,并且我们需要为表格设置别名。此外,是在 SELECT 语句中(而不是 FROM)中提供表格。
    SELECT *
    FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
    channel, COUNT(*) as events
    FROM web_events
    GROUP BY 1,2
    ORDER BY 3 DESC) sub;
    
  3. 最后,我们在以下语句中能够获得显示每个渠道一天的平均事件数的表格。
    SELECT channel, AVG(events) AS average_events
    FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
    channel, COUNT(*) as events
    FROM web_events
    GROUP BY 1,2) sub
    GROUP BY channel
    ORDER BY 2 DESC;

 

格式清晰的查询

与之前的示例相比,在这个格式清晰的示例中,我们很容易就看出要从哪个表格中获取数据。此外,如果在子查询后面有 GROUP BYORDER BYWHEREHAVING 或任何其他语句,则按照外部查询的同一级别缩进,正如最后一个示例所显示的,它是上个练习的最后一个解决方案。

SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
                channel, COUNT(*) as events
      FROM web_events 
      GROUP BY 1,2
      ORDER BY 3 DESC) sub;

下面的查询很相似,但是向外部查询应用了其他逻辑,因此按照外部查询的级别缩进。而内部查询逻辑的缩进级别与内部表格匹配。

SELECT *
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
                channel, COUNT(*) as events
      FROM web_events 
      GROUP BY 1,2
      ORDER BY 3 DESC) sub
GROUP BY channel
ORDER BY 2 DESC;

最后两个查询容易读懂多了!

 

  1. 以下是从 orders 表格中获取第一个订单的年/月信息的查询。

    SELECT DATE_TRUNC('month', MIN(occurred_at))
    FROM orders;
    
  2. 然后,为了获取每个订单的平均值,我们可以在一个查询中执行所有的任务。但是为了便于阅读,我在下面提供了两个查询,单独执行每一步。

SELECT AVG(standard_qty) avg_std, AVG(gloss_qty) avg_gls, AVG(poster_qty) avg_pst
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) FROM orders);

SELECT SUM(total_amt_usd)
FROM orders
WHERE DATE_TRUNC('month', occurred_at) =
(SELECT DATE_TRUNC('month', MIN(occurred_at)) FROM orders);

 

  1. 提供每个区域拥有最高销售额 (total_amt_usd) 的销售代表姓名

    首先,我要算出与每个销售代表相关的总销售额 (total_amt_usd),并且要得出他们所在的区域。以下查询提供了这一信息。
    SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY 1,2
    ORDER BY 3 DESC;
    
    接着,得出每个区域的最高销售额,然后使用该信息从最终结果中获取这些行。
    SELECT region_name, MAX(total_amt) total_amt
    FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY 1, 2) inner1
    GROUP BY 1;
    
    本质上,这是两个表格的连接,其中区域和销售额相匹配。
    SELECT t1.rep_name, t1.region_name, t1.total_amt
    FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY 1,2
    ORDER BY 3 DESC) t1
    JOIN (SELECT region_name, MAX(total_amt) total_amt
    FROM(SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY 1, 2) inner1
    GROUP BY 1) t2
    ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
    
  2. 对于具有最高销售额 (total_amt_usd) 的区域,总共下了多少个订单?

    我写的第一个查询是获取每个区域的 total_amt_usd
    SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY r.name;
    
    然后,我们仅从该表格中获取销售额最高的区域。可以通过两种方法来获取,一种是使用子查询后的最大值,另一种是按降序排序,然后获取最高值。
    SELECT MAX(total_amt)
    FROM (SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY r.name) sub;
    
    最终,我们要获取具有该区域销售额的总订单量:
    SELECT r.name, SUM(o.total) total_orders
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY r.name
    HAVING SUM(o.total_amt_usd) = (
    SELECT MAX(total_amt)
    FROM (SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY r.name) sub);
    
    结果就是 Northeast,总订单为 1230378 个。
     
  3. 对于购买标准纸张数量 (standard_qty) 最多的客户(在作为客户的整个时期内),有多少客户的购买总数依然更多?

    首先,我们要得出购买标准纸张数量 (standard_qty) 最多的客户。以下查询获取了该客户,以及总消费:

    SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
    FROM accounts a
    JOIN orders o
    ON o.account_id = a.id
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1;
    

    现在,我将使用上述信息获取总消费更高的所有客户:

    SELECT a.name
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY 1
    HAVING SUM(o.total) > (SELECT total
    FROM (SELECT a.name act_name, SUM(o.standard_qty) tot_std, SUM(o.total) total
    FROM accounts a
    JOIN orders o
    ON o.account_id = a.id
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1) sub);
    

    上述查询列出了具有更多订单的客户列表。我们还可以使用另一个简单的子查询获取数量。

    SELECT COUNT(*)
    FROM (SELECT a.name
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY 1
    HAVING SUM(o.total) > (SELECT total
    FROM (SELECT a.name act_name, SUM(o.standard_qty) tot_std, SUM(o.total) total
    FROM accounts a
    JOIN orders o
    ON o.account_id = a.id
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1) inner_tab)
    ) counter_tab;
    
  4. 对于(在作为客户的整个时期内)总消费 (total_amt_usd) 最多的客户,他们在每个渠道上有多少 web_events

    我们首先需要获取在整个客户时期内消费最多的客户。

    SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY a.id, a.name
    ORDER BY 3 DESC
    LIMIT 1;
    

    现在,我们要获取该企业(可以使用 id 进行匹配)在每个渠道上的事件数。

    SELECT a.name, w.channel, COUNT(*)
    FROM accounts a
    JOIN web_events w
    ON a.id = w.account_id AND a.id =  (SELECT id
    FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY a.id, a.name
    ORDER BY 3 DESC
    LIMIT 1) inner_table)
    GROUP BY 1, 2
    ORDER BY 3 DESC;
    

    我添加了 ORDER BY,其实并没特别的理由,并添加了客户名称,确保仅从一个客户那获取数据。
     

  5. 对于总消费前十名的客户,他们的平均终身消费 (total_amt_usd) 是多少?

    首先,我们需要找出总消费 (total_amt_usd) 在前十名的客户。
    SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY a.id, a.name
    ORDER BY 3 DESC
    LIMIT 10;
    
    现在计算这十个客户的平均消费。
    SELECT AVG(tot_spent)
    FROM (SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY a.id, a.name
    ORDER BY 3 DESC
    LIMIT 10) temp;
    
  6. 比所有客户的平均消费高的企业平均终身消费 (total_amt_usd) 是多少?

    首先,算出所有客户的总消费 (total_amt_usd) 平均值:
    SELECT AVG(o.total_amt_usd) avg_all
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id;
    
    然后,只获取高于这一平均值的客户。
    SELECT o.account_id, AVG(o.total_amt_usd)
    FROM orders o
    GROUP BY 1
    HAVING AVG(o.total_amt_usd) > (SELECT AVG(o.total_amt_usd) avg_all
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id);
    
    最后,算出这些值的平均值。
    SELECT AVG(avg_amt)
    FROM (SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
    FROM orders o
    GROUP BY 1
    HAVING AVG(o.total_amt_usd) > (SELECT AVG(o.total_amt_usd) avg_all
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id)) temp_table;

 

第一个 WITH (CTE)

下面是“你的第一个子查询”部分的问题和解决方案。

问题:你需要算出每个渠道每天的平均事件数。

解决方案:

SELECT channel, AVG(events) AS average_events
FROM (SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2) sub
GROUP BY channel
ORDER BY 2 DESC;

我们使用 WITH 语句重新编写查询。

注意:你可以获取内部查询:

SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2

我们在此部分放入 WITH 语句。注意,在下面我们将表格的别名设为 events

WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)

现在,我们可以像对待数据库中的任何其他表格一样使用这个新创建的 events 表格:

WITH events AS (
SELECT DATE_TRUNC('day',occurred_at) AS day,
channel, COUNT(*) as events
FROM web_events
GROUP BY 1,2)

SELECT channel, AVG(events) AS average_events
FROM events
GROUP BY channel
ORDER BY 2 DESC;

对于上述示例,我们只需一个额外的表格,但是想象下我们要创建第二个表格来从中获取数据。我们可以按照以下方式来创建额外的表格并从中获取数据:

WITH table1 AS (
SELECT *
FROM web_events),

table2 AS (
SELECT *
FROM accounts)


SELECT *
FROM table1
JOIN table2
ON table1.account_id = table2.id;

然后,你可以按照相同的方式使用 WITH 语句添加越来越多的表格。底部的练习将确保你掌握了这些新查询的所有必要组成部分。

 

WITH 解决方案

以下是使用 WITH 条件重新编写的之前问题的答案。这些查询通常更容易读懂。

  1. 提供每个区域拥有最高销售额 (total_amt_usd) 的销售代表姓名
     
    WITH t1 AS (
    SELECT s.name rep_name, r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY 1,2
    ORDER BY 3 DESC),
    t2 AS (
    SELECT region_name, MAX(total_amt) total_amt
    FROM t1
    GROUP BY 1)
    SELECT t1.rep_name, t1.region_name, t1.total_amt
    FROM t1
    JOIN t2
    ON t1.region_name = t2.region_name AND t1.total_amt = t2.total_amt;
    
  2. 对于具有最高销售额 (total_amt_usd) 的区域,总共下了多少个订单?
     
    WITH t1 AS (
    SELECT r.name region_name, SUM(o.total_amt_usd) total_amt
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY r.name),
    t2 AS (
    SELECT MAX(total_amt)
    FROM t1)
    SELECT r.name, SUM(o.total) total_orders
    FROM sales_reps s
    JOIN accounts a
    ON a.sales_rep_id = s.id
    JOIN orders o
    ON o.account_id = a.id
    JOIN region r
    ON r.id = s.region_id
    GROUP BY r.name
    HAVING SUM(o.total_amt_usd) = (SELECT * FROM t2);
    
  3. 对于购买标准纸张数量 (standard_qty) 最多的客户(在作为客户的整个时期内),有多少客户的购买总数依然更多?
     
    WITH t1 AS (
    SELECT a.name account_name, SUM(o.standard_qty) total_std, SUM(o.total) total
    FROM accounts a
    JOIN orders o
    ON o.account_id = a.id
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1),
    t2 AS (
    SELECT a.name
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY 1
    HAVING SUM(o.total) > (SELECT total FROM t1))
    SELECT COUNT(*)
    FROM t2;
    
  4. 对于(在作为客户的整个时期内)总消费 (total_amt_usd) 最多的客户,他们在每个渠道上有多少 web_events
     
    WITH t1 AS (
    SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY a.id, a.name
    ORDER BY 3 DESC
    LIMIT 1)
    SELECT a.name, w.channel, COUNT(*)
    FROM accounts a
    JOIN web_events w
    ON a.id = w.account_id AND a.id =  (SELECT id FROM t1)
    GROUP BY 1, 2
    ORDER BY 3 DESC;
    
  5. 对于总消费前十名的客户,他们的平均终身消费 (total_amt_usd) 是多少?
     
    WITH t1 AS (
    SELECT a.id, a.name, SUM(o.total_amt_usd) tot_spent
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id
    GROUP BY a.id, a.name
    ORDER BY 3 DESC
    LIMIT 10)
    SELECT AVG(tot_spent)
    FROM t1;
    
  6. 比所有客户的平均消费高的企业平均终身消费 (total_amt_usd) 是多少?
     
    WITH t1 AS (
    SELECT AVG(o.total_amt_usd) avg_all
    FROM orders o
    JOIN accounts a
    ON a.id = o.account_id),
    t2 AS (
    SELECT o.account_id, AVG(o.total_amt_usd) avg_amt
    FROM orders o
    GROUP BY 1
    HAVING AVG(o.total_amt_usd) > (SELECT * FROM t1))
    SELECT AVG(avg_amt)
    FROM t2;

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值