简单看了下当初学SQL的课件,基本的几个点都有印象。不过个人相信做题时最高效的回忆方式。依照SQL的榜单开始刷!
【10/25】:刷了十多题最大的感触就是,入门得耐心花时间去重新熟悉函数和每个基本结构的作用,之后只刷写起来有意义的题。如果复杂度高了,一定要去追究怎么可以降低,尽量还是少写没必要的复杂语句。
再回过头来引用最近看到的Data Mining教授写的一段话:
-首先查看SELECT中的各个字段,然后查看维度表,以确定这些SELECT字段来自哪个表。
-然后查看SELECT中的聚合值,如sum(amount_sold)。这是通过维度属性(即报表标签)聚合的事实度量。
-再看WHERE条件:这些条件过滤数据,只从汇总数据立方体中提取一部分。有些WHERE条件在来自其他维度表的字段上,需要更多的连接。在这些查询中使用WHERE条件的目的不是使它们变得复杂并添加更多的连接;相反,我们试图减少结果中的行数,以便更容易查看结果并更快地运行查询。
-最后,查看Group By中使用的其他函数,如Rollup、多维数据集或分组集。
EASY
511. Game Play Analysis I <min()>
Write an SQL query to report the first login date for each player.
Return the result table in any order.
Input:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Output:
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
+-----------+-------------+
SOLUTION:
SELECT player_id, min(event_date) as first_login
FROM Activity
GROUP BY player_id
512. Game Play Analysis II <row_number()>
Write an SQL query to report the device that is first logged in for each player.
Return the result table in any order.
Input:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Output:
+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
+-----------+-----------+
SOLUTION:
# 要在根据player_id分组的情况下找到基于event_date的排序。
# row_number() OVER (PARTITION BY COL1 ORDER BY COL2) 表示根据COL1分组,在分组内部根据 COL2排序,而此函数计算的值就表示每组内部排序后的顺序编号
with rankedTable as (
SELECT player_id, device_id, event_date,
ROW_NUMBER() OVER(PARTITION BY player_id ORDER BY event_date) rn
FROM Activity
)
SELECT player_id, device_id
FROM rankedTable
WHERE rn = 1
这明显就比上一题复杂一点,因为不是所有列都能直接从Activity