近期一些sql练习题目汇总:
select name, orderdate, cost, sum(cost) over(partition by month(orderdatee))
from t_window;
想要cost按月累加:
select name, orderdat, cost, sum(cost) over(partition by name order by orderdate) as sample3
from t_window;
涉及到window的情况:
sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row) as sample4, — 和sample3一样
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5 – 当前行和前1行做聚合
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING) as sample6 — 当前行和前边一行,以及后面一行做聚合;
sum(cost) over(partition by name order by orderdate rows between current row and UNBONDED FOLLOWING) as sample 7 – 当前行及后面所有行;
分组内切片函数:
NTILE:
select name, orderdate, cost,
ntile(3) over() as sample1, – 全局切片
ntile(3) over(partition by name), – 按name分组,在分组内将数据切片
ntile(3) over(order by cost), --全局按照cost升序排列,数据切成3份
ntile(3) over(partition by name order by cost) – 按name分组,在分组内按cost升序排列,数据切3份。
排序:
row_number() over(partition by cookieid order by pv desc) as ran1,
rank() over(partition by cookieid order by pv desc) as ran2,
dense_rank() over(partition by cookieid order by pv desc) as ran3.
结果:
ran1: 1, 2, 3, 4, 5, 6,7…
ran2: 1, 2, 3, 3, 5, 6,7…
ran3: 1, 2, 3, 3, 4, 5,6,7…
LAG, LEAD函数,可以返回上下数据行的数据
select name, orderdate, cost,
lag(orderdate, 1, ‘1900-01-01’) over(partition by name order by orderdate) as time1,
lag(orderdate, 2) over(partition by name order by orderdate) as time2
from t_window;
first_value取分组内排序后,截止到当前行,第一个值;
last_value取分组内排序后,截止到当前行,最后一个值
first_value(orderdate) over(partition by name order by xxx)
行转列
select cityname,
concat_ws(’,’, collect_set(regionname)) as address_set
from cityinfo
group by cityname;
列转行
select cityname, region
from cityinfoset
lateral view explode(split(regionname,’,’)) aa as region
如果是presto,可以写
select cityname, region
from cityinfoset
cross set unnest(regioname) as region;
时间戳转换:
mysql> select FROM_UNIXTIME(1344954515,’%Y-%m-%d %H:%i:%S’);
±----------------------------------------------+
| FROM_UNIXTIME(1344954515,’%Y-%m-%d %H:%i:%S’) |
±----------------------------------------------+
| 2012-08-14 22:28:35 |
±----------------------------------------------+
row in set (0.00 sec)
mysql> select UNIX_TIMESTAMP(‘2012-09-04 18:17:23’);
±--------------------------------------+
| UNIX_TIMESTAMP(‘2012-09-04 18:17:23’) |
±--------------------------------------+
| 1346753843 |
±--------------------------------------+
row in set (0.00 sec)
datediff(endDate, startDate)
datediff(‘2009-07-31’, ‘2009-07-30’); 1
coalesce(value1, value2, …) 返回第一个非null值,如果全为null,则返回null
mysql> select coalesce(null, 1, 2);
±---------------------+
| coalesce(null, 1, 2) |
±---------------------+
| 1 |
±---------------------+
1 row in set (0.03 sec)