问题:各市随即抽样5000条数据
聚合函数用group by分组,每个分组返回一个统计值。但不能对每行进行分析,得不到问题的结果。而分析函数采用partition by分组,并且每组每行都可以返回一个统计值。辅助row_number()函数和生成随机数函数,取rownum<=5000的数据。
select *
from (select ROW_NUMBER() over(partition by 市代码 order by RAND()) row_num
,t1.*
from 表 as t1 ) t2
where row_num <= 5000 ;
分析函数带有一个开窗函数over(),包含三个分析子句:
分组(partition by)
排序(order by)
窗口(rows)
前两个较为常见,第三个表示分析函数的作用范围,默认范围是组内所有记录。
相当于
SUM(SAL) OVER(PARTITION BY E.DEPTNO
ORDER BY E.SAL
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
其它表示范围的写法还有:
第一行至当前行:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
当前行至最后一行:
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
当前行的上一行(rownum-1)到当前行:
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
当前行的上一行(rownum-1)到当前行的下辆行(rownum+2):
ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING
———————————————————————————————————————
rows 和 range的区别:
以计算工资总和为例
rows
sum(salary)over(partition by dept order by user_id rows between current row and 2 following ) as sum_salary
sum_salary值则为当前行到下面两行的salary累加值
range
sum(salary)over(partition by dept order by user_id range between current row and 500 following ) as sum_salary
sum_salary值则为当前行到之后比当前行salary值最大大500以内的累加值