网上有很多人问到关于数据均匀分组的问题,很多回答都使用了游标。这里介绍不使用游标(基于关系运算)的解决方案。
场景1: 学校举行运动会,将报名参加赛跑的学生分组并安排跑道。要求:8个人一组,尽量使不同班次的同学在一组; 不足8个班级时,才允许相同的班级的同学在一组。
解决方案:
创建一个students表,有id,name,class等列。记录参赛同学。
第一步: 按班级分区排序,使每个同学在班里有一个唯一的流水号。
select id,name, class,
ROW_NUMBER() over(partition by class order by id) as seq -- 每个同学分配一个在各自班里唯一的流水号
from students
部分查询结果的截图如下:
第二步:按照新列seq排序,重新编号,将不同班级的同学均匀地分布。
select id, name,class,seq,
ROW_NUMBER() over(order by seq,class) as global_seq
from
(select id,name, class,
ROW_NUMBER() over(partition by class order by id) as seq
from students) as a
部分查询结果的截图如下:
第三步:生成组号和跑道号。 分组的方法是:8个人一组,globle_seq除以8的整数部分就是组号,余数部分就是跑道号。
select id,name,class,
(global_seq - 1) / 8 + 1 as group_no, -- 组号,从1开始
(global_seq - 1) % 8 + 1 as line_no -- 跑道号,从1到8
from
(select id, name,class,seq,
ROW_NUMBER() over(order by seq,class) as global_seq
from
(select id,name, class,
ROW_NUMBER() over(partition by class order by id) as seq
from students) as a
) as b
部分查询结果截图:
场景2:用B表的记录均匀有序地更新A表中的记录。
例如:A表有id,name, gender列(空)。截图如下:
B表为有id,gender列,包含所有的性别。当然,性别只有两种。在其它应用场景,B表可能有很多行。B表截图:
A表的gender 列为空,需要通过B表中的数据来更新。要求用B表中的记录交替更新A表中的记录。
期望结果如下:
解决方案1:
1. 先用row_number函数将A、B表的每行都分配一个从0开始的序号。并且,计算出B表的总行数。
select id, name,
row_number() over (order by id) - 1 as seq --流水号从0开始,以便后面的取余运算。
from A
select gender,
row_number() over (order by id) - 1 as seq,
count(*) over() as total_rows -- 计算B表的总行数
from B
2. 将两个查询连接起来,分组。
select c.id, c.name, d.gender
from
(select id, name, row_number() over (order by id) - 1 as seq
from A) as c,
(select gender, row_number() over (order by id) - 1 as seq, count(*) over() as total_rows
from B) as d
where c.seq % d.total_rows = d.seq
3. 用查询结果更新A表。为了更好的可读性,将查询放到一个CTE里。
with cte_new
as
(
select c.id, c.name, d.gender
from
(select id, name, row_number() over (order by id) - 1 as seq
from A) as c,
(select gender, row_number() over (order by id) - 1 as seq, count(*) over() as total_rows
from B) as d
where c.seq % d.total_rows = d.seq
)
update A set gender = c.gender
from A , cte_new as c
where A.id = c.id;
解决方案2:
在这个案例中,B表只有2条记录,可以使用case代替连接,对解决方案1进行简化。
完整代码:
with cte_new
as
(
select id,name,case ROW_NUMBER() over(order by id) % 2 when 1 then N'男' else N'女' end as gender
from A
)
update A set gender = c.gender
from A , cte_new as c
where A.id = c.id;
这个方案不需要访问B表,不会对其它需要使用B表的并发查询产生影响。但是,只适应CASE比较少的场景,并且需要将B表中的值硬编码到查询中。