行转列公式
行转列的常规做法是,增加伪列,而伪列来源如下,即分组后,将符合某某条件的才累加或者统计,如例题一和例题二
group by + sum(if(条件表达式,字段名,0)
group by + count(if(条件表达式,字段名,null)
注:count不统计null值
当然if
语句也可以换成case when
,毕竟有些SQL
语法不支持if
,如MySQL
如果是多行合并为一列,则一般会用到concat_ws()、collect_list()、collect_set()等函数,一般写法如下,详情可见例题三
group by + concat_ws(",",collect_list(字段名))
group by + concat_ws(",",collect_set(字段名))
例题一 华泰证券
已知有如下数据
year | month | amount |
---|---|---|
1991 | 1 | 1.1 |
1991 | 2 | 1.2 |
1991 | 3 | 1.3 |
1991 | 4 | 1.4 |
1992 | 1 | 2.1 |
1992 | 2 | 2.2 |
1992 | 3 | 2.3 |
1992 | 4 | 2.4 |
需要查出展示为如下结果
year | m1 | m2 | m3 | m4 |
---|---|---|---|---|
1991 | 1.1 | 1.2 | 1.3 | 1.4 |
1992 | 2.1 | 2.2 | 2.3 | 2.4 |
创建表并构造数据
create table table1(year int,month int ,amount double);
insert overwrite table table1 values
(1991,1,1.1),
(1991,2,1.2),
(1991,3,1.3),
(1991,4,1.4),
(1992,1,2.1),
(1992,2,2.2),
(1992,3,2.3),
(1992,4,2.4);
select * from table1;
分析:
- 因为结果中有
5
列,所以查询的时候,应该有如下形式,即要新增四个伪列
select
year,
xxx as m1,
xxx as m2,
xxx as m3,
xxx as m4,
from table1
- 先获取每个伪列要统计的是什么值,本题中为当
month
为1
月时才把amout
显示到a
列,否则显示0
,同理,当month
为2
月时,才把amount
显示到b
列,否则显示0
,以此类推
select
*,
if(month = 1,amount,0) a,
if(month = 2,amount,0) b,
if(month = 3,amount,0) c,
if(month = 4,amount,0) d
from table1;
- 得到如上形式后,应该就很容易想到分组聚合了,把每列的值按
year
分组聚合,这样a
列的值不就只剩下为month=1
时的所有amout
累加嘛,这边是我们要的m1
列,其他列同理
select year,
sum(a) as m1,
sum(b) as m2,
sum(c) as m3,
sum(d) as m4
from (select
year,
if(month = 1,amount,0) a,
if(month = 2,amount,0) b,
if(month = 3,amount,0) c,
if(month = 4,amount,0) d
from table1) t -- hive子查询必须取别名
group by year;
上面三步若一步到位则为,即当month
为指定月份时,把amount
累加,否则加个0
即可,熟练之后是可以做到一步到位的,这也就是开始提到的公式
select
year,
sum(if(month = 1,amount,0)) m1,
sum(if(month = 2,amount,0)) m2,
sum(if(month = 3,amount,0)) m3,
sum(if(month = 4,amount,0)) m4
from table1
group by year;
例题二 腾讯游戏
表table如下:
DDate | shengfu |
---|---|
2015-05-09 | 胜 |
2015-05-09 | 胜 |
2015-05-09 | 负 |
2015-05-09 | 负 |
2015-05-10 | 胜 |
2015-05-10 | 负 |
2015-05-10 | 负 |
如果要生成下列结果, 该如何写sql语句?
DDate | 胜 | 负 |
---|---|---|
2015-05-09 | 2 | 2 |
2015-05-10 | 1 | 2 |
分析:直接按DDate进行分组,新增两个伪列,一列统计胜的场次,一列统计负的场次
--建表
create table table1(DDate string, shengfu string) ;
insert overwrite table table1 values ('2015-05-09', "胜"),
('2015-05-09', "胜"),
('2015-05-09', "负"),
('2015-05-09', "负"),
('2015-05-10', "胜"),
('2015-05-10', "负"),
('2015-05-10', "负");
-- 写法一
select DDate,
SUM(case when shengfu = '胜' then 1 else 0 end) `胜`,
SUM(case when shengfu = '负' then 1 else 0 end) `负`
from table1
group by DDate;
-- 写法二
select
DDate,
sum(if(shengfu = '胜',1,0)) as `胜`,
sum(if(shengfu = '负',1,0)) as `负`
from table1
group by DDate;
-- 写法三 利用了count不统计null值的特性
select
DDate,
count(if(shengfu = '胜',1,null)) as `胜`,
count(if(shengfu = '负',1,null)) as `负`
from table1
group by DDate;
例题三 腾讯QQ
假设tableA
如表5
, tableB
如表6
,
表5
qq号(字段名:qq) | 游戏(字段名:game) |
---|---|
10000 | a |
10000 | b |
10000 | c |
20000 | c |
20000 | d |
表6
qq号(字段名:qq) | 游戏(字段名:game) |
---|---|
10000 | a_b_c |
20000 | c_d |
请写出以下sql逻辑:
a, 将tableA输出为tableB的格式; 【行转列】
b, 将tableB输出为tableA的格式; 【列转行】
create table tableA(qq string, game string)
insert overwrite table tableA values
(10000, 'a'),
(10000, 'b'),
(10000, 'c'),
(20000, 'c'),
(20000, 'd');
create table tableB(qq string, game string) ;
insert overwrite table tableB values
(10000, 'a_b_c'),
(20000, 'c_d');
--将tableA输出为tableB的格式;
select qq,
concat_ws('_', collect_list(game)) game
from tableA
group by qq;
对于列转行,一般都会使用爆炸函数explode
实现
--将tableB输出为tableA的格式;
select qq,
tmp.game
from tableB lateral view explode(split(game, '_')) tmp as game;