Doris行列转化:Explode函数使用
1. 行列装换需求说明
在业务中我们经常会遇到将某一个列上的数据“散开”,然后进行计算,统计和分析。也就是典型的行列转化的需求。
例如我们有一个订单表,有bill_code和bill_state。其中bill_state列是个多个状态(待支付、支付中、支付完成)使用逗号隔开的字符串,我们需要按列对多个状态展开进行统计和分析。这样的需求在Doirs中可以通过explode_split函数完成。
原始数据
bill_code bill_state
1000 1,2
1001 1,3
1002 1
需求1:将bill_state字段按照指定分隔符(逗号)展开
bill_code bill_state
1000 1
1000 2
1001 1
1001 3
1002 1
需求2:对展开后的字段统计分析:count(bill_state)
bill_code bill_state
1000 2
1001 2
1002 1
需求3:对展开后的字段进行函数计算:sum(bill_state)
bill_code bill_state
1000 3
1001 4
1002 1
2. 行列装换实战
-
库表创建和数据导入
create database test; use test; CREATE TABLE bill ( bill_code VARCHAR(32) DEFAULT '', bill_state VARCHAR(32) DEFAULT '' ) UNIQUE KEY(bill_code) DISTRIBUTED BY HASH(bill_code) BUCKETS 10 PROPERTIES("replication_num" = "1"); INSERT INTO bill VALUES ('1000','1,2'); INSERT INTO bill VALUES ('1001','1,3'); INSERT INTO bill VALUES ('1002','1'); mysql> select * from bill; +-----------+------------+ | bill_code | bill_state | +-----------+------------+ | 1001 | 1,3 | | 1000 | 1,2 | | 1002 | 1 | +-----------+------------+ 3 rows in set (0.03 sec) mysql> -
数据切分:对 bill_state字段按照逗号切分,实现列转行。explode_split(bill_state, ‘,’)
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000 | 1 |
| 1000 | 2 |
| 1001 | 1 |
| 1001 | 3 |
| 1002 | 1 |
+-----------+-------+
5 rows in set (0.06 sec)
- 对数据进行过滤操作:既可以按照已经存在的列进行数据过滤,也可以按照explode后的列数据进行过滤。
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code ='1000' and state =1 order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000 | 1 |
+-----------+-------+
1 row in set (0.01 sec)
mysql>
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code ='1000' and state in (1,2) order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000 | 1 |
| 1000 | 2 |
+-----------+-------+
2 rows in set (0.01 sec)
mysql>
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code ='1000' order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000 | 1 |
| 1000 | 2 |
+-----------+-------+
2 rows in set (0.01 sec)
mysql>
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code in ('1000','1001','1002') order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000 | 1 |
| 1000 | 2 |
| 1001 | 1 |
| 1001 | 3 |
| 1002 | 1 |
+-----------+-------+
5 rows in set (0.02 sec)
-
对展开数据执行count统计操作:按照bill_code分组,统计state条数
mysql> with e_result as (select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code in ('1000','1001','1002') order by bill_code, state) -> SELECT bill_code,count(state) from e_result group by bill_code order by bill_code; +-----------+----------------+ | bill_code | count(`state`) | +-----------+----------------+ | 1000 | 2 | | 1001 | 2 | | 1002 | 1 | +-----------+----------------+ 3 rows in set (0.03 sec) -
对展开数据执行sum统计操作:按照bill_code分组,对 state字段上的值进行sum计算。
mysql> with e_result as (select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code in ('1000','1001','1002') order by bill_code, state)
-> SELECT bill_code,sum(cast (state as BIGINT)) from e_result group by bill_code order by bill_code;
+-----------+------------------------------+
| bill_code | sum(CAST(`state` AS BIGINT)) |
+-----------+------------------------------+
| 1000 | 3 |
| 1001 | 4 |
| 1002 | 1 |
+-----------+------------------------------+
3. 多列explode
- 修改表结构
ALTER TABLE bill ADD COLUMN bill_type varchar(255);
- 写入数据
update bill set bill_type='a,b' where bill_code='1000';
update bill set bill_type='c,d' where bill_code='1001'
update bill set bill_type='a' where bill_code='1002'
- 数据查看
select * from bill;
bill_code bill_state bill_type
1001 1,3 c,d
1000 1,2 a,b
1002 1 a
- 多列执行explode_split
SELECT bill_code, state, type
FROM bill
LATERAL VIEW explode_split(bill_state, ',') tmp1 AS state
LATERAL VIEW explode_split(bill_type, ',') tmp2 AS type
ORDER BY bill_code, state;
查询结果:
bill_code state type
1000 1 a
1000 1 b
1000 2 a
1000 2 b
1001 1 c
1001 1 d
1001 3 c
1001 3 d
1002 1 a
最后宣传下我的书:
1 . 《图解Spark 大数据快速分析实战(异步图书出品)》 https://item.jd.com/13613302.html
2. 《Offer来了:Java面试核心知识点精讲(第2版)(博文视点出品)》https://item.jd.com/13200939.html
3. 《Offer来了:Java面试核心知识点精讲(原理篇)(博文视点出品)》https://item.jd.com/12737278.html
4. 《Offer来了:Java面试核心知识点精讲(框架篇)(博文视点出品)》 https://item.jd.com/12868220.html
4304

被折叠的 条评论
为什么被折叠?



