Doris行转列之-Explode函数使用

Doris行列转化:Explode函数使用

1. 行列装换需求说明

在业务中我们经常会遇到将某一个列上的数据“散开”,然后进行计算,统计和分析。也就是典型的行列转化的需求。

例如我们有一个订单表,有bill_code和bill_state。其中bill_state列是个多个状态(待支付、支付中、支付完成)使用逗号隔开的字符串,我们需要按列对多个状态展开进行统计和分析。这样的需求在Doirs中可以通过explode_split函数完成。

原始数据

bill_code bill_state
1000      1,2
1001      1,3
1002       1

需求1:将bill_state字段按照指定分隔符(逗号)展开

bill_code bill_state
1000       1
1000       2
1001       1
1001       3
1002       1

需求2:对展开后的字段统计分析:count(bill_state)

bill_code bill_state
1000       2
1001       2
1002       1

需求3:对展开后的字段进行函数计算:sum(bill_state)

bill_code bill_state
1000       3
1001       4
1002       1
2. 行列装换实战
  1. 库表创建和数据导入

    create database test;
    use test;
    CREATE TABLE bill
     (
             bill_code VARCHAR(32) DEFAULT '',
             bill_state VARCHAR(32) DEFAULT ''
     )
     UNIQUE KEY(bill_code)
     DISTRIBUTED BY HASH(bill_code) BUCKETS 10
     PROPERTIES("replication_num" = "1");
     
    INSERT INTO bill VALUES ('1000','1,2');
    INSERT INTO bill VALUES ('1001','1,3');
    INSERT INTO bill VALUES ('1002','1');
    
    mysql> select * from bill;
    +-----------+------------+
    | bill_code | bill_state |
    +-----------+------------+
    | 1001      | 1,3        |
    | 1000      | 1,2        |
    | 1002      | 1          |
    +-----------+------------+
    3 rows in set (0.03 sec)
    
    mysql> 
    
  2. 数据切分:对 bill_state字段按照逗号切分,实现列转行。explode_split(bill_state, ‘,’)

mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state  order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000      | 1     |
| 1000      | 2     |
| 1001      | 1     |
| 1001      | 3     |
| 1002      | 1     |
+-----------+-------+
5 rows in set (0.06 sec)


  1. 对数据进行过滤操作:既可以按照已经存在的列进行数据过滤,也可以按照explode后的列数据进行过滤。
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code ='1000' and state =1 order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000      | 1     |
+-----------+-------+
1 row in set (0.01 sec)

mysql> 
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code ='1000' and state in (1,2) order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000      | 1     |
| 1000      | 2     |
+-----------+-------+
2 rows in set (0.01 sec)

mysql> 
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code ='1000' order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000      | 1     |
| 1000      | 2     |
+-----------+-------+
2 rows in set (0.01 sec)

mysql> 
mysql> select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code in ('1000','1001','1002') order by bill_code, state;
+-----------+-------+
| bill_code | state |
+-----------+-------+
| 1000      | 1     |
| 1000      | 2     |
| 1001      | 1     |
| 1001      | 3     |
| 1002      | 1     |
+-----------+-------+
5 rows in set (0.02 sec)
  1. 对展开数据执行count统计操作:按照bill_code分组,统计state条数

    mysql> with  e_result as (select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code in ('1000','1001','1002') order by bill_code, state)
        -> SELECT bill_code,count(state) from e_result group by bill_code order by bill_code;
    +-----------+----------------+
    | bill_code | count(`state`) |
    +-----------+----------------+
    | 1000      |              2 |
    | 1001      |              2 |
    | 1002      |              1 |
    +-----------+----------------+
    3 rows in set (0.03 sec)
    
    
  2. 对展开数据执行sum统计操作:按照bill_code分组,对 state字段上的值进行sum计算。

mysql>     with  e_result as (select bill_code, state from bill lateral view explode_split(bill_state, ',') tmp1 as state where bill_code in ('1000','1001','1002') order by bill_code, state)
    ->   SELECT bill_code,sum(cast (state as BIGINT)) from e_result group by bill_code order by bill_code;
+-----------+------------------------------+
| bill_code | sum(CAST(`state` AS BIGINT)) |
+-----------+------------------------------+
| 1000      |                            3 |
| 1001      |                            4 |
| 1002      |                            1 |
+-----------+------------------------------+
3. 多列explode
  1. 修改表结构
ALTER TABLE bill ADD COLUMN bill_type varchar(255);
  1. 写入数据
update bill set bill_type='a,b' where bill_code='1000';
update bill set bill_type='c,d' where bill_code='1001'
update bill set bill_type='a' where bill_code='1002'
  1. 数据查看
 select * from bill;
bill_code	bill_state	bill_type
1001	1,3	c,d
1000	1,2	a,b
1002	1	a

  1. 多列执行explode_split
SELECT bill_code, state, type
FROM bill
	LATERAL VIEW explode_split(bill_state, ',') tmp1 AS state
	LATERAL VIEW explode_split(bill_type, ',') tmp2 AS type
ORDER BY bill_code, state;

查询结果:

bill_code	state	type
1000	1	a
1000	1	b
1000	2	a
1000	2	b
1001	1	c
1001	1	d
1001	3	c
1001	3	d
1002	1	a

最后宣传下我的书:
1 . 《图解Spark 大数据快速分析实战(异步图书出品)》 https://item.jd.com/13613302.html
2. 《Offer来了:Java面试核心知识点精讲(第2版)(博文视点出品)》https://item.jd.com/13200939.html
3. 《Offer来了:Java面试核心知识点精讲(原理篇)(博文视点出品)》https://item.jd.com/12737278.html
4. 《Offer来了:Java面试核心知识点精讲(框架篇)(博文视点出品)》 https://item.jd.com/12868220.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值