现在有一张表,2个字段,字段A:id, 字段B:array类型:
col_name | data_type |
---|---|
gameid | int |
businesstype | array |
select gameid, businesstype from kua.tablea where date=20190122 limit 5;
+-----------+---------------------------+--+
| gameid | businesstype |
+-----------+---------------------------+--+
| 62230904 | ["SDK","FUNGAME"] |
| 62230006 | ["MIMO","break"] |
| 62230003 | ["SDK","MIMO","FUNGAME"] |
| 62230009 | [] |
| 62230011 | ["MIMO","break"] |
+-----------+---------------------------+--+
下面来看看array字段要怎么处理:
1.取B字段中的第一个元素值:
select gameid, businesstype[0] from kua.tablea where date=20190122 limit 5;
+-----------+-------+--+
| gameid | _c1 |
+-----------+-------+--+
| 62230904 | SDK |
| 62230006 | MIMO |
| 62230003 | SDK |
| 62230009 | NULL |
| 62230011 | MIMO |
+-----------+-------+--+
如果字段中含有数据,那么正常返回第一个值;如果array是空的,那么返回NULL
2.判断字段为Array类型的值,是否包含某值
select gameid, array_contains(businesstype, 'SDK') from kua.tablea where date=20190122 limit 5;
+-----------+--------+--+
| gameid | _c1 |
+-----------+--------+--+
| 62230904 | true |
| 62230006 | false |
| 62230003 | true |
| 62230009 | false |
| 62230011 | false |
+-----------+--------+--+
array_contains方法返回一个Boolean对象,可以配合if函数使用,例如 if(array_contains(_cloumn,'A'),'包含A','不包含A')
3.一行转多行,把array中的每个值都拿出来作为一行显示
select t.gameid, singleBusinesstype
from (
select gameid, businesstype
from kua.tablea where date=20190122 limit 5
) t LATERAL VIEW explode(t.businesstype) v as singleBusinesstype
+-----------+---------------------+--+
| t.gameid | singlebusinesstype |
+-----------+---------------------+--+
| 62230904 | SDK |
| 62230904 | FUNGAME |
| 62230010 | break |
| 62230010 | MIMO |
| 62230008 | MIMO |
| 62230008 | break |
| 62230006 | MIMO |
| 62230006 | break |
| 62230003 | SDK |
| 62230003 | MIMO |
| 62230003 | FUNGAME |
+-----------+---------------------+--+
配合 LATERAL VIEW、explode 2个函数,可以实现把一个array类型的值分开。
LATERAL VIEW通常用于一行转多行的处理逻辑,而对应的,在hive中实现多行转一行的处理逻辑可以使用<collect_set> <collect_list>函数来实现, 他们返回的对象都是一个array集合,那么又回到上面array对象的处理逻辑了,差别在在于前者会去重。