1 假设我们有一个数据存储这个各个用户访问的新闻id,其中content字段为每个用户每天访问的所有新闻id。具体如下:
统计每天访问了多少量新闻id
select tt1.dt, count(1) from
(
select dt, content
from dw_portal_basedata_orc_appkey_dt t1
where t1.dt>='20170601' and t1.dt<='20170604' and t1.page='1' and t1.area1='300' and t1.action='1' and t1.content<>''
)tt1
lateral view explode(split(tt1.content,',')) t2 as contentid
group by dt;
或者
select t1.dt, count(1) from sog_portal.dw_portal_basedata_orc_appkey_dt t1
lateral view explode(split(t1.content,',')) t2 as contentid
where t1.dt>='20170601' and t1.dt<='20170604' and t1.page='1' and t1.area1='300' and t1.action='1' and t1.content<>''group by t1.dt;
注解:1,首先将content这个字段split变成一个list,然后用lateral view explode
ps1: 如果仅仅只是计算id出现次数,用size函数然后求和也未尝不可
select dt, sum(size(split(content, ',')))
from
sog_portal.dw_portal_basedata_orc_appkey_dt
where dt>='20170601' and dt<='20170604' and trim(content)!='' and content is not null and page='1' and area1='300' and action='1' and content rlike '^[0-9]'
group by dt;
ps2: 如果content只是数字,需要正则表达式
在where条件中添加:content rlike '^[0-9]'