yhd日志分析(二)

yhd日志分析(二)

继续yhd日志分析,统计数据

1 分析

登录人数

count(distinct endUserId)

游客人数

count(distinct guid) - count(distinct endUserId)

平均访问时长

先把tracktime转换为unix timestamp, 相同sessionId的tracktime中,max(tracktime)-min(tracktime),得到用户停留时间。所有用户的停留时间相加,得到总停留时间。总停留时间和总访问次数的比例,就是平均访问时长

select sum(stay_time) as total_stay_time
from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time from yhd_log group by sessionId) stay

用户访问总数

count(distinct sessionId)

二跳率

sessionViewNo=2的用户,即为二跳用户。统计出二跳用户和uv的比例

select count(distinct guid) from yhd_log where sessionViewNo=2

独立ip数

count(distinct ip)

实现

  1. 借助中间表,分别存放停留时间和二次跳用户总数

     // 存放总停留时间
    
     create table if not exists yhd_log_total_stay_time(
         date string,
         total_stay_time bigint
     )
     row format delimited fields terminated by '\t'
     stored as textfile;
    
     // 存放二次跳用户总数
    
     create table if not exists yhd_log_total_second_jump(
         date string,
         total_second_jump bigint
     )
     row format delimited fields terminated by '\t'
     stored as textfile;
  2. 计算总停留时间,存放在yhd_log_total_stay_time, 按日期分组

     insert overwrite table yhd_log_total_stay_time
     select date, sum(stay_time) as total_stay_time
     from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time, date from yhd_log group by date, sessionId) stay
     group by date
  3. 计算二次跳用户总数,存放在yhd_log_total_second_jump, 按日期分组

     insert overwrite table yhd_log_total_second_jump
     select date, count(distinct guid)
     from yhd_log
     where sessionViewNo=2 
     group by date
  4. 统计

     把yhd_log_total_stay_time,yhd_log_total_second_jump和yhd_log按照 date连接查询
    
     select date, pv, uv, user_count, guest_count, 
         total_stay_time/total_visit as average_stay_time,
         total_second_jump/ uv as second_jump_rate, indepent_ip
     from (
     select log.date, 
         count(url) as pv, 
         count(distinct guid) as uv,
         count(distinct endUserId) as user_count,
         count(distinct guid) - count(distinct endUserId) as guest_count,
         count(distinct sessionId) as total_visit,
         min(stay.total_stay_time) as total_stay_time,
         min(second.total_second_jump) as total_second_jump,
         count(distinct ip) as indepent_ip
     from yhd_log log inner join yhd_log_total_stay_time stay on stay.date=log.date inner join yhd_log_total_second_jump second on second.date=log.date
     group by log.date
     ) stat

结果

20150828126134390071768721320745.97973932447510.1311815827928320730462

转载于:https://www.cnblogs.com/ivanny/p/yhd_log_statatis_2.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值