hive统计页面停留时间

1、背景:通过业务埋点数据,统计用户在页面的停留时间

样例数据,样例数据存入表tmp,

有如下字段用户uid、动作时间戳time、页面名称pn、动作名称action

SELECT '12345' AS uid, '1695613731020' AS time, '搜索' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613732021' AS time, '搜索' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613734024' AS time, '搜索' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613737036' AS time, '列表' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613738037' AS time, '列表' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613740040' AS time, '列表' AS pn, 'click' AS action
uidtimepnaction
123451695613731020搜索click
123451695613732021搜索click
123451695613734024搜索click
123451695613737036列表click
123451695613738037列表click
123451695613740040列表click

思路:以用户维度按时间进行升序排列,通过lag函数找到上一个时间动作last_pn

SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn
	, lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pn
FROM tmp

rn排序的作用是找到最后一个动作

uidtimepnrnlast_pn
123451695613731020搜索6
123451695613732021搜索5搜索
123451695613734024搜索4搜索
123451695613737036列表3搜索
123451695613738037列表2列表
123451695613740040列表1列表

然后将发生页面变化的节点进行标记,

SELECT *, if(pn <> nvl(last_pn, '空') OR rn = 1, 1, 0) AS label
FROM (
SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn
, lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pn
FROM tmp
) t
uidtimepnrnlast_pnlabel
123451695613731020搜索61
123451695613732021搜索5搜索0
123451695613734024搜索4搜索0
123451695613737036列表3搜索1
123451695613738037列表2列表0
123451695613740040列表1列表1

之后统计停留时间就可以只看label =1的日志之间的时间差即可,全部代码如下,

WITH tmp AS (
		SELECT '12345' AS uid, '1695613731020' AS time, '搜索' AS pn, 'click' AS action
		UNION ALL
		SELECT '12345' AS uid, '1695613732021' AS time, '搜索' AS pn, 'click' AS action
		UNION ALL
		SELECT '12345' AS uid, '1695613734024' AS time, '搜索' AS pn, 'click' AS action
		UNION ALL
		SELECT '12345' AS uid, '1695613737036' AS time, '列表' AS pn, 'click' AS action
		UNION ALL
		SELECT '12345' AS uid, '1695613738037' AS time, '列表' AS pn, 'click' AS action
		UNION ALL
		SELECT '12345' AS uid, '1695613740040' AS time, '列表' AS pn, 'click' AS action
	)
SELECT *
FROM (
	SELECT uid, pn, time, CAST((lead(time, 1) OVER (PARTITION BY uid ORDER BY time ASC) - time) / 1000 AS BIGINT) AS stay_time
	FROM (
		SELECT *
			, if(pn <> nvl(last_pn, '空')
				OR rn = 1, 1, 0) AS label
		FROM (
			SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn
				, lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pn
			FROM tmp
		) t
	) tt
	WHERE label = 1
) ttt
WHERE stay_time IS NOT NULL

最终统计结果如下

uidpntimestay_time
12345搜索16956137310206
12345列表16956137370363
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值