看了很多用lead函数写两次交易(或者其他操作)之间的时间间隔,但是如果两次行为不一致的话就不太好直接取lead相减了
这里写了个join的方法,不知道有没有可以不用到join的办法来解答呢~
补充 gpt告诉我怎么用子查询来写了!
SELECT
AVG(DATEDIFF(next_success_date, transaction_date)) AS avg_days_to_next_successful
FROM (
SELECT
t1.user_id,
t1.transaction_date,
(
SELECT MIN(t2.transaction_date)
FROM test.tansaction t2
WHERE t2.user_id = t1.user_id
AND t2.transaction_date > t1.transaction_date
AND t2.status = 'success'
) AS next_success_date
FROM test.tansaction t1
WHERE t1.status = 'fail'
) AS failed_transactions_with_next_success;
假设我们取一个用户来看(多个用户按用户分组就完事了)
这是原表:
目的是:7.18的fail交易,下一次是在7.20成功
7.19的fail交易,下一次也是在7.20交易成功
那么平均间隔就是 (2+1)/2 = 1.5
```sql
SELECT user_id,
avg(day_interval) as succ_interval
from (
SELECT t1.user_id,
t1.trans_time as trans_time_fail,
t2.trans_time as trans_time_success,
datediff(t2.trans_time, t1.trans_time) as day_interval,
row_number() over(
partition by user_id,
t1.trans_time
order by t2.trans_time
) as rn
from
(
SELECT user_id,
trans_time,
trans_status
from test.tansaction
where trans_status = 'fail'
group by 1,
2
) t1
left join (
SELECT user_id,
trans_time,
trans_status
from test.tansaction
where trans_status = 'success'
group by 1,
2
) t2 on t1.user_id = t2.user_id
and t1.trans_time < t2.trans_time
) t3
where rn = 1
group by 1
;
中间的子查询是这样的表
最后结果