面试题:求出这个客服这次会话中,用户每次发第一次消息的时间与客服回复的第一次时间差

原数据

taskidcreatetimeuseridspecialistidsenduseridcontent
32856204454157300002018-06-26 12:10:46.0328102920049212000003281029200492120000aaaaaa
32856204454157300002018-06-26 13:56:58.0328102920049212000032757831556383600003275783155638360000bbbbb
32856204454157300002018-06-26 13:57:32.0328102920049212000032757831556383600003275783155638360000
32856204454157300002018-06-26 14:06:59.0328102920049212000032757831556383600003281029200492120000
32856204454157300002018-06-26 14:07:41.0328102920049212000032757831556383600003281029200492120000
32856204454157300002018-06-26 14:15:56.0328102920049212000032757831556383600003275783155638360000
32856204454157300002018-06-26 14:16:12.0328102920049212000032757831556383600003275783155638360000
32856204454157300002018-06-26 14:16:31.0328102920049212000032757831556383600003275783155638360000
32856204454157300002018-06-26 14:16:40.0328102920049212000032757831556383600003281029200492120000
32856204454157300002018-06-26 14:17:14.0328102920049212000032757831556383600003281029200492120000

要求交付的数据结果表如下,请用一个简洁的hql计算出如下数据:

taskidcreatetimeuseridspecialistidreplytime(回复时间)response_speed(响应速度)
32856204454157300002018-06-26 12:10:46.0328102920049212000032757831556383600002018-06-26 13:56:58.001:46:12
32856204454157300002018-06-26 14:06:59.0328102920049212000032757831556383600002018-06-26 14:15:56.000:08:57
32856204454157300002018-06-26 14:16:40.0328102920049212000032757831556383600002018-06-27 13:43:56.023:27:16
32856410698489500002018-06-26 14:50:50.0327413372284017000032757831556383600002018-06-26 16:45:27.001:54:37

思路分析

  1. 每一个taskid只对应一个userid,但可以对应多个specialistid;
  2. 客户(userid)可连续发送n条信息给服务端(specialist),服务端(specialist)也可以连续回复n条信息给客户(userid);
  3. 客户 => 服务端(×n) , 服务端 => 客户(×n) 为一个 循环,一个taskid中可以有多个循环;
  4. 需求用户每次发第一次消息的时间与客服回复的第一次时间差 =》 一个循环中的 first(客户 => 服务端) , first(服务端 => 客户);

解题步骤

标记法

  1. 客户 => 服务端 类型的消息 标记为 , 服务端 => 客户 类型的消息 标记为1
select *,if(senduserid = userid,0,1) check from fact_customer_service_info;
  1. 为了去除思路分析中第2条的连续发送问题,新增一列match_check,该列为上一行的check,第一行的上一行没有,默认取1
select 
t1.* , 
lag(check,1,1) over(distribute by taskid sort by createtime) match_check
from
(select 
*,
if(senduserid = userid,0,1) check 
from 
fact_customer_service_info) t1
order by taskid;
  1. 去除check与match_check相同的列,这样就可以去除不是第一条的连续发送信息行
select 
t2.*
from
(select 
t1.* , 
lag(check,1,1) over(distribute by taskid sort by createtime) match_chenck
from
(select 
*,
if(senduserid = userid,0,1) check from fact_customer_service_info) t1 ) t2
where t2.check <> t2.match_chenck;
  1. 然后check列则为0、1、0、1、0 或者 0、1、0、1两种情况,将check为1的行中specialist 和 createtime并到上一行中即可
select
taskid,
if(check = 0 and match_chenck = 1,createtime,0) as createtime,
userid,
if((lead(match_chenck,1) over(distribute by taskid sort by createtime) = 0),(lead(specialistid,1) over(distribute by taskid sort by createtime)),null) as specialistid,
if((lead(match_chenck,1) over(distribute by taskid sort by createtime) = 0),(lead(createtime,1) over(distribute by taskid sort by createtime)),null) as replytime
from
(select
t2.*
from
(select 
t1.* , 
lag(check,1,1) over(distribute by taskid sort by createtime) match_chenck
from
(select 
*,
if(senduserid = userid,0,1) check from fact_customer_service_info) t1)t2
where t2.check <> t2.match_chenck) t3;
  1. 这样就完成了合并了一个循环内的客户 => 服务端 类型的消息服务端 => 客户 类型的消息
select
t4.taskid,
t4.createtime,
t4.userid,
t4.specialistid,
t4.replytime,
from_unixtime(unix_timestamp(t4.replytime) - unix_timestamp(t4.createtime) - 8 * 60 * 60, 'dd HH:mm:ss') as respond_speed
from
(select
taskid,
if(check = 0 and match_chenck = 1,createtime,0) as createtime,
userid,
if((lead(match_chenck,1) over(distribute by taskid sort by createtime) = 0),(lead(specialistid,1) over(distribute by taskid sort by createtime)),null) as specialistid,
if((lead(match_chenck,1) over(distribute by taskid sort by createtime) = 0),(lead(createtime,1) over(distribute by taskid sort by createtime)),null) as replytime
from
(select 
t2.*
from
(select t1.* ,
lag(check,1,1) over(distribute by taskid sort by createtime) match_chenck
from
(select 
*,
if(senduserid = userid,0,1) check from fact_customer_service_info) t1)t2
where t2.check <> t2.match_chenck) t3) t4
where t4.replytime is not null;

核心思想为打标记

  • 同一个循环中的客户发送,服务端发送作出不同标记
  • 使用lag()函数开窗,行数之间错位后的checkmatch_check不同则为我们想要的数据
  • 最后则是去除相同数据、以及合并之后的null数据

最后需要注意的是时间问题,
from_unixtime() 是从1970年01月01日 00:00:00开始计算增加的秒数,得出年月日,时分秒
由于处于东8区,转换replytime 和 createtime的时间差时,会与格林尼治时间有个8小时的时区差,需减掉8小时的时差才能得出正确时间差

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值