目录
描述:1161.每台机器的进程平均运行时间
现在有一个工厂网站由几台机器运行,每台机器上运行着 相同数量的进程 。编写解决方案,计算每台机器各自完成一个进程任务的平均耗时。
完成一个进程任务的时间指进程的
'end' 时间戳
减去'start' 时间戳
。平均耗时通过计算每台机器上所有进程任务的总耗费时间除以机器上的总进程数量获得。结果表必须包含
machine_id(机器ID)
和对应的 average time(平均耗时) 别名processing_time
,且四舍五入保留3位小数。
数据准备:
Create table If Not Exists Activity (machine_id int, process_id int, activity_type ENUM('start', 'end'), timestamp float)
Truncate table Activity
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('0', '0', 'start', '0.712')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('0', '0', 'end', '1.52')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('0', '1', 'start', '3.14')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('0', '1', 'end', '4.12')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('1', '0', 'start', '0.55')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('1', '0', 'end', '1.55')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('1', '1', 'start', '0.43')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('1', '1', 'end', '1.42')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('2', '0', 'start', '4.1')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('2', '0', 'end', '4.512')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('2', '1', 'start', '2.5')
insert into Activity (machine_id, process_id, activity_type, timestamp) values ('2', '1', 'end', '5')
分析:
法一:使用窗口函数
①先根据machine_id,process_id分组,并找出每组最大时间时间戳
select *, max(timestamp) over (partition by machine_id,process_id )②筛选出activity_type为start的行数据,构造r1-timestamp
with t1 as (select *, max(timestamp) over (partition by machine_id,process_id ) r1 from activity ) # , t2 as( select machine_id, process_id, (r1-timestamp) r3 from t1 where activity_type = 'start'③以machine_id分组求avg(r3),完善数据格式
select machine_id,round(avg(r3),3)processing_time from t2 group by machine_id法二:连接
①观察数据可以通过拆解为两个表再连接,如下图
select * from activity where activity_type = 'start' select * from activity where activity_type = 'end';②根据machine_id排序然后求差求平均
select machine_id, round(avg(tt2 - tt1), 3) as processing_time from t0 group by machine_id;法三:case when巧妙算法
①计算processing_time等于
sum(同一machine_id,process_id的时间戳之差)/count(distinct process_id)
当activity_type为end时 的时间戳 减去 activity_type为start的时间戳
那么根据activity_type进行判断,如果是end那么sum取“+”,如果为start那么sum取“-”
round(sum(case when activity_type = 'end'
then timestamp
else -timestamp end) /
count(distinct process_id),3)
代码:
#法一:
with t1 as
(select *,
max(timestamp) over (partition by machine_id,process_id ) r1
from activity
)
, t2 as(
select machine_id,
process_id,(r1-timestamp) r3
from t1 where activity_type = 'start')
select machine_id,
round(avg(r3),3)processing_time
from t2
group by machine_id;
#法二:
with t0 as (
select t1.machine_id,
t1.timestamp tt1,
t2.timestamp tt2
from (select * from activity where activity_type = 'start')t1,
(select * from activity where activity_type = 'end')t2
where t1.machine_id = t2.machine_id
and t1.process_id = t2.process_id)
select machine_id,round(avg(tt2-tt1),3)as processing_time from t0 group by machine_id
#法三:
select machine_id,round(sum(case when activity_type = 'end' then timestamp else -timestamp end) /
count(distinct process_id),3) processing_time
from activity
group by machine_id;
总结:
①case then解法三很巧妙积累思路
②解法二:拆分为两个表后非常清晰明了,要掌握思想