Data Analyst Written Test, Paper 4: SQL, Video (KS)

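Welcome to the data analyst written test for this position. The test is worth 100 points in total; time allowed: 60 min.
Rules: no external lookups are allowed; complete the test on your own.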

There is one table: jf.creative_element_di
[Two images describing the table appeared here; they are not reproduced.]

1. Count the video enqueue events during the 17:00 hour on 2024-07-02 (15 points)

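-- 2024518 = video element; operation_time is the event time (as in the other answers)
SELECT COUNT(*) AS enqueue_cnt
FROM jf.creative_element_di
WHERE operation_time >= '2024-07-02 17:00:00' AND operation_time < '2024-07-02 18:00:00'
  AND element_type = '2024518' AND event_type = 'ENQUEUE';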

2. Extract every creative ID found in the covers, with exactly one creative ID per row (20 points)

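-- 2024519 = cover element. Assumes extraInfo2.creative_id holds a single ID per row;
-- a JSON array would first need unnesting (e.g. with JSON_TABLE in MySQL 8).
SELECT DISTINCT JSON_UNQUOTE(JSON_EXTRACT(params, '$.extraInfo2.creative_id')) AS creative_id
FROM jf.creative_element_di
WHERE element_type = '2024519';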

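3. For each advertiser, take the delay of every one of its videos, and count the advertisers whose videos all have a delay within 4 hours (note: ① delay = submit time - enqueue time) (20 points)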

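WITH video_times AS (
    -- one row per video: its enqueue time and submit time
    SELECT video_id,
           account_id,
           MAX(CASE WHEN event_type = 'ENQUEUE' THEN operation_time END) AS enqueue_time,
           MAX(CASE WHEN event_type = 'SUBMIT'  THEN operation_time END) AS submit_time
    FROM jf.creative_element_di
    WHERE element_type = '2024518'
    GROUP BY video_id, account_id
),
account_max_delay AS (
    -- longest delay among each advertiser's videos, in seconds
    -- (HOUR granularity would truncate 4h59m down to 4 and let it pass)
    SELECT account_id,
           MAX(TIMESTAMPDIFF(SECOND, enqueue_time, submit_time)) AS max_delay_sec
    FROM video_times
    -- videos not yet both enqueued and submitted have no delay to measure
    WHERE enqueue_time IS NOT NULL AND submit_time IS NOT NULL
    GROUP BY account_id
)
-- "all videos within 4h" is the same as "longest delay within 4h"
SELECT COUNT(*) AS advertiser_cnt
FROM account_max_delay
WHERE max_delay_sec <= 4 * 3600;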
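The corrected answer hinges on one idea: an advertiser qualifies only if its longest video delay is within 4 hours, so the test is `MAX(delay) <= 4h` per advertiser rather than a per-row filter. The toy run below checks that idea on made-up data. The table `demo_video_delay` and its integer-second columns are hypothetical stand-ins, chosen so the SQL needs no dialect-specific date functions.

```sql
-- Hypothetical toy table: one row per video, times in integer seconds
DROP TABLE IF EXISTS demo_video_delay;
CREATE TABLE demo_video_delay (
    account_id  INTEGER,
    video_id    INTEGER,
    enqueue_sec INTEGER,
    submit_sec  INTEGER
);
INSERT INTO demo_video_delay VALUES
    (1, 101, 0, 3600),    -- 1h
    (1, 102, 0, 10800),   -- 3h
    (2, 201, 0, 7200),    -- 2h
    (2, 202, 0, 18000),   -- 5h: disqualifies advertiser 2
    (3, 301, 0, 14400);   -- exactly 4h: still within the limit

-- An advertiser qualifies only if its longest delay is <= 4h
SELECT COUNT(*) AS advertiser_cnt
FROM (
    SELECT account_id
    FROM demo_video_delay
    GROUP BY account_id
    HAVING MAX(submit_sec - enqueue_sec) <= 4 * 3600
) AS qualified;
```

Advertisers 1 and 3 qualify (the 4-hour boundary is inclusive) and advertiser 2 does not, so the query returns 2.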

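4. From all the data, sample 1,000 video records with a SUBMIT event so that every reviewer ID has a similar sampling ratio (note: sampling ratio = sampled count / total count) (20 points; focus: logic *)
a) Determine the distribution of reviewer IDs: first find how many times each reviewer ID appears in the total, i.e., how many videos each reviewer ID was assigned.
b) Compute each reviewer ID's sample size so that the sampling ratios are nearly equal. Suppose there are N video records in total, M distinct reviewer IDs, and n_i records for reviewer ID i; its sample size is then floor(1000 * n_i / N), where floor rounds down.
c) Handle a shortfall or excess: because each sample size is rounded down, the sizes may not add up to exactly 1000. Handle this as follows:
i. Compute the actual total, the sum of floor(1000 * n_i / N).
ii. If the actual total is below 1000, make up the difference: sort the reviewer IDs by the fractional part that rounding down discarded (largest first) and add one record to each in turn until the total reaches 1000. The shortfall is always less than M, so no reviewer receives more than one extra record.
iii. If the actual total exceeds 1000, reduce the sample sizes starting from the reviewer ID with the largest one until the total falls to 1000. With rounding down this case cannot occur: each floor(1000 * n_i / N) is at most 1000 * n_i / N, and those exact quotas sum to 1000.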

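WITH per_operator AS (
    -- step a: n_i per reviewer, plus N (all video SUBMIT events) on every row
    SELECT operator_id,
           COUNT(*) AS n_i,
           SUM(COUNT(*)) OVER () AS total_n
    FROM jf.creative_element_di
    WHERE event_type = 'SUBMIT' AND element_type = '2024518'
    GROUP BY operator_id
),
quotas AS (
    -- step b: floor(1000 * n_i / N) and the remainder lost to rounding down;
    -- integer DIV/MOD avoid decimal rounding in the quota
    SELECT operator_id,
           (1000 * n_i) DIV total_n AS base_cnt,
           (1000 * n_i) MOD total_n AS remainder
    FROM per_operator
),
allocation AS (
    -- step c: hand the 1000 - SUM(base_cnt) leftover slots, one each,
    -- to the reviewers with the largest remainders
    SELECT operator_id,
           base_cnt + CASE
               WHEN ROW_NUMBER() OVER (ORDER BY remainder DESC, operator_id)
                    <= 1000 - SUM(base_cnt) OVER () THEN 1
               ELSE 0
           END AS sample_cnt
    FROM quotas
),
ranked_submissions AS (
    -- shuffle each reviewer's submissions so the sample is random, not the earliest rows
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY d.operator_id ORDER BY RAND()) AS row_num
    FROM jf.creative_element_di d
    WHERE d.event_type = 'SUBMIT' AND d.element_type = '2024518'
)
SELECT r.*
FROM ranked_submissions r
JOIN allocation a ON r.operator_id = a.operator_id
WHERE r.row_num <= a.sample_cnt
ORDER BY r.operator_id, r.row_num;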
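To see the allocation in steps b and c on numbers small enough to check by hand, here is a toy run. It is a sketch under stated assumptions: the table `demo_reviewer_counts`, the view `demo_allocation` and the sample size of 7 (standing in for 1000) are all hypothetical. The arithmetic stays in integers: `(sample_size * n_i) % total_n` is the remainder lost to rounding down, and because every reviewer shares the same denominator `total_n`, ranking by that integer is the same as ranking by the discarded fractional part.

```sql
-- Hypothetical toy counts: n_i per reviewer, N = 10
DROP VIEW IF EXISTS demo_allocation;
DROP TABLE IF EXISTS demo_reviewer_counts;
CREATE TABLE demo_reviewer_counts (operator_id VARCHAR(8), n_i INTEGER);
INSERT INTO demo_reviewer_counts VALUES ('A', 5), ('B', 3), ('C', 2);

CREATE VIEW demo_allocation AS
WITH params AS (
    SELECT 7 AS sample_size            -- stands in for 1000
),
totals AS (
    SELECT SUM(n_i) AS total_n FROM demo_reviewer_counts
),
quotas AS (
    -- exact quota sample_size * n_i / total_n, split into floor and remainder
    SELECT c.operator_id,
           (p.sample_size * c.n_i - (p.sample_size * c.n_i) % t.total_n) / t.total_n AS base_cnt,
           (p.sample_size * c.n_i) % t.total_n AS remainder
    FROM demo_reviewer_counts c
    CROSS JOIN totals t
    CROSS JOIN params p
),
leftover AS (
    -- slots still missing after rounding every quota down
    SELECT (SELECT sample_size FROM params) - SUM(base_cnt) AS k FROM quotas
)
SELECT q.operator_id,
       q.base_cnt
         + CASE WHEN ROW_NUMBER() OVER (ORDER BY q.remainder DESC, q.operator_id) <= l.k
                THEN 1 ELSE 0 END AS sample_cnt
FROM quotas q
CROSS JOIN leftover l;

SELECT * FROM demo_allocation ORDER BY operator_id;
```

The exact quotas are 3.5, 2.1 and 1.4; rounding down gives 3 + 2 + 1 = 6, and the one missing slot goes to A, whose discarded fraction (0.5) is the largest, so the allocation is A = 4, B = 2, C = 1.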

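5. A reviewer opens two pages at the same time, so part of the review time is counted twice; this doubly counted part is called multi-open time (e.g., with x as the claim time and y as the dequeue time, the time between x2 and y1 is counted twice and is multi-open time). For each reviewer, compute the review time with multi-open time removed (note: review time = submit time - claim time) (25 points; focus: logic ***)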

-- One review interval per reviewer and video: claim time -> submit time.
-- The original answer uses ENQUEUE as the claim event; if the table records
-- claiming under a different event_type, substitute it here.
WITH review_intervals AS (
    SELECT operator_id,
           video_id,
           MIN(CASE WHEN event_type = 'ENQUEUE' THEN operation_time END) AS start_time,
           MAX(CASE WHEN event_type = 'SUBMIT'  THEN operation_time END) AS end_time
    FROM jf.creative_element_di
    WHERE event_type IN ('ENQUEUE', 'SUBMIT')
    GROUP BY operator_id, video_id
),
-- An interval opens a new block when it starts after every earlier interval
-- of the same reviewer has ended (running MAX, not just the previous row);
-- otherwise it overlaps them, which is multi-open time.
flagged AS (
    SELECT operator_id, video_id, start_time, end_time,
           CASE WHEN start_time <= MAX(end_time) OVER (
                    PARTITION BY operator_id ORDER BY start_time, end_time, video_id
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS is_new_block
    FROM review_intervals
    WHERE start_time IS NOT NULL AND end_time >= start_time
),
-- Running count of block starts numbers the blocks of overlapping intervals
blocks AS (
    SELECT operator_id, start_time, end_time,
           SUM(is_new_block) OVER (
               PARTITION BY operator_id ORDER BY start_time, end_time, video_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS block_id
    FROM flagged
),
-- Collapse each block into one merged interval
merged AS (
    SELECT operator_id, MIN(start_time) AS block_start, MAX(end_time) AS block_end
    FROM blocks
    GROUP BY operator_id, block_id
)
-- Review time per reviewer with multi-open time removed, as HH:MM:SS
SELECT operator_id,
       SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, block_start, block_end))) AS total_review_time
FROM merged
GROUP BY operator_id;
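The core of the answer is the interval-merge (gaps-and-islands) pattern: an interval starts a new block only if it begins after the running maximum end time of everything before it. The toy run below is a sketch using a hypothetical table `demo_review_intervals`, a hypothetical view `demo_review_time`, and integer-second times. It includes one interval nested entirely inside another, the case where comparing only against the previous interval (LAG) pushes the start past the end and yields a negative duration.

```sql
-- Hypothetical toy intervals per reviewer, times in integer seconds
DROP VIEW IF EXISTS demo_review_time;
DROP TABLE IF EXISTS demo_review_intervals;
CREATE TABLE demo_review_intervals (
    operator_id VARCHAR(8),
    video_id    INTEGER,
    start_sec   INTEGER,
    end_sec     INTEGER
);
INSERT INTO demo_review_intervals VALUES
    ('R1', 1,   0, 100),
    ('R1', 2,  50, 150),   -- second page overlaps the first
    ('R1', 3, 200, 250),
    ('R1', 4, 210, 220),   -- nested entirely inside the previous interval
    ('R2', 5,  10,  20);

CREATE VIEW demo_review_time AS
WITH flagged AS (
    -- 1 = starts after everything earlier has ended, 0 = overlaps
    SELECT operator_id, video_id, start_sec, end_sec,
           CASE WHEN start_sec <= MAX(end_sec) OVER (
                    PARTITION BY operator_id ORDER BY start_sec, end_sec, video_id
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS is_new_block
    FROM demo_review_intervals
),
blocks AS (
    -- running sum of block starts numbers each block
    SELECT operator_id, start_sec, end_sec,
           SUM(is_new_block) OVER (
               PARTITION BY operator_id ORDER BY start_sec, end_sec, video_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS block_id
    FROM flagged
),
merged AS (
    SELECT operator_id, MIN(start_sec) AS block_start, MAX(end_sec) AS block_end
    FROM blocks
    GROUP BY operator_id, block_id
)
SELECT operator_id, SUM(block_end - block_start) AS review_sec
FROM merged
GROUP BY operator_id;

SELECT * FROM demo_review_time ORDER BY operator_id;
```

R1's raw interval lengths add up to 260 seconds; after merging, only [0, 150] and [200, 250] remain, so R1 gets 200 seconds and R2 gets 10. The `video_id` tiebreaker keeps both window orderings identical when two intervals share the same start and end, so the block numbering stays consistent.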
 