一面
项目难点
数据分层
怎么判断一个数仓的质量
拉链表
SQL题
一张device表
device_id | country uid | extend | date_time |
---|---|---|---|
{spu_code click pay_id time pv} |
1、当前spu在某些国家下的曝光情况 汇总
2、14天内某些spu点击 曝光 购买趋势 top10 汇总
3、当前u_ID 是否活跃用户(连续7天有曝光行为)
with device2 as(
select
device_id
,country
,uid
,date_time
,get_json(extend,spu_code) as spu_code
,get_json(extend,click) as click
,get_json(extend,pay_id) as pay_id
,get_json(extend,time) as time
,get_json(extend,pv) as pv
from device
)
1.
select country,spu_code,sum(pv)
from device2
group by country,spu_code
2.
select spu_code
from(
select spu_code,
rank() over(partition by spu_code order by sum_click) as click_top,
rank() over(partition by spu_code order by sum_pv) as pv_top,
rank() over(partition by spu_code order by sum_paid) as paid_top,
from(
select spu_code,sum(click),sum(pv),sum(paid_id)
from device2
where date_time>=date_sub(date_time,14)
group by spu_code
)t
)t2
where click_top>=10 or pv_top>=10 or paid_top>=10
3.
select u_id,if(count(time_num)>=7,1,0) as if_create
from(
select uid,date_sub(date_time,rn) as time_num
from(
select date_time,uid,row_number() over(partition by u_id order by date_time) as rn
from device2
)t
)t2
二面
数据倾斜
缓慢变化维
维度退化
项目难点
三名(CTO面)
学校情况
实习难点
反问:
和SHEIN比优势