背景:做AB testing, 需要分流出固定百分比70%(但数量不固定)的用户跑二级风控模型
问题:查了几个函数,都不能满足要求,比如ntile, tablesample(n percent) (这个是按size分的,不是行数)
解决:
with temp1 as (
select ceil(count(*)*0.7) as cnt
from report.report_mxdai_riskresult_backwid_sh
where topichour="{{ymdh}}"
), temp2 as (
select
*
,row_number() over(order by rand()) as rn
,t1.cnt
from report.report_mxdai_riskresult_backwid_sh t
left join temp1 t1 on (1=1 )
where topichour="{{ymdh}}"
)
select
id
,wid
,case when rn <= cnt then result
else 1
end as result
,case when rn <= cnt then model_version
else 'v0'
end as model_version
,create_time_mysql
,from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss') as create_time_hive
from temp2