Notes on random sampling methods:
Hive:
select *col* from *table* where *condition* distribute by rand() sort by rand() limit *number*;
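
To make the Hive query easier to reason about, here is a rough Python simulation of what `distribute by rand() sort by rand() limit` does. It is only a simplified model, not Hive's actual execution plan: the function name `distribute_sort_limit`, the `num_reducers` parameter, and the assumption that the final limit merges the per-reducer candidates in key order are all illustrative.

```python
import random

def distribute_sort_limit(rows, n, num_reducers=4, seed=None):
    """Simplified model of `distribute by rand() sort by rand() limit n`."""
    rng = random.Random(seed)

    # distribute by rand(): send each row to a randomly chosen reducer
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[rng.randrange(num_reducers)].append(row)

    # sort by rand(): each reducer sorts its own rows by a fresh random key
    # and keeps at most n of them (the limit applied per reducer)
    candidates = []
    for bucket in reducers:
        keyed = sorted(((rng.random(), row) for row in bucket),
                       key=lambda pair: pair[0])
        candidates.extend(keyed[:n])

    # final limit (assumed): merge the candidates by key and keep n rows
    candidates.sort(key=lambda pair: pair[0])
    return [row for _, row in candidates[:n]]

sample = distribute_sort_limit(range(1000), 10, seed=42)
```

In this model the result is the n rows with the smallest random keys overall, so every row has the same chance of being picked, while each reducer only has to sort its own share of the data.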
Impala:
select *col* from *table* where *condition* order by rand() limit *number*;
-- Generate a random id for each row, then sort by that id
select col, uuid() from table where condition order by 2 limit number;
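
The random-id variant can be sketched in Python as follows. This is an illustration of the idea only: `sample_by_uuid` is a hypothetical helper, and it uses Python's `uuid.uuid4()`, which is random. Whether Impala's `uuid()` yields ids whose sort order is equally well shuffled is an assumption that should be checked before relying on it.

```python
import uuid

def sample_by_uuid(rows, n):
    """Tag every row with a random id, sort by the id, keep the first n rows."""
    # select col, uuid(): attach an id to each row
    tagged = [(row, uuid.uuid4().hex) for row in rows]
    # order by 2: sort on the second column, i.e. the id
    tagged.sort(key=lambda pair: pair[1])
    # limit number: keep only the first n rows
    return [row for row, _ in tagged[:n]]

picked = sample_by_uuid(list(range(100)), 5)
```

Note that this approach sorts the whole filtered table, just like `order by rand()`, so it is simple but can be expensive on large inputs.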
Most of what can be found online covers sorting in Hive:
Three sampling methods in Hive
Optimizing random sampling in Hive