0.hive 去重
select * from (select *,row_number() over (partition by id order by name asc) num from table) t where t.num=1;
id ,name 为列名
0 hive排序
select pin,pro,row_number() over (distribute by pin sort by pro desc) rownum from cf_dev.sy_CAREER_source_pro;
1.^A 的实际编码为\001
2.having子句
having 可以对 group by 产生的分组进行条件过滤
例:
select keyword,count(*) from cf_dev.sy_corporation_ad group by keyword having count(*) > 1000;
3.join的优化
多表查询应保证从左到右的顺序为由小到大。
join 内连接 只显示两表相同的部分
left outer join 以左表为主,右表对应部分显示NULL
select a.pin,b.address from a join b on a.pin = b.pin
distribute by 控制map的输出在reduce中如何划分。
order by 控制reduce的输出排序
4.union all 多表合并
select * from XXX
union all
select * from XXX
5.随机查询
select * from tablename TABLESAMPLE(bucket 3 out of 10 on rand()) s;
6.改表名
alter table tablename RENAME INTO new_talbename;