hive 函数 split(字段名,“ ”) 切分 explode(array) 将数组里内容展开
from (select explode(split(line, ' ')) as word from docs) w
insert into table wc
select word, count(1) as totalword
group by word
order by word; hive版wc
hive动态分区 一组没有分区的数据 要动态的给他加上分区
set hive.exec.dynamic.partition.mode=nostrict; (设置这个可以不用有一个是静态分区)
from tb_user2
insert overwrite table tb_user1 partition (age, sex)
select id, name, likes, addrs, age, sex distribute by age, sex;
如果没设置上面的参数
hive> from tb_user1
> insert overwrite table tb_user partition(age=20,sex)
> select id, name, likes, addrs, sex distribute by sex;
分桶操作 (适用于数据抽样)
clustered by (id) into 4 buckets 创建表时 添加 查询时select * from tb_user2 tablesample(bucket 3 out of 4 on id);
如果在创建时是32 在1 out of 16 意思是32个痛分两部分一部分16 1则是每一部分的第一个桶内的数据
lateral view 可以是select 中带有多个udtf函数 (explode)
select count(distinct(myCol1)), count(distinct(myCol2)) from psn2
LATERAL VIEW explode(likes) myTable1 AS myCol1
LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;
视图 create view
索引
create index t1_index on table tb_user2(name)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild
in table t1_index_table; in table 后是索引表名字
show index on tb_user2;
drop index t1_index on tb_user2;
hive优化(未完)