Big Data learning - HIVE - bucket, partition, index

Partition

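A partition column appears in the table schema like an ordinary column and can be inspected with the describe command. It stores only partition information, not physical data: its value is not written into the data files but is encoded in the name of the partition's directory.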

1. static partition table

create table if not exists hive1.test1
(id int,name string,tel string) 
partitioned by(age int) 
row format delimited fields terminated by ','
stored as textfile;

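-- use "insert overwrite table" to replace the partition instead of appending; the row below is a made-up sample
insert into table hive1.test1 partition(age=25) values (1,'Tom','555-0100');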

select id,name,tel from hive1.test1;
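
To check how the partition column shows up, the statements below can be run against the table created above. This is a minimal sketch that assumes the age=25 partition has already been loaded; the exact output layout may differ between Hive versions.

```sql
-- Schema view: age is listed with the regular columns and again
-- under the "# Partition Information" section.
describe hive1.test1;

-- One line per existing partition, for example age=25.
show partitions hive1.test1;

-- Filtering on the partition column lets Hive read only the
-- matching partition directory instead of the whole table.
select id, name, tel from hive1.test1 where age = 25;
```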

2. dynamic partition table

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
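-- hive1.test1_src is a hypothetical staging table; the partition column must come last in the select list
insert overwrite table hive1.test1 partition(age) select id,name,tel,age from hive1.test1_src;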
select id,name,tel,age from hive1.test1;

Bucket

CREATE TABLE hive1.test2 (id INT, name STRING) CLUSTERED BY (id) INTO 4 BUCKETS;

Hive can further organize any table, or any partition of a partitioned table, into buckets. The bucket a row is stored in is determined by the hash of the bucketing column modulo the number of buckets.
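
The hash-modulo idea can be approximated in a query. This is only a rough illustration using the built-in hash() and pmod() functions; the exact hash function Hive applies internally when writing bucket files varies by version, so the numbers here need not match the real bucket files.

```sql
-- Illustration only: compute a bucket number as hash(id) modulo 4,
-- the bucket count declared for hive1.test2.
select id, pmod(hash(id), 4) as bucket_no
from hive1.test2;
```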

Reasons for organizing a table into buckets

  1. It makes queries more efficient. Hive can exploit the extra structure that buckets provide. For example, when two tables are bucketed on the same join key, Hive can use a map-side join and match only the corresponding buckets of each table.
  2. It gives better performance when fetching a sample of the data.
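
Both points can be sketched as follows. hive1.test3 is a hypothetical second table, assumed to be clustered by id as well and to have a name column; it is not part of the original example.

```sql
-- Sampling: read only the first of the 4 buckets defined on id.
select * from hive1.test2 tablesample(bucket 1 out of 4 on id);

-- Bucket map join: allow Hive to join matching buckets on the map side.
-- hive1.test3 is a hypothetical table bucketed on id like hive1.test2.
set hive.optimize.bucketmapjoin=true;
select a.id, a.name, b.name
from hive1.test2 a
join hive1.test3 b on a.id = b.id;
```

Whether Hive actually picks the bucket map join depends on the optimizer and on the bucket counts of the two tables, so the plan is worth checking with explain.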

Index
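
An index can speed up queries, for example those that use group by. Note that index support was removed in Hive 3.0, so the statement below applies to earlier versions.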

create index employees_index on table employees(country)
as  'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
with deferred rebuild
in table employees_index_table;
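
Because the index is created with deferred rebuild, it starts out empty. A minimal follow-up, assuming a Hive version that still supports indexes (before 3.0), might look like this:

```sql
-- Populate the index table; until this runs, a deferred index holds no data.
alter index employees_index on employees rebuild;

-- List the indexes defined on the table.
show formatted index on employees;
```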

Differences between bucket, partition and index

1. Index and partition

An index does not split the table's files; a partition does.
An index trades extra disk space for lower query cost.
Partitioning splits one big table into separate sets of files.

2. Partition and bucket

Bucketing splits a table by hash value, which looks random but spreads the data roughly evenly.
Partitioning splits a table by specific key values, so it can cause data skew.

Bucketing splits a table into different files within a single folder. Partitioning splits a table into different folders.
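
The two techniques can be combined. The sketch below uses a hypothetical table hive1.test4, which is not part of the original example; the file names in the comments are typical of what Hive produces but are not guaranteed.

```sql
-- hive1.test4 is a hypothetical table that is both partitioned and bucketed.
create table if not exists hive1.test4
(id int, name string)
partitioned by (age int)
clustered by (id) into 4 buckets;

-- Expected layout under the table directory after loading data:
--   age=25/000000_0 ... age=25/000003_0   (one folder per partition,
--   age=30/000000_0 ... age=30/000003_0    four bucket files in each)
```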
