视图-view(虚表) 降低查询的复杂度
创建视图 view
create view view1 as select c.id cid,c.name,c.age,o.id oid, o.ordernum,o.price from customers c left outer join orders o on c.id = o.id;
对视图进行查询
hive (hive1)> select * from view1;
view1.cid view1.name view1.age view1.oid view1.ordernum view1.price
1 tom1 12 1 No001 121.01
2 tom2 13 2 No002 121.02
3 tom3 14 3 No003 121.03
Time taken: 29.558 seconds, Fetched: 3 row(s)
查询出 price > 121.02;
hive (hive1)> select * from view1 where price > 121.02;
view1.cid view1.name view1.age view1.oid view1.ordernum view1.price
3 tom3 14 3 No003 121.03
Time taken: 28.739 seconds, Fetched: 1 row(s)
使用like方式创建table
hive (hive1)> create table test5 like view1;
OK
Time taken: 0.076 seconds
hive (hive1)> select * from test5;
OK
test5.cid test5.name test5.age test5.oid test5.ordernum test5.price
Time taken: 0.087 seconds
删除试图
hive (hive1)> drop view view1;
OK
Time taken: 0.105 seconds
索引(目录)---->hive没有key(primark key +autot_increment)的概念
创建索引
create index idx_customers_id on table customers (id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild idxproperties ('creator'='jhy') in table customers_index comment 'this is a comment';
OK
Time taken: 0.211 seconds
重建索引,产生索引表(hdfs文件)
hive (hive1)> alter index idx_customers_id on customers rebuild;
hive (hive1)> dfs -text /user/hive/warehouse/hive1.db/customers_index/000000_0;
1hdfs://master:9000/user/hive/warehouse/hive1.db/customers/customers.txt0
2hdfs://master:9000/user/hive/warehouse/hive1.db/customers/customers.txt10
3hdfs://master:9000/user/hive/warehouse/hive1.db/customers/customers.txt20
创建订单索引
hive (hive1)> create index idx_orders_ordernum on table orders (ordernum) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild in table orders_index ;
OK
Time taken: 0.221 seconds
重建索引,产生索引表(hdfs文件)
hive (hive1)> alter index idx_orders_ordernum on orders rebuild;
查询产生的索引表
hive (hive1)> dfs -text /user/hive/warehouse/hive1.db/orders_index/000000_0;
No001hdfs://master:9000/user/hive/warehouse/hive1.db/orders/orders.txt0
No002hdfs://master:9000/user/hive/warehouse/hive1.db/orders/orders.txt17
No003hdfs://master:9000/user/hive/warehouse/hive1.db/orders/orders.txt34
No004hdfs://master:9000/user/hive/warehouse/hive1.db/orders/orders.txt51
No005hdfs://master:9000/user/hive/warehouse/hive1.db/orders/orders.txt68
No006hdfs://master:9000/user/hive/warehouse/hive1.db/orders/orders.txt85
No007hdfs://master:9000/user/hive/warehouse/hive1.db/orders/orders.txt102
查询ordernum=No001 ;
hive (hive1)> select * from orders where ordernum='No001';
OK
orders.id orders.ordernum orders.price orders.cid
1 No001 121.01 1
Time taken: 0.085 seconds, Fetched: 1 row(s)
删除索引
hive (hive1)> drop index idx_customers_id on customers;
OK
Time taken: 0.213 seconds
hive (hive1)> drop index idx_orders_ordernum on orders;
OK
Time taken: 0.126 seconds
分区是路径,是目录,是文件逻辑隔离,有效降低查询量
桶表(bucket) 是文件
准备数据
orderitems.txt
1 item1 1
2 item2 1
3 item3 1
4 item4 2
5 item5 2
6 item6 2
7 item7 3
8 item8 3
9 item9 3
10 item10 3
11 item10 4
12 item10 4
13 item10 4
14 item10 4
15 item10 4
创建桶表
hive (hive1)> create table orderitems (id int,itemanme string,oid int) clustered by (oid) into 3 buckets row format delimited fields TERMINATED by '\t' lines TERMINATED by '\n' stored as textfile;
OK
Time taken: 0.08 seconds
插入数据
insert into orderitems(id,itemanme,oid) values(1,'item1',1);
insert into orderitems(id,itemanme,oid) values(2,'item1',2);
insert into orderitems(id,itemanme,oid) values(3,'item1',3);
insert into orderitems(id,itemanme,oid) values(4,'item1',4);
查看hdfs中生成的目录
hive (hive1)> dfs -lsr /user/hive/warehouse/hive1.db/orderitems/;
lsr: DEPRECATED: Please use 'ls -R' instead.
-rwxrwxrwx 3 hadoop supergroup 0 2018-07-01 17:32 /user/hive/warehouse/hive1.db/orderitems/000000_0
-rwxrwxrwx 3 hadoop supergroup 0 2018-07-01 17:33 /user/hive/warehouse/hive1.db/orderitems/000000_0_copy_1
-rwxrwxrwx 3 hadoop supergroup 10 2018-07-01 17:34 /user/hive/warehouse/hive1.db/orderitems/000000_0_copy_2
-rwxrwxrwx 3 hadoop supergroup 0 2018-07-01 17:34 /user/hive/warehouse/hive1.db/orderitems/000000_0_copy_3
-rwxrwxrwx 3 hadoop supergroup 10 2018-07-01 17:32 /user/hive/warehouse/hive1.db/orderitems/000001_0
-rwxrwxrwx 3 hadoop supergroup 0 2018-07-01 17:33 /user/hive/warehouse/hive1.db/orderitems/000001_0_copy_1
-rwxrwxrwx 3 hadoop supergroup 0 2018-07-01 17:34 /user/hive/warehouse/hive1.db/orderitems/000001_0_copy_2
-rwxrwxrwx 3 hadoop supergroup 10 2018-07-01 17:34 /user/hive/warehouse/hive1.db/orderitems/000001_0_copy_3
-rwxrwxrwx 3 hadoop supergroup 0 2018-07-01 17:32 /user/hive/warehouse/hive1.db/orderitems/000002_0
-rwxrwxrwx 3 hadoop supergroup 10 2018-07-01 17:33 /user/hive/warehouse/hive1.db/orderitems/000002_0_copy_1
-rwxrwxrwx 3 hadoop supergroup 0 2018-07-01 17:34 /user/hive/warehouse/hive1.db/orderitems/000002_0_copy_2
-rwxrwxrwx 3 hadoop supergroup 0 2018-07-01 17:34 /user/hive/warehouse/hive1.db/orderitems/000002_0_copy_3
查询桶表(orderitems)
hive (hive1)> select * from orderitems;
OK
orderitems.id orderitems.itemanme orderitems.oid
3 item1 3
1 item1 1
4 item1 4
2 item1 2
Time taken: 0.059 seconds, Fetched: 4 row(s)
进行条件查询
hive (hive1)> select * from orderitems where oid=4;
OK
orderitems.id orderitems.itemanme orderitems.oid
4 item1 4
Time taken: 0.059 seconds, Fetched: 1 row(s)
对hdfs中的数据进行查看
hive (hive1)> dfs -text /user/hive/warehouse/hive1.db/orderitems/000001_0_copy_3
> ;
4 item1 4
hive (hive1)> dfs -text /user/hive/warehouse/hive1.db/orderitems/000000_0_copy_2;
3 item1 3
hive (hive1)> dfs -text /user/hive/warehouse/hive1.db/orderitems/000002_0_copy_1;
2 item1 2
hive (hive1)> dfs -text /user/hive/warehouse/hive1.db/orderitems/000001_0;
1 item1 1