Creating the four kinds of Hive tables
Internal tables, partition tables, bucket tables, and external tables
1. Internal tables:
The data to be uploaded:
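The screenshot of the input file is not reproduced here; judging from the SELECT results later in this section, the tab-separated file /home/hadoop/hive looks roughly like this:

hadoop	1
hbase	2
hive	3
java	4
php	5
mahout	6
acm	7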
Create a table tb1 whose two fields are separated by a Tab character:
Run the command: CREATE TABLE tb1(name string, id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> CREATE TABLE tb1(name string, id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 8.016 seconds
hive>
Load the data into tb1:
Run the command: LOAD DATA LOCAL INPATH '/home/hadoop/hive' INTO TABLE tb1;
hive> LOAD DATA LOCAL INPATH '/home/hadoop/hive' INTO TABLE tb1;
Copying data from file:/home/hadoop/hive
Copying file: file:/home/hadoop/hive
Loading data to table default.tb1
OK
Time taken: 9.795 seconds
hive>
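Note that the LOCAL keyword makes Hive copy the file from the local filesystem. Without LOCAL, LOAD DATA takes a path that is already in HDFS and moves that file into the table's directory. A sketch (the HDFS path /tmp/hive_input is hypothetical):

hive> LOAD DATA INPATH '/tmp/hive_input' INTO TABLE tb1;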
Query the data in the table:
hive> select * from tb1;
OK
hadoop 1
hbase 2
hive 3
java 4
php 5
mahout 6
acm 7
Time taken: 2.871 seconds
hive>
The same data is visible in the HDFS web UI:
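You can also inspect the table's storage directly from the Hive CLI with the dfs command; the path below assumes the default warehouse location /user/hive/warehouse:

hive> dfs -ls /user/hive/warehouse/tb1;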
2. Partition tables:
A partition table divides its data into folders, one folder per partition:
Run the command: CREATE TABLE tb2(id int) PARTITIONED BY (day int);
hive> CREATE TABLE tb2(id int) PARTITIONED BY (day int);
OK
Time taken: 0.548 seconds
hive>
Load data into the table:
Run the command: LOAD DATA LOCAL INPATH '/home/hadoop/in' INTO TABLE tb2 PARTITION (day='3');
hive> LOAD DATA LOCAL INPATH '/home/hadoop/in' INTO TABLE tb2 PARTITION (day='3');
Copying data from file:/home/hadoop/in
Copying file: file:/home/hadoop/in
Loading data to table default.tb2 partition (day=3)
OK
Time taken: 4.64 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/in' INTO TABLE tb2 PARTITION (day='4');
Copying data from file:/home/hadoop/in
Copying file: file:/home/hadoop/in
Loading data to table default.tb2 partition (day=4)
OK
Time taken: 0.411 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/in' INTO TABLE tb2 PARTITION (day='5');
Copying data from file:/home/hadoop/in
Copying file: file:/home/hadoop/in
Loading data to table default.tb2 partition (day=5)
OK
Time taken: 0.516 seconds
hive>
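An empty partition can also be added without loading any data, using ALTER TABLE (a sketch; day=6 is a hypothetical new partition):

hive> ALTER TABLE tb2 ADD PARTITION (day=6);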
Run the command: select * from tb2 where day=3;
View the rows where day=3; because the WHERE condition is on the partition column, only the day=3 folder is read:
hive> select * from tb2 where day=3;
OK
1 3
12 3
123 3
1234 3
12345 3
Time taken: 2.062 seconds
hive>
Web UI:
Click into day=3 to view its contents:
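Instead of the web UI, the partitions can also be listed from the Hive CLI itself; SHOW PARTITIONS reads the metastore, and dfs -ls (again assuming the default warehouse path) shows the day=3, day=4, and day=5 directories:

hive> SHOW PARTITIONS tb2;
hive> dfs -ls /user/hive/warehouse/tb2;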
3. Bucket tables:
First create an internal table and load data into it:
hive> CREATE TABLE t0(id int);
OK
Time taken: 10.418 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/in' INTO TABLE t0;
Copying data from file:/home/hadoop/in
Copying file: file:/home/hadoop/in
Loading data to table default.t0
OK
Time taken: 1.145 seconds
hive>
Hive SQL statements are translated into MapReduce jobs for execution:
hive> select id from t0;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201503032013_0004, Tracking URL = http://baolibin:50030/jobdetails.jsp?jobid=job_201503032013_0004
Kill Command = /usr/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=192.168.1.100:9001 -kill job_201503032013_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-03-03 20:32:38,057 Stage-1 map = 0%, reduce = 0%
2015-03-03 20:33:06,242 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:07,790 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:08,849 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:09,953 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:10,989 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:12,004 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:13,155 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:14,165 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:15,175 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:16,202 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:17,274 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:18,304 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.4 sec
2015-03-03 20:33:19,715 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.4 sec
MapReduce Total cumulative CPU time: 2 seconds 400 msec
Ended Job = job_201503032013_0004
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.4 sec HDFS Read: 210 HDFS Write: 20 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 400 msec
OK
1
12
123
1234
12345
Time taken: 84.963 seconds
hive>
A bucket table divides its data into files:
Bucket tables are rarely used; they are mainly useful for table joins (see the sketch at the end of this section):
Run the command: create table tb3(id int) clustered by(id) into 4 buckets;
hive> create table tb3(id int) clustered by(id) into 4 buckets;
OK
Time taken: 0.34 seconds
hive>
Enable bucketing:
set hive.enforce.bucketing = true;
hive> set hive.enforce.bucketing = true;
hive>
Populating a bucket table requires a MapReduce computation; LOAD does no MapReduce work, so LOAD cannot fill a bucket table. Use INSERT ... SELECT instead:
insert into table tb3 select id from t0;
You can see that the Hive SQL is converted into a MapReduce job:
hive> insert into table tb3 select id from t0;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 4
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201503032013_0005, Tracking URL = http://baolibin:50030/jobdetails.jsp?jobid=job_201503032013_0005
Kill Command = /usr/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=192.168.1.100:9001 -kill job_201503032013_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4
2015-03-03 20:53:49,001 Stage-1 map = 0%, reduce = 0%
2015-03-03 20:55:37,605 Stage-1 map = 0%, reduce = 0%
2015-03-03 20:56:31,664 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:35,394 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:38,290 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:40,714 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:43,278 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:45,918 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:48,653 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:50,109 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:51,591 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:55,125 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:56:58,906 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:57:01,838 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:57:04,805 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:57:06,687 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:57:08,859 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:57:11,470 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.79 sec
2015-03-03 20:57:31,707 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 2.79 sec
2015-03-03 20:58:12,864 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 5.98 sec
2015-03-03 20:58:28,741 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 5.98 sec
2015-03-03 20:58:34,581 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 5.98 sec
2015-03-03 20:58:44,953 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.68 sec
2015-03-03 20:58:52,075 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.68 sec
2015-03-03 20:58:58,370 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.68 sec
2015-03-03 20:59:07,524 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.68 sec
2015-03-03 20:59:18,044 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.68 sec
2015-03-03 20:59:38,184 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 7.68 sec
2015-03-03 21:00:10,233 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 8.55 sec
2015-03-03 21:00:36,173 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 8.55 sec
2015-03-03 21:00:46,490 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 8.55 sec
2015-03-03 21:00:51,949 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:00:55,549 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:00:57,653 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:00:59,096 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:01,123 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:05,167 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:12,514 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:15,956 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:17,942 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:21,971 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:24,931 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:27,857 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:36,741 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:41,436 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:01:50,578 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:02:25,276 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:02:28,493 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:02:31,903 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:02:33,695 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 10.06 sec
2015-03-03 21:02:50,396 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 10.06 sec
2015-03-03 21:03:20,686 Stage-1 map = 100%, reduce = 83%, Cumulative CPU 12.98 sec
2015-03-03 21:03:34,343 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.98 sec
2015-03-03 21:04:16,332 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.98 sec
2015-03-03 21:05:03,443 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 16.15 sec
2015-03-03 21:05:17,439 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
2015-03-03 21:05:20,714 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
2015-03-03 21:05:22,186 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
2015-03-03 21:05:24,029 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
2015-03-03 21:05:26,798 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
2015-03-03 21:05:28,645 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
2015-03-03 21:05:30,733 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
2015-03-03 21:05:33,056 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
2015-03-03 21:05:38,680 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 17.59 sec
MapReduce Total cumulative CPU time: 17 seconds 590 msec
Ended Job = job_201503032013_0005
Loading data to table default.tb3
Table default.tb3 stats: [num_partitions: 0, num_files: 4, num_rows: 0, total_size: 40, raw_data_size: 0]
11 Rows loaded to tb3
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 4 Cumulative CPU: 17.59 sec HDFS Read: 229 HDFS Write: 40 SUCCESS
Total MapReduce CPU Time Spent: 17 seconds 590 msec
OK
Time taken: 873.793 seconds
hive>
Web UI:
View the concrete values:
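Since each bucket is stored as its own file, a bucketed table supports efficient sampling and bucketed map joins. A minimal sketch (tb3b is a hypothetical second table bucketed on id the same way):

hive> SELECT * FROM tb3 TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);
hive> set hive.optimize.bucketmapjoin = true;
hive> SELECT a.id FROM tb3 a JOIN tb3b b ON a.id = b.id;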
4. External tables:
Run the command: hadoop fs -put /home/hadoop/in /external/in
[root@baolibin ~]# hadoop fs -put /home/hadoop/in /external/in
Warning: $HADOOP_HOME is deprecated.
[root@baolibin ~]#
No LOAD is needed; the table definition itself says where the data lives:
Run the command: create external table tb4(id int) location '/external';
hive> create external table tb4(id int) location '/external';
OK
Time taken: 18.45 seconds
hive>
drop table tb4; deletes only the table definition; the data files remain in HDFS.
Internal tables, partition tables, and bucket tables are all managed tables: when you drop one of them, both the table definition and the data are deleted.
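To see the difference for yourself, drop the external table and then list its location; the file put there earlier with hadoop fs -put should still exist (a sketch, using the Hive CLI's dfs command):

hive> drop table tb4;
hive> dfs -ls /external;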