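Note: if you are a beginner, it will help to read my previous post before this one.
Creating Hive tables: http://blog.csdn.net/qq_29622761/article/details/51564680
This post covers:
- Creating one table from another
- Comparing how Hive reads different file formats
- Hive partitioned tables
- Hive bucketing
Let's get started!
- Creating one table from another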
Syntax: create table test3 like test2;
What I will run: create table testtext_c like testtext;
(This does not copy any data; it only creates a new table with the same schema.)
First I load some data into the table testtext:
[root@hadoop1 host]# cat testtext
wer 46
wer 89
weree 78
rr 89
hive> load data local inpath '/usr/host/testtext' into table testtext;
Copying data from file:/usr/host/testtext
Copying file: file:/usr/host/testtext
Loading data to table default.testtext
OK
Time taken: 0.294 seconds
hive> select * from testtext;
OK
wer 46
wer 89
weree 78
rr 89
Time taken: 0.186 seconds
hive>
2 Next, create testtext_c (the like way)
hive> create table testtext_c like testtext;
OK
Time taken: 0.181 seconds
hive> select * from testtext;
OK
wer 46
wer 89
weree 78
rr 89
Time taken: 0.204 seconds
hive> select * from testtext_c;
OK
Time taken: 0.158 seconds
hive>
See, testtext_c really has no data in it. I wasn't kidding!
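To double-check that only the schema came across, the two table definitions can be compared side by side. This is a minimal sketch, assuming the testtext and testtext_c tables from the session above exist in the default database. Note that count(*) will normally launch a MapReduce job on a setup like this one, so it takes longer than a plain select *.

```sql
-- Both statements should list the same columns (name, addr) with the same types.
describe testtext;
describe testtext_c;

-- The source table should report the four rows loaded earlier,
-- while the copy created with LIKE should report none.
select count(*) from testtext;
select count(*) from testtext_c;
```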
3 Hold on, there is another way (as)
hive> create table testtext_cc as select name,addr from testtext;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 1; number of reducers: 0
2016-06-01 20:49:59,404 null map = 0%, reduce = 0%
2016-06-01 20:50:20,644 null map = 100%, reduce = 0%, Cumulative CPU 1.3 sec
2016-06-01 20:50:21,735 null map = 100%, reduce = 0%, Cumulative CPU 1.3 sec
MapReduce Total cumulative CPU time: 1 seconds 300 msec
Ended Job = job_1464828076391_0004
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Ended Job = 1011778050, job is filtered out (removed at runtime).
Moving data to: hdfs://hadoop1:9000/tmp/hive-root/hive_2016-06-01_20-49-43_516_5205177189363939745/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/testtext_cc
Table default.testtext_cc stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 29, raw_data_size: 0]
OK
Time taken: 48.014 seconds
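MapReduce again, why? create table testtext_c like testtext; did not launch any MapReduce job, so why does this one? Well, this statement contains a select. On this setup only a plain select * from some_table is answered by reading the table's files directly; other selects are compiled into MapReduce jobs, and here the query result also has to be written out as the new table's data. Under the hood, Hive in this version executes queries as MapReduce. If you don't believe me, take a look at my previous post.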
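One way to see this difference without waiting for a job is to ask Hive for the query plan. The sketch below assumes a Hive version that has the hive.fetch.task.conversion property (it does not exist in very old releases), and testtext_plan_demo is a hypothetical table name used only for illustration. explain only prints the plan; it does not create the table or run the job.

```sql
-- Print the current value of the fetch-task setting, which controls
-- which simple queries may skip MapReduce.
set hive.fetch.task.conversion;

-- A plain select * is expected to show only a fetch stage in its plan.
explain select * from testtext;

-- The CTAS statement is expected to show a map-reduce stage,
-- followed by stages that move the data and create the table.
explain create table testtext_plan_demo as select name, addr from testtext;
```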
Check whether the data is there:
hive> select * from testtext_cc;
OK
wer 46
wer 89
weree 78
rr 89
Time taken: 0.116 seconds
hive>
It's there!
So: create table testtext_cc as select name,addr from testtext;
(This way runs as a MapReduce job, and it copies the data as well.)
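The two approaches can also be combined: create the empty table with like, then fill it with an insert. This is only a sketch, and testtext_copy is a hypothetical table name that does not appear in the session above. As far as I know, the practical difference from the as form is that a table created with like keeps the definition of testtext, including its storage format, whereas the as form builds the definition from the select.

```sql
-- Step 1: copy the schema only (no data, no MapReduce job).
create table testtext_copy like testtext;

-- Step 2: copy the rows; on this kind of setup the insert
-- is expected to run as a MapReduce job.
insert overwrite table testtext_copy select * from testtext;

-- Step 3: verify that the rows arrived.
select * from testtext_copy;
```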
4 Next, let's compare how different file formats are read
There are the textfile format, the sequencefile format, the rcfile format, and custom file formats.
hive> create table test_text(name string,val string) stored as textfile;
OK
Time taken: 0.098 seconds
hive> desc formatted test_text;
OK
# col_name data_type comment
name string None
val string None
# Detailed Table Information
Database: default
Owner: root
CreateTime: Wed Jun 01 21:11:15 PDT 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop1:9000/user/hive/warehouse/test_text
Table Type: MANAGED_TABL