02-Hive: creating one table from another, table partitioning, and bucketing

当法律与事业相遇

Article link: https://blog.csdn.net/qq_29622761/article/details/51565546

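Summary: this article shows how to create one Hive table from another, compares the file formats Hive can read, explains how to create and manage partitioned tables (including adding and dropping partitions), discusses the concept and benefits of bucketing in Hive, and ends with an example of creating a bucketed table and inserting data into it.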

Note: if you are a beginner, it is better to read my previous post before this one.
Creating Hive tables: http://blog.csdn.net/qq_29622761/article/details/51564680

This post covers:

Creating one table from another
Comparing how Hive reads different file formats
Hive partitioned tables
Hive bucketing

Let's get started!

Creating one table from another
Syntax: create table test3 like test2;
What I will run: create table testtext_c like testtext; (this copies only the table definition, not the data)
1 First, load data into the testtext table:

[root@hadoop1 host]# cat testtext
wer	46
wer	89
weree	78
rr	89
hive> load data local inpath '/usr/host/testtext' into table testtext;
Copying data from file:/usr/host/testtext
Copying file: file:/usr/host/testtext
Loading data to table default.testtext
OK
Time taken: 0.294 seconds
hive> select * from testtext;
OK
wer	46
wer	89
weree	78
rr	89
Time taken: 0.186 seconds
hive>

2 Next, create testtext_c (the LIKE approach)

hive> create table testtext_c like testtext;
OK
Time taken: 0.181 seconds
hive> select * from testtext;
OK
wer	46
wer	89
weree	78
rr	89
Time taken: 0.204 seconds
hive> select * from testtext_c;
OK
Time taken: 0.158 seconds
hive>

See, testtext_c really has no data, just as promised.
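If the empty copy should later hold the same rows, they have to be copied with a query. The following is a minimal sketch, assuming the testtext and testtext_c tables from the transcript above; it was not part of the original session and its output is not shown here.

```sql
-- Assumes testtext and testtext_c exist as created above.
-- Confirm that LIKE copied the column definitions.
DESCRIBE testtext_c;

-- Copy the rows explicitly. Unlike LIKE, this statement reads and
-- writes data, so Hive launches a job for it.
INSERT OVERWRITE TABLE testtext_c SELECT * FROM testtext;

-- The copy should now return the same rows as testtext.
SELECT * FROM testtext_c;
```

INSERT OVERWRITE replaces whatever the target table already contains, which is harmless here because testtext_c starts out empty.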
3 Don't worry, there is another approach (AS)

hive> create table testtext_cc as select name,addr from testtext;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 1; number of reducers: 0
2016-06-01 20:49:59,404 null map = 0%,  reduce = 0%
2016-06-01 20:50:20,644 null map = 100%,  reduce = 0%, Cumulative CPU 1.3 sec
2016-06-01 20:50:21,735 null map = 100%,  reduce = 0%, Cumulative CPU 1.3 sec
MapReduce Total cumulative CPU time: 1 seconds 300 msec
Ended Job = job_1464828076391_0004
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Ended Job = 1011778050, job is filtered out (removed at runtime).
Moving data to: hdfs://hadoop1:9000/tmp/hive-root/hive_2016-06-01_20-49-43_516_5205177189363939745/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/testtext_cc
Table default.testtext_cc stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 29, raw_data_size: 0]
OK
Time taken: 48.014 seconds

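MapReduce ran again. Why? create table testtext_c like testtext; did not launch MapReduce, so why does this statement? The reason is the select keyword: the statement has to read rows from testtext and write them into the new table. In this setup only a plain select * from ... is answered without MapReduce; other select queries run as MapReduce jobs, because under the hood Hive compiles queries into MapReduce. See my previous post if you don't believe me.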
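One way to check this without running the query is to ask Hive for its plan. This is a sketch rather than something taken from the original session: the table name testtext_cc2 is hypothetical, and the hive.fetch.task.conversion property is assumed to exist in your installation (it was introduced in Hive 0.10; the post does not state which Hive version it used).

```sql
-- Plan for the plain scan; in the author's setup this query
-- returned without launching MapReduce.
EXPLAIN SELECT * FROM testtext;

-- Plan for a CTAS statement. EXPLAIN only prints the plan and does
-- not create the table; testtext_cc2 is a hypothetical name.
EXPLAIN CREATE TABLE testtext_cc2 AS SELECT name, addr FROM testtext;

-- Assumption: Hive 0.10 or later. Prints the current setting that
-- controls which simple queries are served by a fetch task instead
-- of a job.
SET hive.fetch.task.conversion;
```

A plan that contains a map-reduce stage means a job will be launched; a plan with only a fetch stage means the rows are read directly from the table's files.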
Check whether the data is there:

hive> select * from testtext_cc;
OK
wer	46
wer	89
weree	78
rr	89
Time taken: 0.116 seconds
hive>

It is there!
So: create table testtext_cc as select name,addr from testtext; (this approach runs MapReduce and copies the data as well)
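The two approaches can be put side by side as follows. This is an illustrative sketch; the table names testtext_like, testtext_ctas, and testtext_wer are hypothetical and do not appear in the original session.

```sql
-- Schema only: no job runs and the new table starts empty.
CREATE TABLE testtext_like LIKE testtext;

-- Schema plus data: a job runs and writes the selected rows.
CREATE TABLE testtext_ctas AS SELECT name, addr FROM testtext;

-- Illustrative variant: a CTAS with a filter copies only the
-- matching rows.
CREATE TABLE testtext_wer AS
SELECT name, addr FROM testtext WHERE name = 'wer';
```

One difference worth keeping in mind: LIKE reproduces the source table's definition, while a CTAS table gets the default row and storage format unless the statement specifies one.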

4 Next, compare how different file formats are read
Hive supports the textfile, sequencefile, and rcfile formats, as well as custom file formats.
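For comparison with the textfile table created in the transcript that follows, the other two built-in formats might be declared like this. It is a sketch under stated assumptions: test_seq and test_rc are hypothetical table names, and test_text is assumed to already exist and contain rows.

```sql
-- Same columns as test_text, stored in different file formats.
CREATE TABLE test_seq(name string, val string) STORED AS SEQUENCEFILE;
CREATE TABLE test_rc(name string, val string) STORED AS RCFILE;

-- LOAD DATA only moves files into place and does not convert them,
-- so tables in binary formats are filled through a query instead.
INSERT OVERWRITE TABLE test_seq SELECT name, val FROM test_text;
INSERT OVERWRITE TABLE test_rc SELECT name, val FROM test_text;

-- The InputFormat and OutputFormat lines in the output show which
-- file format the table uses.
DESCRIBE FORMATTED test_seq;
```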

hive> create table test_text(name string,val string) stored as textfile;
OK
Time taken: 0.098 seconds
hive> desc formatted test_text;
OK
# col_name            	data_type           	comment             
	 	 
name                	string              	None                
val                 	string              	None                
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Wed Jun 01 21:11:15 PDT 2016	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://hadoop1:9000/user/hive/warehouse/test_text	 
Table Type:         	MANAGED_TABL