Hive: create table

Scenario: a one-off business requirement means we temporarily need to create a table.

(1) Table stored as textfile (plain text)

Table structure (via show create table):

CREATE TABLE test_1(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_1';
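As a quick sanity check, you can load a '|'-delimited file and query it back; the local file path and its contents here are hypothetical:

-- /tmp/sample.txt (hypothetical) contains lines such as: 1|clean_logs
LOAD DATA LOCAL INPATH '/tmp/sample.txt' INTO TABLE test_1;
SELECT task_id, task_name FROM test_1;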

(2) Table stored as LZO-compressed text

Table structure (via show create table):

CREATE EXTERNAL TABLE test_2(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_2';
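Reading this table requires the hadoop-lzo libraries on the cluster. To have Hive write LZO-compressed files into it, one common approach (a sketch assuming hadoop-lzo is installed, not necessarily the author's exact setup) is:

set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
insert overwrite table test_2
select task_id, task_name
from test_1;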

(3) Table stored as ORC

For background on the ORC format, see: https://www.cnblogs.com/lasclocker/p/5685941.html

Table structure (via show create table):

CREATE EXTERNAL TABLE test_3(
task_id int,
task_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_3';
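For ORC you normally do not need to spell out the format classes: the shorthand below expands to the same OrcInputFormat/OrcOutputFormat pair (plus the OrcSerde). The table name and location here are made up for illustration:

CREATE EXTERNAL TABLE test_3_short(
task_id int,
task_name string)
STORED AS ORC
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/test_3_short';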

Summary:

text:
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

lzo:
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

orc:
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
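To confirm which pair a given table actually uses, DESCRIBE FORMATTED prints the storage classes, e.g.:

DESCRIBE FORMATTED test_2;
-- the "# Storage Information" block lists the SerDe Library, InputFormat and OutputFormat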

(4) Creating a scratch table

First: create the table together with its data (CTAS). This can only create an internal table (not an external one), and the resulting structure does not fully match the source unless you specify the storage details yourself (see the sketch after the statement below).

create table task_info_test as
select *
from task_info;
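If the copy must keep the source layout, CTAS lets you state the row format and file format explicitly. The delimiters below mirror task_info's definition; the table name is hypothetical:

create table task_info_test_pipe
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile
as
select *
from task_info;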

Execution log:

hive> create table task_info_test as
> select *
> from task_info;
Query ID = mart_fro_20191002210314_9c25e306-9186-455e-a462-1d0cb08746e1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Start submit job !
Start GetSplits
GetSplits finish, it costs : 43 milliseconds
Submit job success : job_1533628320510_33970740
Starting Job = job_1533628320510_33970740, Tracking URL = http://BJHTYD-Hope-25-11.hadoop.jd.local:50320/proxy/application_1533628320510_33970740/
Kill Command = /data0/hadoop/hadoop_2.100.31_2019090518/bin/hadoop job -kill job_1533628320510_33970740
Hadoop job(job_1533628320510_33970740) information for Stage-1: number of mappers: 1; number of reducers: 0
2019-10-02 21:03:27,012 Stage-1(job_1533628320510_33970740) map = 0%, reduce = 0%
2019-10-02 21:03:44,581 Stage-1(job_1533628320510_33970740) map = 100%, reduce = 0%, Cumulative CPU 2.32 sec
MapReduce Total cumulative CPU time: 2 seconds 320 msec
Stage-1 Elapsed : 27094 ms job_1533628320510_33970740
Ended Job = job_1533628320510_33970740
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://ns1012/tmp/mart_fro/mart_fro/hive/hive_hive_2019-10-02_21-03-14_179_6935514001098968121-1/-ext-10001
Moving data to: hdfs://ns1012/user/mart_fro/tmp.db/task_info_test
CounterStats: 获取Counter信息用时: 3319 ms
Table tmp.task_info_test stats: [numFiles=1, numRows=6, totalSize=411, rawDataSize=405]
MapReduce Jobs Launched:
Stage-1: job_1533628320510_33970740 SUCCESS HDFS Read: 0.000 GB HDFS Write: 0.000 GB Elapsed : 27s94ms
Map: Total: 1 Success: 1 Killed: 0 Failed: 0 avgMapTime: 15s626ms
Reduce: Total: 0 Success: 0 Killed: 0 Failed: 0 avgReduceTime: 0ms avgShuffleTime: 0ms avgMergeTime: 0ms
JobHistory URL : http://BJHTYD-Hope-17-72.hadoop.jd.local:19888/jobhistory/job/job_1533628320510_33970740

Total MapReduce CPU Time Spent: 2s320ms
Total Map: 1 Total Reduce: 0
Total HDFS Read: 0.000 GB Written: 0.000 GB
OK
Time taken: 35.234 seconds
Finally, let's look at the structure of the created table (notice that the delimiter and other storage details differ from the source):

CREATE TABLE task_info_test(
task_id int,
task_name string,
task_parents array<int>,
task_tags map<int,string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'mart_name'='mart_fro',
'numFiles'='1',
'numRows'='6',
'rawDataSize'='405',
'totalSize'='411',
'transient_lastDdlTime'='1570021429')
Time taken: 0.034 seconds, Fetched: 21 row(s)
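The copy fell back to Hive's defaults (LazySimpleSerDe with the \001 field delimiter) rather than the source's '|' delimiter. A quick way to compare the two layouts:

DESCRIBE FORMATTED task_info;      -- Storage Desc Params shows field.delim = |
DESCRIBE FORMATTED task_info_test; -- no field.delim entry, so the \001 default applies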
Based on the execution plan, the CTAS boils down to three steps:

(1) Run the query and write the result to a temporary directory:

hdfs://ns1012/tmp/mart_fro/mart_fro/hive/hive_hive_2019-10-02_21-03-14_179_6935514001098968121-1/-ext-10001

(2) Move (mv) the data into the table's directory:

hdfs://ns1012/user/mart_fro/tmp.db/task_info_test

(3) Create the table itself (the HDFS path already exists from the move, so it can be used directly).

Note: if you try CTAS with an external table, it fails:

explain
create external table task_info_test as
select *
from task_info;
FAILED: SemanticException [Error 10070]: CREATE-TABLE-AS-SELECT cannot create external table

Second: create the table without data (create table ... like). This works for both internal and external tables, and the structure matches the source exactly.

sql1:

create table task_info_test_2 like task_info;
Execution plan:

hive> explain create table task_info_test_2 like task_info;
OK
STAGE DEPENDENCIES:
Stage-0 is a root stage

STAGE PLANS:
Stage: Stage-0
Create Table Operator:
Create Table
default input format: org.apache.hadoop.mapred.TextInputFormat
default output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
default serde name: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
like: task_info
name: tmp.task_info_test_2
Target table structure (everything matches the source except the location):

hive> show create table task_info_test_2;
OK
CREATE TABLE task_info_test_2(
task_id int COMMENT 'task id',
task_name string COMMENT 'task name',
task_parents array<int> COMMENT 'parent task ids',
task_tags map<int,string> COMMENT 'tags associated with the task')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test_2';
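create table ... like also accepts a LOCATION clause if the copy should point at a different directory; the path below is made up:

create external table task_info_test_loc like task_info
location 'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test_loc';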

sql2:

create external table task_info_test_1 like task_info;
Execution plan:

STAGE DEPENDENCIES:
Stage-0 is a root stage

STAGE PLANS:
Stage: Stage-0
Create Table Operator:
Create Table
default input format: org.apache.hadoop.mapred.TextInputFormat
default output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
default serde name: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
like: task_info
name: tmp.task_info_test_1
isExternal: true
Target table structure (again, everything matches the source except the location):

hive> show create table task_info_test_1;
OK
CREATE EXTERNAL TABLE task_info_test_1(
task_id int COMMENT 'task id',
task_name string COMMENT 'task name',
task_parents array<int> COMMENT 'parent task ids',
task_tags map<int,string> COMMENT 'tags associated with the task')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://ns1012/user/mart_fro/tmp.db/task_info_test_1'
TBLPROPERTIES (
'mart_name'='mart_fro',
'transient_lastDdlTime'='1570023418')

After all of the above, here is how it is typically used in practice:

(1) First create a scratch table with the same structure via create table ... like (internal or external).

(2) Then load data into it with insert overwrite:

insert overwrite table task_info_test
select *
from task_info;
In the end, the data always lands under hdfs://…/<scratch table name>/.
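One caveat when cleaning up: dropping an internal table deletes its HDFS directory along with the metadata, while dropping an external table leaves the files in place:

drop table task_info_test;   -- internal: the directory under tmp.db is removed too
drop table task_info_test_1; -- external: metadata only; the HDFS path survives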
