Hive commands

keep_moving_

Article link: https://blog.csdn.net/u013777684/article/details/37557909

1) View a configuration property

set javax.jdo.option.ConnectionURL;
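The same set command has a few other forms in the Hive CLI. The sketch below shows them; the property names are only examples and are not taken from the original post.

```sql
-- Print the current value of a single property (same form as above).
set hive.metastore.warehouse.dir;

-- Override a property for the current session only.
set hive.exec.dynamic.partition=true;

-- List the properties that have been overridden by the user or by Hive.
set;

-- List all Hadoop and Hive configuration properties.
set -v;
```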

2) DDL

Create a table:

create table logs(ts bigint, line string)

partitioned by (dt string, country string)

row format delimited fields terminated by '\t';

Load records:

load data local inpath '/tmp/partition.txt' into table logs partition (dt='2012-06-02', country='usa');

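The values in the partition clause are defined by the user at load time. A select returns these user-defined partition values, but the actual data is still whatever the file contains.

Example: with partition(dt='2012-06-02'), a select returns 2012-06-02 for dt, while the data file stored under the warehouse directory in HDFS still holds the dates it originally contained.

The HDFS directory structure for the data of two partitions is as follows: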
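The layout can be inspected from inside the Hive CLI. This is a minimal sketch, assuming the default warehouse directory /user/hive/warehouse and a second, hypothetical partition (country='gb') loaded from the same file; neither assumption comes from the original post. On older Hadoop versions the recursive listing flag is -lsr rather than -ls -R.

```sql
-- Hypothetical second partition, loaded from the same local file.
load data local inpath '/tmp/partition.txt'
into table logs partition (dt='2012-06-02', country='gb');

-- List the table directory recursively from inside the Hive CLI.
dfs -ls -R /user/hive/warehouse/logs;

-- Each partition column becomes one directory level, roughly:
--   /user/hive/warehouse/logs/dt=2012-06-02/country=gb/partition.txt
--   /user/hive/warehouse/logs/dt=2012-06-02/country=usa/partition.txt
```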

3) DML

LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

Without the local keyword, the path refers to a file in HDFS.
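For comparison, a sketch of the HDFS form of the same statement. The HDFS path is an assumption made for illustration, and the file must already exist there.

```sql
-- Assumed HDFS path; no LOCAL keyword, so the path is resolved in HDFS.
LOAD DATA INPATH '/user/hive/input/kv1.txt' OVERWRITE INTO TABLE pokes;
```

With local, the file is copied from the local file system into the table's directory. Without it, the file is moved from its HDFS location, so it no longer exists at the source path afterwards.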

4) Query

show partitions table_name;

5) External table

create external table page_view(viewtime INT, userid bigint) row format delimited fields terminated by ',' stored as textfile location '/user/temp';

load data local inpath '/home/user/demo/external.txt' overwrite into table page_view;

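An external table can specify the HDFS location where its data is stored, whereas a regular table is stored by default under the location given by the hive.metastore.warehouse.dir parameter in hive-site.xml.

The data files sit directly under /user/temp. For a regular table, Hive instead creates a directory for the table under the hive.metastore.warehouse.dir location and puts the data inside it.

Dropping an external table does not delete the data; dropping a regular table deletes the data, including the table's data directory.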
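The drop behaviour can be checked with the page_view table created above. This is only a sketch: it assumes the load into /user/temp succeeded, and the file listing will depend on what else is stored in that directory.

```sql
-- The "Table Type" row should show EXTERNAL_TABLE and the
-- "Location" row should point at /user/temp.
describe formatted page_view;

-- Removes only the metastore entry for the external table.
drop table page_view;

-- The data file loaded earlier should still be listed here.
dfs -ls /user/temp;
```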

6) CTAS

create table abc_table_name as select *** from xxxx;

Restrictions on the target table abc_table_name: it cannot be a partitioned table, an external table, or a bucketed table.
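When the target does need partitions, one common alternative to CTAS is to declare the table first and then fill it with insert ... select. The sketch below reuses the logs table from section 2; logs_by_dt is a hypothetical name chosen for illustration.

```sql
-- Hypothetical target table, declared with its partition column up front.
create table logs_by_dt(ts bigint, line string)
partitioned by (dt string);

-- Fill one partition from the logs table defined in section 2.
insert overwrite table logs_by_dt partition (dt='2012-06-02')
select ts, line from logs where dt='2012-06-02';
```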

7) Bucketed, sorted table

create table bucketed_user(id int, name string) clustered by(id) sorted by(name) into 4 buckets row format delimited fields terminated by '\t' stored as textfile;

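Suppose there is a table users that is not bucketed.

When data generated outside Hive has to go into a bucketed table, the bucketing itself is usually done by Hive;

First run set hive.enforce.bucketing=true; to tell Hive to create the number of buckets declared in the table definition;

insert overwrite table bucketed_user select * from users;

Result: each bucket corresponds to one file. Buckets correspond to the output file partitions of MapReduce, so a job produces as many buckets (output files) as it has reduce tasks.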
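Put together, the flow might look like the sketch below. The definition of users is hypothetical (the original post does not show it), the warehouse path assumes the default hive.metastore.warehouse.dir, and the file names in the comment are the ones Hive commonly produces rather than a guarantee. hive.enforce.sorting is included because the table is declared with sorted by; both enforce settings apply to the older Hive releases this post was written for.

```sql
-- Hypothetical unbucketed source table with the same columns.
create table if not exists users(id int, name string)
row format delimited fields terminated by '\t';

-- Make Hive honour the declared bucket count and sort order on insert.
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;

insert overwrite table bucketed_user select * from users;

-- List the bucketed table's directory (default warehouse path assumed).
dfs -ls /user/hive/warehouse/bucketed_user;
-- With 4 declared buckets, four files are expected, commonly named
-- 000000_0, 000001_0, 000002_0 and 000003_0.
```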

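Sample the table data: select * from bucketed_user tablesample(bucket 1 out of 4 on id);

Note: tablesample is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y)
y must be a multiple or a factor of the table's total number of buckets. Hive determines the sampling ratio from y. For example, if the table has 64 buckets, y=32 samples (64/32=) 2 buckets of data, and y=128 samples (64/128=) 1/2 of a bucket. x indicates the bucket to start from. For example, if the table has 32 buckets, tablesample(bucket 3 out of 16) samples (32/16=) 2 buckets of data: the 3rd bucket and the (3+16=) 19th bucket.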
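Applied to the 4-bucket bucketed_user table from this section, the same rule gives the following. The queries are a sketch; the comments describe which buckets are read, not actual query output.

```sql
-- y = 4: 4/4 = 1 bucket, namely bucket 1.
select * from bucketed_user tablesample(bucket 1 out of 4 on id);

-- y = 2: 4/2 = 2 buckets, namely bucket 1 and bucket (1+2=) 3.
select * from bucketed_user tablesample(bucket 1 out of 2 on id);

-- y = 8: 4/8 = 1/2 of a bucket, taken from bucket 1.
select * from bucketed_user tablesample(bucket 1 out of 8 on id);
```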

8) External table location

create external table a(id string);

This creates a table directory under the Hive HDFS directory specified in hive-site.xml, with no data in it by default;

load data local inpath '/1.txt' into table a; -- copies 1.txt into that HDFS directory

-------------------------------

create external table a3(id int) location '/usr/test/data/';

If /usr/test/data/ is not the path specified in hive-site.xml, a3 uses /usr/test/data/ itself as its table directory;

data loaded with load data also ends up in /usr/test/data.