Some Special HiveQL Statements in Hive

xtqve

Categories: hadoop, hive

Article link: https://blog.csdn.net/xtqve/article/details/17683379

1. Specifying the field delimiter when importing data

CREATE TABLE new_table_name
  row format delimited fields terminated by '\t'
  stored as textfile
as
  select id, name from table_name;
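
One way to confirm that the delimiter took effect is to inspect the new table's metadata. The sketch below assumes the CTAS statement above has already run; the exact layout of the output varies between Hive versions, but the storage section normally lists the input format and a `field.delim` parameter.

```sql
-- Show storage details for the table created by the CTAS statement above.
-- Look for the TextInputFormat input format and the field.delim parameter.
DESCRIBE FORMATTED new_table_name;
```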

2. Loading data into a bucketed table:

-- Lets Hive choose the number of reducers automatically so that it matches the table's bucket count
set hive.enforce.bucketing = true;

insert overwrite table ext_login_bucket partition(dt='2013-12-01')
select uid,ips from ext_login where dt='2013-12-27';

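Note that the statement above must specify the dt='xxx' condition; otherwise Hive throws a null pointer exception: (message:partition values=[2013-12-01])

Also, the overwrite keyword in the insert statement is not optional here; it has to be written out. I often forget this.

As we know, there is another way to load data, load data. It also imports successfully, but it does not split the source file according to the number of buckets you configured, which is worth keeping in mind.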
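
The article does not show how ext_login_bucket was created, so the following is only a sketch under assumptions: the column types, the bucketing column (uid), the bucket count (4), the local file path, and the default warehouse location are all hypothetical. It contrasts the two loading methods on the Hive versions the article targets.

```sql
-- Assumed DDL: column types, bucketing column and bucket count are hypothetical
CREATE TABLE ext_login_bucket (uid STRING, ips STRING)
  PARTITIONED BY (dt STRING)
  CLUSTERED BY (uid) INTO 4 BUCKETS
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;

-- Path 1: insert with bucketing enforced, which should write one file per bucket
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE ext_login_bucket PARTITION (dt='2013-12-01')
SELECT uid, ips FROM ext_login WHERE dt='2013-12-27';

-- Path 2: load data copies the file into the partition as-is, with no bucketing
-- The local path below is a placeholder
LOAD DATA LOCAL INPATH '/tmp/login.tsv'
OVERWRITE INTO TABLE ext_login_bucket PARTITION (dt='2013-12-02');

-- Compare the file counts of the two partitions (default warehouse path assumed)
dfs -ls /user/hive/warehouse/ext_login_bucket/dt=2013-12-01;
dfs -ls /user/hive/warehouse/ext_login_bucket/dt=2013-12-02;
```

Under these assumptions the first partition should contain four files and the second only the single file that was loaded.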

3. Sampling bucket data with tablesample

select * from ext_login tablesample(bucket 1 out of 2 on id);

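tablesample is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y), where y must be a multiple or a factor of the table's total number of buckets. Hive uses y to determine the sampling ratio.

For example, if the table is divided into 64 buckets, then with y=32 Hive reads (64/32=) 2 buckets of data, and with y=128 it reads (64/128=) 1/2 of a bucket. x is the bucket at which sampling starts. For example, if the table has 32 buckets, tablesample(bucket 3 out of 16) reads (32/16=) 2 buckets in total: the 3rd bucket and the (3+16=) 19th bucket.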
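
The same arithmetic can be tried on a small table. This sketch assumes a table ext_login_bucket clustered by uid into 4 buckets; that DDL is hypothetical and is not given in the article.

```sql
-- y = 4 equals the bucket count: 4/4 = 1 bucket is read, the 1st
SELECT * FROM ext_login_bucket TABLESAMPLE(BUCKET 1 OUT OF 4 ON uid);

-- y = 2 is a factor of 4: 4/2 = 2 buckets are read, the 1st and the (1+2=) 3rd
SELECT * FROM ext_login_bucket TABLESAMPLE(BUCKET 1 OUT OF 2 ON uid);

-- y = 8 is a multiple of 4: 4/8 = half of one bucket is read
SELECT * FROM ext_login_bucket TABLESAMPLE(BUCKET 1 OUT OF 8 ON uid);
```

Sampling on the column the table is clustered by lets Hive read only the matching bucket files; sampling on any other column generally requires scanning the whole table.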

4. To be continued.
