Using Hive data types such as array, map, and struct


weixin_34268310



Tags: big data, java, python

Original article: https://my.oschina.net/u/1388024/blog/295552


Using Hive array, map, and struct

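Traditional databases validate data when it is written (schema on write); Hive validates it only when it is read (schema on read).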
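Schema on read can be pictured with a small Python model (not Hive code): stored lines are accepted as-is, and the declared column types are applied only at read time. The two-column schema and the sample lines are hypothetical; the model assumes, as Hive does, that a value which cannot be converted reads back as NULL (None) instead of failing.

```python
# Minimal model of "schema on read": nothing is validated on write;
# types are applied only when a stored line is read.
# Hypothetical schema; simplified, not Hive's actual SerDe code.

SCHEMA = [("name", str), ("age", int)]

def read_row(line, schema=SCHEMA, sep="\t"):
    """Parse one stored line against the schema at query time."""
    fields = line.rstrip("\n").split(sep)
    row = {}
    for i, (col, typ) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None  # missing field -> NULL
        try:
            row[col] = typ(raw) if raw is not None else None
        except ValueError:
            row[col] = None  # unconvertible value -> NULL, not an error
    return row

# "Writing" accepts any text; problems only show up when reading.
storage = ["tom\t20\n", "jerry\tabc\n", "spike\n"]
rows = [read_row(line) for line in storage]
print(rows)
```

A traditional database would reject the second and third lines at insert time; here they load fine and surface as NULLs when queried.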

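describe extended h5_gif; shows detailed table information (as a raw, single-block dump)

describe formatted h5_gif; shows the same detailed table information in a more readable, formatted layout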

Table types: managed (ordinary) tables, partitioned tables, and external tables (created with the EXTERNAL keyword)

set hive.mapred.mode=strict; enables strict mode, which rejects queries on a partitioned table that do not filter on a partition column

show partitions nginx_log; lists all partitions of a table

Example of creating a table:
CREATE TABLE `user`(
name string,
info struct<name:STRING, age:INT>,
string      string
)
PARTITIONED BY(p_hour STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE; -- RCFILE is also possible, but LOAD DATA copies files as-is, so a plain-text log needs TEXTFILE
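To show what the ROW FORMAT clauses mean for a single line of the data file, here is a simplified Python model (not Hive's SerDe). Columns are split on '\t' and the members of the struct column on ':'. The sample line is made up; note that the partition column p_hour is not stored in the file but comes from the partition directory.

```python
# Model of how one text line maps onto the columns of the `user` table
# under FIELDS TERMINATED BY '\t' and COLLECTION ITEMS TERMINATED BY ':'.
# Illustration only; the sample line is hypothetical.

FIELD_SEP = "\t"  # separates top-level columns
ITEM_SEP = ":"    # separates the members of the struct column
INFO_FIELDS = [("name", str), ("age", int)]  # struct<name:STRING, age:INT>

def parse_struct(text):
    """Split a struct value into its named members; bad or missing -> None."""
    items = text.split(ITEM_SEP)
    result = {}
    for i, (member, typ) in enumerate(INFO_FIELDS):
        try:
            result[member] = typ(items[i])
        except (IndexError, ValueError):
            result[member] = None
    return result

def parse_user_line(line):
    """Parse one data-file line (p_hour is not in the file)."""
    fields = line.rstrip("\n").split(FIELD_SEP)
    fields += [None] * (3 - len(fields))  # missing trailing columns -> NULL
    name, info, string_col = fields[:3]
    return {
        "name": name,
        "info": parse_struct(info) if info is not None else None,
        "string": string_col,
    }

row = parse_user_line("tom\ttom:20\thello\n")
print(row["info"]["age"])  # what SELECT info.age would give for this row: 20
```

In HiveQL the struct members are read with dot notation, e.g. info.name and info.age.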

load data local inpath '/root/java/testhive/user.log' overwrite into table `user` partition(p_hour="02");

select * from `user` where p_hour="02";
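Filtering on p_hour is cheap because each partition is stored as its own subdirectory named column=value, so Hive only needs to read that directory (partition pruning). The Python sketch below models this with plain path strings; the warehouse path /user/hive/warehouse/user is an assumed location, not taken from the article.

```python
# Model of partition pruning: each partition is a subdirectory named
# "<column>=<value>" under the table directory, and a partition filter
# selects directories instead of scanning every file.
import posixpath

TABLE_DIR = "/user/hive/warehouse/user"  # assumed table location

def partition_dir(table_dir, **spec):
    """Build the directory path for a partition spec like p_hour="02"."""
    parts = [f"{k}={v}" for k, v in spec.items()]
    return posixpath.join(table_dir, *parts)

def prune(files, column, value):
    """Keep only files that live in the column=value partition directory."""
    wanted = f"{column}={value}"
    return [f for f in files if wanted in f.split("/")]

files = [
    partition_dir(TABLE_DIR, p_hour="01") + "/user.log",
    partition_dir(TABLE_DIR, p_hour="02") + "/user.log",
]
print(prune(files, "p_hour", "02"))
```

This is also why strict mode insists on a partition filter: without one, every partition directory would be scanned.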

./hive -S -e "select * from \`user\` where p_hour='02'"; -S (silent mode) suppresses extra output such as "OK" and "Time taken"

set hive.cli.print.header=true; prints the column names as a header row in query output

ORDER BY, SORT BY, DISTRIBUTE BY, CLUSTER BY

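ORDER BY sorts the entire input globally, so it uses only one reducer (multiple reducers could not guarantee a global order). With large data, this makes the computation slow.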

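SORT BY sorts the rows within each reducer, so each reducer's output is sorted, but with several reducers the combined output is not globally sorted.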
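The difference between ORDER BY and SORT BY can be modeled in a few lines of Python (an in-memory illustration, not Hive code). ORDER BY is one reducer producing one sorted list; SORT BY is several reducers, each sorting only its own rows. Assigning rows to reducers round-robin is an assumption made purely for illustration.

```python
# ORDER BY vs SORT BY, modeled in memory.
# Round-robin row assignment is a stand-in, not Hive's actual behavior.

def order_by(rows, key):
    # single reducer -> one totally ordered output
    return sorted(rows, key=key)

def sort_by(rows, key, num_reducers):
    buckets = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    # each reducer sorts its own rows; no ordering across reducers
    return [sorted(b, key=key) for b in buckets]

rows = [5, 3, 9, 1, 7, 2]
print(order_by(rows, key=lambda x: x))  # [1, 2, 3, 5, 7, 9]
parts = sort_by(rows, key=lambda x: x, num_reducers=2)
print(parts)  # [[5, 7, 9], [1, 2, 3]]: each part sorted, concatenation is not
```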

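DISTRIBUTE BY sends rows to reducers based on the specified columns: all rows with the same value of those columns go to the same reducer.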
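A sketch of DISTRIBUTE BY as hash partitioning, again as an in-memory Python model. The hash used here (zlib.crc32) is a stand-in, not Hive's own hash function, so real reducer numbers will differ; the property that matters is that equal keys always land on the same reducer.

```python
# DISTRIBUTE BY modeled as hash(key) % num_reducers.
# zlib.crc32 is a stand-in hash; Hive uses a different one.
import zlib

def reducer_for(key, num_reducers):
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers

def distribute_by(rows, key, num_reducers):
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[reducer_for(key(row), num_reducers)].append(row)
    return reducers  # rows are grouped, but not sorted

rows = [("02", "a"), ("01", "b"), ("02", "c"), ("03", "d")]
out = distribute_by(rows, key=lambda r: r[0], num_reducers=2)
print(out)
```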

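CLUSTER BY combines DISTRIBUTE BY and SORT BY on the same columns: it both routes rows to reducers by those columns and sorts each reducer's rows by them.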

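However, the sort direction cannot be chosen: CLUSTER BY always sorts in ascending order and does not accept ASC or DESC. For a descending sort, use DISTRIBUTE BY col SORT BY col DESC instead.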
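Modeled in Python (illustration only, with zlib.crc32 standing in for Hive's hash), CLUSTER BY is DISTRIBUTE BY plus an ascending SORT BY on the same key, while the explicit DISTRIBUTE BY ... SORT BY ... DESC form groups rows the same way but can reverse the order.

```python
# CLUSTER BY key == DISTRIBUTE BY key SORT BY key (always ascending).
# In-memory model; zlib.crc32 is a stand-in for Hive's hash function.
import zlib

def distribute_sort(rows, key, num_reducers, descending=False):
    """DISTRIBUTE BY key SORT BY key [DESC]."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[zlib.crc32(str(key(row)).encode()) % num_reducers].append(row)
    return [sorted(r, key=key, reverse=descending) for r in reducers]

def cluster_by(rows, key, num_reducers):
    # no ASC/DESC option: ascending only
    return distribute_sort(rows, key, num_reducers)

rows = [3, 1, 4, 1, 5, 9, 2, 6]
asc = cluster_by(rows, key=lambda x: x, num_reducers=2)
desc = distribute_sort(rows, key=lambda x: x, num_reducers=2, descending=True)
print(asc, desc)
```

Both calls put the same rows on the same reducers; only the order within each reducer differs.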

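To convert a floating-point number to an integer, prefer round() or floor() over cast: CAST(x AS INT) simply drops the fractional part, while round() rounds to the nearest integer and floor() rounds down.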
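The three conversions differ mainly on negative numbers and halves. The Python model below mimics them; it assumes CAST truncates toward zero and that Hive's round() uses half-up rounding (ties away from zero), which is why it uses Decimal instead of Python's built-in round(), which rounds ties to even.

```python
# Python stand-ins for three float -> integer conversions in Hive.
# Assumptions: CAST truncates toward zero; round() is HALF_UP.
import math
from decimal import Decimal, ROUND_HALF_UP

def hive_cast_to_int(x):
    # CAST(x AS INT): fractional part dropped (truncation toward zero)
    return math.trunc(x)

def hive_round(x):
    # round(x): nearest integer, ties rounded away from zero
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

def hive_floor(x):
    # floor(x): largest integer not greater than x
    return math.floor(x)

for v in (2.5, 2.7, -2.7):
    print(v, hive_cast_to_int(v), hive_round(v), hive_floor(v))
# 2.5 2 3 2
# 2.7 2 3 2
# -2.7 -2 -3 -3
```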

Sampling is usually done with rand() or with buckets (TABLESAMPLE(BUCKET x OUT OF y ON col)).
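Both sampling styles can be sketched in Python. Bucket sampling is modeled as keeping the rows whose hash(col) % y equals x - 1, which is how TABLESAMPLE(BUCKET x OUT OF y ON col) picks a bucket. zlib.crc32 stands in for Hive's hash, so real bucket contents will differ. The rand() variant mimics the common WHERE rand() <= fraction idiom. Both helper functions are hypothetical names used only for this sketch.

```python
# Bucket sampling vs. rand()-based sampling, modeled in memory.
# zlib.crc32 is a stand-in for Hive's hash; helper names are hypothetical.
import random
import zlib

def bucket_sample(rows, key, x, y):
    """Like TABLESAMPLE(BUCKET x OUT OF y ON key): keep bucket number x - 1."""
    return [r for r in rows if zlib.crc32(str(key(r)).encode()) % y == x - 1]

def rand_sample(rows, fraction, seed=None):
    """Like WHERE rand() <= fraction: each row kept with that probability."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() <= fraction]

rows = list(range(1000))
buckets = [bucket_sample(rows, key=lambda r: r, x=i, y=4) for i in range(1, 5)]
sample = rand_sample(rows, 0.1, seed=42)
print([len(b) for b in buckets], len(sample))
```

Bucket sampling is repeatable (the same x OUT OF y always returns the same rows), while rand() gives a different sample on every run unless a seed is fixed.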