Introduction to Hive

孤独技术

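Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.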

Article link: https://blog.csdn.net/wu_jin_blog/article/details/38376669

I: Overview

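1. Hive is a data warehouse infrastructure built on top of the Hadoop file system. It is best suited to batch jobs over large, immutable datasets such as web logs. Hive itself has no dedicated data storage format.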

2. Hive has four kinds of data models: tables (Table), external tables (External Table), partitions (Partition), and buckets (Bucket).

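3. Creating a regular table takes two steps: creating the table, then loading data into it. Creating an external table takes a single step: the table is created and bound to its data at the same time, because the definition points at data that is already in place.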

4. By default, Hive comes preconfigured with the connection parameters for a Derby database, which it uses to store its metadata.
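Point 3 can be sketched in HiveQL as follows. This is a minimal sketch, not the author's own example: the table names `user_managed` and `user_ext` and both paths are hypothetical, and it assumes tab-separated files already exist at those paths.

```sql
-- Regular (managed) table: step 1 creates the table, step 2 loads data into it.
CREATE TABLE user_managed (userid INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Hypothetical HDFS path. LOAD DATA INPATH moves the file into the table's directory.
LOAD DATA INPATH '/tmp/demo/users.tsv' INTO TABLE user_managed;

-- External table: a single step. The definition points at a directory
-- (hypothetical here) whose files become the table's data.
CREATE EXTERNAL TABLE user_ext (userid INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/demo/users';
```

One practical consequence of the difference: dropping the managed table deletes its data along with the metadata, while dropping the external table removes only the metadata and leaves the files in place.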

II: Creating Tables

1 Create a regular table

-- user is a reserved keyword in recent Hive releases, so the name is quoted with backticks
create table `user`(userid int, name string, pawss string)
comment 'this is the user view table';

2 Create a partitioned table, using tabs to separate the fields within each row

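-- user is a reserved keyword in recent Hive releases, so the name is quoted with backticks
create table `user`(userid int, name string, pawss string)
comment 'this is the user view table'
partitioned by(age int)       -- partition by age; a partition column must not also appear in the column list
row format delimited
fields terminated by '\t'     -- fields are separated by tabs
stored as sequencefile;       -- SequenceFile, a binary format that supports compression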
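To show how a partitioned table is typically used, here is a minimal sketch of loading one partition and querying it. The table name `user_by_age` and the file path are hypothetical, and the table is stored as text here only so that a plain tab-separated file can be loaded directly.

```sql
-- Hypothetical table: age is declared only in PARTITIONED BY.
CREATE TABLE user_by_age (userid INT, name STRING, pawss STRING)
PARTITIONED BY (age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load a local file (hypothetical path) into the age=30 partition.
-- The partition value comes from the PARTITION clause, not from the file contents.
LOAD DATA LOCAL INPATH '/tmp/demo/users_age_30.tsv'
INTO TABLE user_by_age PARTITION (age = 30);

-- A filter on the partition column lets Hive read only the matching partition directory.
SELECT userid, name FROM user_by_age WHERE age = 30;

-- List the partitions that exist so far.
SHOW PARTITIONS user_by_age;
```

LOAD DATA only moves or copies files without converting them, so a SequenceFile table is usually filled with INSERT ... SELECT from a text staging table instead.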

3 Add clustered (bucketed) storage

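Rows are distributed into buckets by hashing userid, and the rows within each bucket are stored sorted by age. This layout lets users sample the table efficiently on the clustering column userid.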

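-- user is a reserved keyword in recent Hive releases, so the name is quoted with backticks
-- age is a regular column here: sorted by can only reference columns from the column list, not partition columns
create table `user`(userid int, name string, age int, pawss string)
comment 'this is the user view table'
clustered by(userid) sorted by(age) into 32 buckets   -- hash rows on userid into 32 buckets; sort each bucket by age
row format delimited
fields terminated by '\t'
stored as sequencefile;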
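The sampling benefit mentioned above can be sketched as follows. The table names `user_bucketed` and `user_staging` are hypothetical, and `user_staging` is assumed to already hold the source rows; treat this as an illustration rather than the author's own workflow.

```sql
-- Hypothetical bucketed table: rows are hashed on userid into 32 buckets,
-- and each bucket is kept sorted by age.
CREATE TABLE user_bucketed (userid INT, name STRING, age INT)
CLUSTERED BY (userid) SORTED BY (age) INTO 32 BUCKETS
STORED AS SEQUENCEFILE;

-- On Hive releases before 2.0, uncomment these so inserts honor the bucket and sort spec:
-- SET hive.enforce.bucketing = true;
-- SET hive.enforce.sorting = true;

-- Populate through INSERT ... SELECT so Hive writes one file per bucket.
INSERT OVERWRITE TABLE user_bucketed
SELECT userid, name, age FROM user_staging;

-- Read only the first of the 32 buckets, roughly 1/32 of the rows.
SELECT * FROM user_bucketed TABLESAMPLE (BUCKET 1 OUT OF 32 ON userid);
```

Because the sample is taken on the same column the table is clustered by, Hive can read a single bucket file instead of scanning the whole table.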

4 Specify the storage path

Use the LOCATION clause to give the table a storage location other than the default.

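-- user is a reserved keyword in recent Hive releases, so the name is quoted with backticks
create table `user`(userid int, name string, pawss string)
comment 'this is the user view table'
partitioned by(age int)       -- a partition column must not also appear in the column list
row format delimited
fields terminated by '\t'
stored as textfile            -- store the data as plain text files
location '<path>';            -- placeholder: the directory that will hold the table's data (an HDFS path on a typical cluster)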
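One way to confirm where a table's data actually lives is to inspect its metadata. In this minimal sketch the table name `user_at_path` and the directory are hypothetical.

```sql
-- Hypothetical table whose data lives in an explicitly chosen directory.
CREATE TABLE user_at_path (userid INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/demo/user_at_path';

-- The detailed table information includes a Location entry showing the directory in use.
DESCRIBE FORMATTED user_at_path;
```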

To be continued ...
