Basic Hive Operations

weixin_42333583

Categories: hadoop, hive

Article link: https://blog.csdn.net/weixin_42333583/article/details/83218339

Database operations

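1 Create a database
create database if not exists myhive;
By default, Hive stores databases and tables under the /user/hive/warehouse directory on HDFS; a database such as myhive gets its own subdirectory there, myhive.db.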

2 Create a database with a custom storage location on HDFS (the location clause works the same way for tables)
create database if not exists myhive2 location "/user/mydatabase/";

3 View database details
desc database myhive2;

Table operations

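When creating a table, decide which delimiter separates the fields in the stored data. The default is \001 (Ctrl-A); the examples here set it to \t (tab) instead.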
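To make the delimiter choice concrete, here is a minimal sketch contrasting a table that relies on the default separator with one that declares a tab. The table names stu_default and stu_tab are hypothetical and are not used elsewhere in this post.

```sql
-- Hypothetical table names, for illustration only.
-- No ROW FORMAT clause: text fields are separated by \001 (Ctrl-A).
create table if not exists stu_default(id int, name string);

-- Explicit tab delimiter: each line of the data file looks like 1<TAB>zhangsan.
create table if not exists stu_tab(id int, name string)
row format delimited fields terminated by '\t';

-- The delimiter in effect is listed under "Storage Desc Params".
desc formatted stu_tab;
```

If a tab-separated file is loaded into a table that expects \001, Hive does not find the separator it is looking for, so the columns typically come back as NULL or misaligned.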

Creating the four types of table
1 Managed table (internal table)
Create a table, specifying the field delimiter and the storage location
create table if not exists stu2(id int,name string) row format delimited fields terminated by "\t" stored as textfile location '/user/stu2';

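Create a table from a query result or from an existing table's definition
-- as select copies both the table structure and the data
create table if not exists stu4 as select * from stu3;
-- like copies only the table structure, not the data
create table if not exists stu5 like stu4;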
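One quick way to see the difference between the two forms is to count the rows in each new table. This is only a sketch and assumes stu3 already exists and contains data.

```sql
-- stu4 was created with AS SELECT, so it holds the rows copied from stu3.
select count(*) from stu4;

-- stu5 was created with LIKE, so it has the same columns but no rows yet;
-- this count is 0 right after creation.
select count(*) from stu5;

-- Compare the column definitions of the two tables.
desc stu4;
desc stu5;
```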

View the table structure:
desc formatted table_name;

2 External table
create external table teacher(t_id string,t_name string) row format delimited fields terminated by "\t";
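The practical difference between a managed table and an external table shows up when the table is dropped. The sketch below uses two hypothetical tables, demo_managed and demo_external, and a hypothetical HDFS path, /user/demo_external; the dfs command assumes the Hive CLI.

```sql
-- Hypothetical tables and path, for illustration only.
create table if not exists demo_managed(id int);
create external table if not exists demo_external(id int) location '/user/demo_external';

-- Managed table: the metadata and the data directory are both deleted.
drop table demo_managed;

-- External table: only the metadata is removed; files under /user/demo_external remain.
drop table demo_external;
dfs -ls /user/demo_external;
```

This is why external tables are the usual choice for data that other tools also read: dropping the table in Hive does not destroy the underlying files.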

Load data from the local file system into Hive
load data local inpath '/export/servers/hive-study-data/student.csv' into table student;

Load data from HDFS into Hive
load data inpath '/user/hive_data/techer.csv' into table teacher;
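The two load forms behave differently with respect to the source file, and both append by default. The sketch below reuses the paths and tables from the statements above and adds the overwrite keyword to show the replacing variant.

```sql
-- LOCAL: the file is copied from the local file system; the original stays in place.
load data local inpath '/export/servers/hive-study-data/student.csv' into table student;

-- OVERWRITE: replace the table's existing data instead of appending to it.
load data local inpath '/export/servers/hive-study-data/student.csv' overwrite into table student;

-- Without LOCAL the path is on HDFS, and the file is moved into the table's directory,
-- so it is no longer present under /user/hive_data/ afterwards.
load data inpath '/user/hive_data/techer.csv' into table teacher;
```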

3 Partitioned table
Create partitioned tables
create table score(s_id string ,c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

create table score2(s_id string,c_id string, s_score int) partitioned by (year string ,month string,day string) row format delimited fields terminated by '\t';

Load data into a partitioned table
load data local inpath '/export/servers/hive-study-data/score.csv' into table score partition(month='201810');

load data local inpath '/export/servers/hive-study-data/score.csv' into table score2 partition(year='2018',month='10',day='15');

Query several partitions together with union all
select * from score2 where month='10' union all select * from score2 where month='11';
View partitions:
show partitions score;

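Add a partition:
-- for a table with a single partition level
alter table table_name add partition(month='11');

Note: after the partition is added, a new partition directory appears under the table's directory on HDFS.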
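The statement above covers a single partition level. As a sketch of the other common cases, the following uses the score and score2 tables defined earlier; the partition values are made up for illustration.

```sql
-- Multi-level partitioned table: every partition column must be given a value.
alter table score2 add partition(year='2018', month='11', day='01');

-- Several partitions can be added in one statement.
alter table score add partition(month='201811') partition(month='201812');

-- Drop a partition; for a managed table this also deletes its data directory.
alter table score drop partition(month='201812');

show partitions score;
```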

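Associating log data already on HDFS with a table: create an external partitioned table whose location points at the data directory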
create external table score4(s_id string,c_id string,s_score int) partitioned by(month string) row format delimited fields terminated by '\t' location '/scoredatas/';

Repair the table so that the partition directories found under its location are registered in the Hive metastore
msck repair table score4;
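The whole workflow might look like the sketch below. It assumes the commands are run from the Hive CLI (where dfs operates on HDFS), reuses the local score.csv path from earlier, and picks month=201810 as an example partition value.

```sql
-- Create a partition directory by hand, following the month=<value> naming convention.
dfs -mkdir -p /scoredatas/month=201810;
dfs -put /export/servers/hive-study-data/score.csv /scoredatas/month=201810/;

-- The metastore does not know about the new directory yet.
show partitions score4;

-- Scan the table location and register any partition directories found there.
msck repair table score4;

-- The partition is now listed and its rows can be queried.
show partitions score4;
select * from score4 where month='201810' limit 10;
```

The same result can be reached with alter table score4 add partition(month='201810'); msck repair is mainly convenient when many partition directories were created outside Hive.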

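4 Bucketed table
Enable bucketing (needed on Hive 1.x; the property was removed in Hive 2.0, where bucketing is always enforced)
set hive.enforce.bucketing=true;
Set the number of reducers to match the number of buckets
set mapreduce.job.reduces=3;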

Create a bucketed table
create table course(c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';

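Do not load files directly into a bucketed table, because they would not be split into buckets; populate it from another table with insert overwrite table ... select (course_common here is an ordinary table with the same columns)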
insert overwrite table course select * from course_common cluster by(c_id);
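Once the bucketed table is populated, bucketing can be checked and put to use with sampling. The sketch below assumes the course table lives in the default database under the default warehouse directory; adjust the path if it was created elsewhere (for example under myhive.db).

```sql
-- Read only the first of the 3 buckets, hashing on the bucketing column c_id.
select * from course tablesample(bucket 1 out of 3 on c_id);

-- The table directory should contain one file per bucket (3 here).
-- The path assumes the default warehouse location and the default database.
dfs -ls /user/hive/warehouse/course;
```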

Altering a table
Rename a table:
alter table score4 rename to score5;
Add columns:
alter table score5 add columns (mycol string, mysco string);
Change a column (rename mysco to mysconew and change its type to int):
alter table score5 change column mysco mysconew int;
