DML Data Operations in Hive

情深不仅李义山

Category: Hive. Tags: hive, data warehouse

Article link: https://blog.csdn.net/weixin_43854618/article/details/104410043

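DML stands for Data Manipulation Language. It mainly covers the insert, delete, update, and query operations on data. Although Hive is a data warehouse rather than a transactional database, it has its own DML as well. I studied it today and am writing this post to consolidate and record what I learned.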

1. Data Import

Loading data into a table (Load)

load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

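(1) load data: load data into a table
(2) local: load the data from the local file system into the Hive table; without it, the data is loaded from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the data already in the table; without it, the data is appended
(5) into table: which table to load into
(6) student: the name of the target table
(7) partition: load into the specified partition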
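Example:
First create a student table with the columns id and name.
[Screenshot: the create table statement for the student table]
Because the field delimiter is declared as \t, Hive splits each line of student.txt on \t. If a line yields more fields than the table has columns, Hive discards the extra fields; if it yields fewer, the missing columns are filled with NULL.
[Screenshot: contents of student.txt]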
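
Since the screenshots are missing, here is a minimal sketch of what the table definition could look like. The column types and the sample rows in the comments are my assumptions, not the original content of the screenshots.

```sql
-- Assumed DDL: string columns and a tab field delimiter, inferred from the text above.
create table if not exists default.student(
  id string, name string)
row format delimited fields terminated by '\t';

-- Hypothetical lines in student.txt and how Hive would read them
-- (<TAB> stands for a tab character):
--   1001<TAB>zhangsan          -> id='1001', name='zhangsan'
--   1002<TAB>lisi<TAB>extra    -> id='1002', name='lisi'  (third field discarded)
--   1003                       -> id='1003', name=NULL    (missing field becomes NULL)
```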

Loading local data into Hive

Load the contents of the local student.txt file into the student table:

load data local inpath '/opt/module/datas/student.txt' into table default.student;

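Loading data from HDFS into Hive (the file is moved from its HDFS location into the table's directory, not copied):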

load data inpath '/student.txt' into table default.student;

Loading data and overwriting the data already in the table

load data inpath '/user/atguigu/hive/student.txt' overwrite into table default.student;

Inserting data into a table through a query (Insert)
Create a partitioned table:

create table student(
id string,name string)
partitioned by(month string)
row format delimited fields terminated by '\t';

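Inserting some basic data
This approach is very inefficient: a single insert carries only a few rows, yet every execution of the statement launches a MapReduce job that takes roughly 20 seconds. For that reason it is rarely used.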

insert into table  student partition(month='201709') values('1','wangwu'),('2','zhaoliu');
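
For anything beyond a handful of rows, loading a file into the partition is usually the more practical route. The sketch below reuses the load syntax from the previous section and assumes that /opt/module/datas/student.txt contains tab-separated id and name fields.

```sql
-- Load a local file straight into one partition of the partitioned student table.
-- The partition column (month) is not stored in the file; it comes from the partition clause.
load data local inpath '/opt/module/datas/student.txt'
into table student partition(month='201709');

-- Check the rows that ended up in that partition.
select id, name, month from student where month='201709';
```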

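Inserting the result of a query on a table
insert into: appends the data to the table or partition; existing data is not deleted
insert overwrite: overwrites the data that already exists in the table or partition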

insert overwrite table student partition(month='201708')
             select id, name from student where month='201709';
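
For comparison, here is the same statement written with insert into. This is only a sketch to show the difference in keyword; it appends to the 201708 partition and does not replace its contents.

```sql
-- Appends the selected rows to partition month='201708';
-- rows already stored in that partition are kept.
insert into table student partition(month='201708')
select id, name from student where month='201709';
```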

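Multi-insert mode (multiple tables or partitions)
The source table is named once in the from clause, and each insert clause writes its own query result into its own table or partition: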

from student
              insert overwrite table student partition(month='201707')
              select id, name where month='201709'
              insert overwrite table student partition(month='201706')
              select id, name where month='201709';
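
To confirm that the multi-insert created both target partitions, something like the following can be run afterwards. This is a sketch; row_count is just an alias I chose.

```sql
-- List the partitions that now exist on the table.
show partitions student;

-- Count the rows per partition; 201706 and 201707 should each hold
-- a copy of the rows selected from 201709.
select month, count(*) as row_count
from student
group by month;
```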

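Creating a table and loading data in one query (As Select)
Create a table from a query result (the rows returned by the query are written into the newly created table):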

create table if not exists student3
as select id, name from student;

Specifying the data path with Location when creating a table
Create a table and specify its location on HDFS:

create external table if not exists student5(
              id int, name string
              )
              row format delimited fields terminated by '\t'
              location '/student';

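Importing data into a specified Hive table (Import)
Here /user/hive/warehouse/export/student is a directory holding data that Hive exported earlier with the export statement: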

import table student2 partition(month='201709') from
 '/user/hive/warehouse/export/student';

2. Data Export

Insert export

Exporting the query result to the local file system

insert overwrite local directory '/opt/module/datas/export/student'
            select * from student;

Exporting the query result to the local file system with formatting

insert overwrite local directory '/opt/module/datas/export/student1'
           ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
           select * from student;

Exporting the query result to HDFS (without local)

insert overwrite directory '/user/atguigu/student2'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
             select * from student;

Exporting to the local file system with a Hadoop command (run from the Hive CLI)

dfs -get /user/hive/warehouse/student/month=201709/000000_0
/opt/module/datas/export/student3.txt;

Exporting with a Hive shell command (hive -e, run from the operating system shell)

hive -e 'select * from default.student;' >
 /opt/module/datas/export/student4.txt;

Exporting to HDFS with Export

export table default.student to
 '/user/hive/warehouse/export/student';
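
Export pairs naturally with the import statement from section 1. Below is a hedged sketch of a round trip. The table name student_copy is hypothetical, and I am assuming the export directory is empty beforehand and that student_copy does not exist yet, in which case import should create it.

```sql
-- Write the table's metadata and data files to an HDFS directory.
export table default.student to
 '/user/hive/warehouse/export/student';

-- Read that directory back into a new table (hypothetical name).
import table student_copy from
 '/user/hive/warehouse/export/student';

-- The copy should expose the same partitions as the source table.
show partitions student_copy;
```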

3. Clearing Table Data (Truncate)

Truncate only works on managed tables; it cannot delete the data of an external table.

truncate table student;
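
Truncate can also be limited to a single partition. The sketch below assumes the partitioned student table and the external table student5 created earlier. The failure on the external table is described only in a comment, because the exact error text depends on the Hive version.

```sql
-- Clear only one partition and leave the others untouched.
truncate table student partition(month='201709');

-- Check whether a table is managed or external before truncating:
-- the "Table Type" field shows MANAGED_TABLE or EXTERNAL_TABLE.
describe formatted student5;

-- student5 was created with "create external table", so the statement
-- below is expected to be rejected by Hive instead of deleting files:
-- truncate table student5;
```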
