Importing and Exporting Data in Hive

kcy000

Article link: https://blog.csdn.net/qq_28266137/article/details/117407508

Importing and exporting data

1. Importing data

1. load

Syntax:

load data [local] inpath 'path_to_data' [overwrite] into table
table_name [partition (partcol1=val1,…)];

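(1) load data: loads data

(2) local: loads data from the local file system into the Hive table; without it, data is loaded from HDFS

(3) inpath: the path of the data to load

(4) overwrite: replaces the data already in the table; without it, the new data is appended

(5) into table: names the table to load into

(6) table_name: the target table

(7) partition: loads the data into the specified partition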
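The effect of `local` and `overwrite` can be modeled with ordinary directories. The Python sketch below is only a simulation, not Hive: it assumes the behavior described in this post, namely that a `local` load copies the source file, a load from HDFS moves it into the table directory, and `overwrite` clears that directory first. `simulate_load`, the sample row and the paths are hypothetical.

```python
import os
import shutil
import tempfile


def simulate_load(src_file, table_dir, local, overwrite):
    """Model LOAD DATA on plain directories (a simulation, not real Hive)."""
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        # overwrite: everything already in the table directory is removed first
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    # name clashes between loaded files are not handled in this model
    dest = os.path.join(table_dir, os.path.basename(src_file))
    if local:
        # local: the file is copied, the source file stays where it was
        shutil.copy(src_file, dest)
    else:
        # no local (HDFS path): the file is moved into the table directory
        shutil.move(src_file, dest)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    src = os.path.join(root, "hive_3_person1.txt")
    with open(src, "w") as f:
        f.write("1\tzhangsan\n")  # hypothetical sample row
    table_dir = os.path.join(root, "table_person1")
    simulate_load(src, table_dir, local=False, overwrite=False)
    print(os.path.exists(src), os.listdir(table_dir))  # False ['hive_3_person1.txt']
```

The move-versus-copy difference is why a file loaded from HDFS disappears from its original path, while a local file is left in place.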

Load data from the local file system

load data local inpath '/opt/data/hive/hive_3_person1.txt' into table table_person1;

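Load data from HDFS

Note: after the load, the file is no longer at its original HDFS path, because Hive moves it into the table's directory instead of copying it.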

load data inpath '/hive/data/hive_3_person1.txt' into table table_person1;

Load into a partition

load data local inpath '/opt/data/hive/hive_3_dept_20210403.log' into table table_dept1_partition partition(day='20210403');

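Overwrite the existing data: overwrite replaces all data already in the table (or in the target partition) with the new file. It does not deduplicate rows.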

load data inpath '/hive/data/hive_3_person1.txt' overwrite into table table_person1;

2. insert

Syntax

insert {into | overwrite} table table_name values (v1, v2, ...), (v1, v2, ...), ...;
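When the rows come from a program, the multi-row `values` list can be generated. The helper below is a hypothetical Python sketch (`build_insert` and `quote` are not part of any Hive client library). It assumes Hive's C-style backslash escaping inside single-quoted string literals and only handles NULL, booleans, numbers and strings.

```python
def quote(value):
    """Render one Python value as a Hive literal (minimal, hypothetical)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # strings: escape backslashes first, then single quotes (C-style escaping)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_insert(table, rows, overwrite=False):
    """Build an `insert into|overwrite table ... values (...),(...);` statement."""
    if not rows:
        raise ValueError("rows must not be empty")
    mode = "overwrite" if overwrite else "into"
    values = ",".join("(" + ",".join(quote(v) for v in row) + ")" for row in rows)
    return f"insert {mode} table {table} values{values};"


if __name__ == "__main__":
    print(build_insert("table_student_par", [(1, "wangwu"), (2, "zhaoliu")]))
    # insert into table table_student_par values(1,'wangwu'),(2,'zhaoliu');
```

The generated statement is the same as the hand-written insert used in this section.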

Create a table

create table table_student_par(id int, name string) 
row format delimited 
fields terminated by '\t';

Insert data

insert into table table_student_par values(1,'wangwu'),(2,'zhaoliu');


Under the hood, the insert is still executed as a MapReduce job.


Insert with overwrite: all data already in the table is replaced by the new rows

insert overwrite table table_student_par values(1,'wangwu'),(2,'zhaoliu');

Insert the result of a query (here the table is overwritten with the result of a query on itself)

insert overwrite table table_student_par select id, name from table_student_par ;

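3. create table ... as select

See the previous section.

4. Loading data by specifying location when creating a table

See the previous section.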

5. import

import loads data that was previously exported with export into a table:

import table table_name
from 'path';

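6. Dynamic partition insert

The partitions used so far were static: the partition value is written into the statement. When many partitions are needed, Hive's dynamic partitioning derives the partition values from the data instead, which requires the configuration below.

Step 1

Enable dynamic partitioning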

(1) Enable dynamic partitioning (default true, i.e. enabled)

hive.exec.dynamic.partition=true

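(2) Set non-strict mode. The dynamic partition mode defaults to strict, which requires at least one partition column to be static; nonstrict allows every partition column to be dynamic.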

hive.exec.dynamic.partition.mode=nonstrict

(3) Maximum total number of dynamic partitions that can be created across all nodes running MR tasks. Default 1000.

hive.exec.max.dynamic.partitions=1000

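(4) Maximum number of dynamic partitions that can be created on each node running MR tasks. Set it according to the actual data: for example, if the source data covers one year, the day column has 365 distinct values, so this parameter must be 365 or more; with the default of 100, the job fails with an error.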

hive.exec.max.dynamic.partitions.pernode=100

(5) Maximum number of HDFS files the whole MR job can create. Default 100000.

hive.exec.max.created.files=100000

(6) Whether to throw an exception when an empty partition is generated. Usually this does not need to be set. Default false.

hive.error.on.empty.partition=false
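Since the job fails once these limits are exceeded, it can help to count the distinct partition values before running the insert. The Python sketch below assumes the source rows are available as tuples and uses the defaults listed above; `check_partition_limits` is a hypothetical helper. How partitions are spread over nodes depends on how the data is split, so the per-node limit is checked conservatively, as if a single node wrote every partition.

```python
from datetime import date, timedelta

DEFAULT_MAX_PARTITIONS = 1000         # hive.exec.max.dynamic.partitions
DEFAULT_MAX_PARTITIONS_PERNODE = 100  # hive.exec.max.dynamic.partitions.pernode


def check_partition_limits(rows, part_index,
                           max_total=DEFAULT_MAX_PARTITIONS,
                           max_pernode=DEFAULT_MAX_PARTITIONS_PERNODE):
    """Return warnings if the distinct partition values exceed the limits."""
    n = len({row[part_index] for row in rows})
    warnings = []
    if n > max_total:
        warnings.append(f"{n} partitions > max.dynamic.partitions={max_total}")
    # worst case assumed: one node writes every partition
    if n > max_pernode:
        warnings.append(f"{n} partitions > max.dynamic.partitions.pernode={max_pernode}")
    return warnings


if __name__ == "__main__":
    # one year of daily data: 365 distinct values of `day`
    start = date(2021, 1, 1)
    rows = [(i, (start + timedelta(days=i)).strftime("%Y%m%d")) for i in range(365)]
    print(check_partition_limits(rows, part_index=1))
    # ['365 partitions > max.dynamic.partitions.pernode=100']
```

This reproduces the one-year example from item (4): the total limit of 1000 is fine, the per-node default of 100 is not.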

Example

Create a regular (non-partitioned) table

create table table_dept(
    id int,
    name string,
    loc string
);
insert into table table_dept values(1,'wangwu','wuhan'),(2,'zhaoliu','beijing'),(3,'lishi','beijing');
select * from table_dept;

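Create the partitioned table (loc holds strings such as 'wuhan', so the partition column is declared as string)

create table table_dept_partition_dy(id int, name string) partitioned by (loc string) row format delimited fields terminated by '\t';

Switch to non-strict mode and run the insert; the dynamic partition column must be the last column in the select list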

set hive.exec.dynamic.partition.mode = nonstrict;
insert into table table_dept_partition_dy partition(loc) select id, name, loc from table_dept;

Check the partitions of the target table

show partitions table_dept_partition_dy;
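Conceptually, the dynamic partition insert takes the last selected column (`loc`) as the partition value and writes each row into a `loc=<value>` subdirectory, leaving that column out of the data file. The Python sketch below models only this routing step on the example rows and does not run Hive; `route_dynamic_partitions` is hypothetical. The resulting directory names match the entries `show partitions` lists for this table.

```python
from collections import defaultdict

# rows of table_dept: (id, name, loc)
table_dept = [(1, "wangwu", "wuhan"), (2, "zhaoliu", "beijing"), (3, "lishi", "beijing")]


def route_dynamic_partitions(rows, part_col="loc"):
    """Group rows by their last column, as a dynamic partition insert does.

    Returns {partition_dir: [lines of the data file]}. The partition value
    only appears in the directory name, not in the data lines.
    """
    partitions = defaultdict(list)
    for *data, part_value in rows:
        line = "\t".join(str(v) for v in data)  # the table uses '\t' as delimiter
        partitions[f"{part_col}={part_value}"].append(line)
    return dict(partitions)


if __name__ == "__main__":
    for part, lines in sorted(route_dynamic_partitions(table_dept).items()):
        print(part, lines)
    # loc=beijing ['2\tzhaoliu', '3\tlishi']
    # loc=wuhan ['1\twangwu']
```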


2. Exporting data

1. insert

insert can write the result of a query to the local file system or to HDFS.

Syntax

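insert overwrite [local] directory 'path'
[row format delimited [fields terminated by 'char']]
select_statement

overwrite: the existing contents of the target directory are replaced (Hive's syntax for writing query results to a directory always uses overwrite)

local: writes to the local file system; without it, the output goes to HDFS

row format delimited: the storage format of the output rows, for example the field delimiter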

Export to the local file system

insert overwrite local directory '/opt/data/hive'
select *  from table_dept;


Note:

All existing data in the target directory is deleted first, and then the output of the MR job is written into it.
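Because the target directory is wiped, a script that runs such exports may want to refuse a directory that already holds files. A minimal Python sketch for a local target directory; `is_safe_export_dir` is a hypothetical helper, and treating only a missing or empty directory as safe is an assumed policy, not something Hive enforces.

```python
import os
import tempfile


def is_safe_export_dir(path):
    """True if `path` does not exist yet or is an empty directory."""
    if not os.path.exists(path):
        return True
    return os.path.isdir(path) and not os.listdir(path)


if __name__ == "__main__":
    target = tempfile.mkdtemp()
    print(is_safe_export_dir(target))  # True: freshly created and empty
    with open(os.path.join(target, "keep_me.txt"), "w") as f:
        f.write("data that insert overwrite would delete\n")
    print(is_safe_export_dir(target))  # False
```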


To specify how the output is stored, for example with tab-separated fields:

insert overwrite local directory '/opt/data/hive'
row format delimited fields terminated by '\t'
select *  from table_dept ;
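Without a `row format` clause, Hive writes the exported text with fields separated by the control character `\001` (Ctrl-A) and NULL values as `\N`; with the clause above, fields are separated by tabs instead. The Python reader below is a sketch for either case, assuming plain-text output without delimiters or newlines inside field values. `read_export` is hypothetical, and the file name in the demo only imitates the names Hive typically gives its output files.

```python
import os
import tempfile

HIVE_DEFAULT_DELIMITER = "\x01"  # Ctrl-A, used when no row format is given
HIVE_NULL = "\\N"                # Hive's default text form of NULL


def read_export(path, delimiter=HIVE_DEFAULT_DELIMITER):
    """Parse one text file produced by insert overwrite [local] directory."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split(delimiter)
            rows.append([None if v == HIVE_NULL else v for v in fields])
    return rows


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    path = os.path.join(out_dir, "000000_0")
    with open(path, "w", encoding="utf-8") as f:
        f.write("1\x01wangwu\x01wuhan\n3\x01\\N\x01beijing\n")
    print(read_export(path))  # [['1', 'wangwu', 'wuhan'], ['3', None, 'beijing']]
    # for the tab-separated export: read_export(path, delimiter="\t")
```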


2. Exporting directly with hadoop

Hive tables are stored on HDFS, so a table's files can be downloaded directly:

hadoop fs -get /user/hive/warehouse/db_hive.db/table_arraytext2/hive_3_arraytest1.txt hive_3_arraytest1.txt


The downloaded file is identical to the file that was originally loaded into the table.
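One way to confirm that a downloaded file matches the original is to compare checksums. A minimal Python sketch, assuming both files are on the local disk; `sha256_of` and `same_content` are hypothetical helpers and the file names in the demo are placeholders.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have identical bytes (compared via SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    original = os.path.join(d, "original.txt")      # placeholder: the file that was loaded
    downloaded = os.path.join(d, "downloaded.txt")  # placeholder: the file from hadoop fs -get
    for p in (original, downloaded):
        with open(p, "wb") as f:
            f.write(b"1\twangwu\n")
    print(same_content(original, downloaded))  # True
```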

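3. export

export copies a table's data, together with its metadata, to a target directory, from which import can load it again:

export table table_name
to 'path';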

3. Clearing data

Syntax:

truncate table table_name;

**Note:** TRUNCATE only works on managed tables; it cannot delete the data of external tables.
