DML Data Operations (using DML to overwrite the values in a table) - CSDN Blog

Article link: https://blog.csdn.net/qq_61645895/article/details/122644553

1、 Data Import

1. Loading data into a table (Load)

1). Syntax

hive> load data [local] inpath '/export/servers/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];

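(1) load data: loads data into a table

(2) local: loads the data from the local filesystem into the Hive table; without it, the data is loaded from HDFS

(3) inpath: the path of the data to load

(4) overwrite: replaces the data already in the table; without it, the data is appended

(5) into table: specifies which table to load into

(6) student: the name of the target table

(7) partition: loads the data into the specified partition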
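To illustrate how the optional clauses combine, the sketch below loads a local file into a single partition and replaces whatever that partition already holds. The table student_by_month and its month partition column are hypothetical names introduced only for this example; the file path is the one used throughout this article.

```sql
-- Hypothetical partitioned table, used only for this illustration.
create table if not exists student_by_month(id string, name string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- local: read from the local filesystem.
-- overwrite: replace the data already in the target partition.
load data local inpath '/export/servers/datas/student.txt'
overwrite into table student_by_month partition (month='202109');
```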

2). Hands-on example

(1) Create a table

hive (default)> create table student(id string, name string) row format delimited fields terminated by '\t';

(2) Load a local file into Hive

hive (default)> load data local inpath '/export/servers/datas/student.txt' into table student;

Run select * from student to confirm that the data has been loaded.

Note: after load data, the local student.txt file is still there.

(3) Load an HDFS file into Hive

Upload the file to HDFS

hive (default)> dfs -mkdir -p /user/root/hive;

hive (default)> dfs -put /export/servers/datas/student.txt /user/root/hive;

(4) Load the data from HDFS (no local keyword)

hive (default)> load data inpath '/user/root/hive/student.txt' into table student;

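Note: after load data, student.txt is no longer at its original HDFS path, because Hive moves it into the table's directory.

So load data from the local filesystem copies the file into Hive, whereas load data from HDFS moves it.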
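One way to see the copy-versus-move difference is to list the source and target locations after each load. This is only a sketch: it assumes the table lives in the default warehouse directory /user/hive/warehouse/student, which depends on your hive.metastore.warehouse.dir setting.

```sql
-- After the load with local, the source file is still on the local disk
-- (in the Hive CLI, a leading ! runs a shell command).
!ls /export/servers/datas/student.txt;

-- After the load without local, student.txt is no longer listed at the HDFS source path ...
dfs -ls /user/root/hive;

-- ... because it now sits in the table's directory (assumed default location).
dfs -ls /user/hive/warehouse/student;
```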

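(5) Load data and overwrite the data already in the table

Upload the file to HDFS (the previous load moved it away, so it has to be uploaded again)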

hive (default)> dfs -mkdir -p /user/root/hive;

hive (default)> dfs -put /export/servers/datas/student.txt /user/root/hive;

Load the data, overwriting what the table already holds

hive (default)> load data inpath '/user/root/hive/student.txt' overwrite into table student;

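Note that without overwrite the data is appended.

2. Inserting data into a table through a query (Insert)

(1). Create a partitioned table (if a table named student already exists, drop it first)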

hive (default)> create table student(id int, name string) partitioned by (month string) row format delimited fields terminated by '\t';

(2). Basic insert

hive (default)> insert into table student partition(month='202109') values(1,'wangwu'),(2, 'zhaoliu');

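This runs as a MapReduce job, so it takes a while.

Note that an ordinary MySQL-style statement like the following also runs, with the partition value given as the last value of each row. For example: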

hive (default)> insert into student values(3,'maqi','202109'),(4, 'niuba','202109');

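In practice, though, this kind of statement is uncommon: inserting big data a few rows at a time like this would take forever.

The most common approach is to select rows from another table and insert the result into a new table.

(3). Basic insert from a query (based on the result of querying a single table)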

hive (default)> insert overwrite table student partition(month='202108')

select id, name from student where month='202109';

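This also runs as a MapReduce job.

insert into: appends the data to the table or partition; existing data is not deleted

insert overwrite: replaces the data that already exists in the table or partition

Note: insert as used here does not support inserting only some of the fields; the query has to supply a value for every column of the target table.

This is the form of insert you will use most often: importing data from an old table into a new one.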
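The difference between the two keywords is easiest to see by running the same query both ways and comparing row counts. A minimal sketch, reusing the student table and partitions from this section:

```sql
-- Append: rows already in month='202108' are kept and the selected rows are added.
insert into table student partition(month='202108')
select id, name from student where month='202109';

-- Overwrite: month='202108' ends up holding only the selected rows.
insert overwrite table student partition(month='202108')
select id, name from student where month='202109';

-- Compare the row count of each partition after each statement.
select month, count(*) from student group by month;
```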

(4). Multi-insert mode (a single scan of the source, with results written to several tables or partitions)

hive (default)> from student

insert overwrite table student partition(month='202107')

select id, name where month='202109'

insert overwrite table student partition(month='202106')

select id, name where month='202109';

Note that from can be written first here. For example:

hive (default)> from student select *;

This is actually a little easier to write, since you pick the table first; it just takes some getting used to.
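The from-first form also lets each branch write to a different table with its own filter. The sketch below assumes two hypothetical target tables, student_low and student_high, which are not part of the original example.

```sql
-- Hypothetical target tables with the same two columns as the query output.
create table if not exists student_low(id int, name string);
create table if not exists student_high(id int, name string);

-- One scan of student feeds both inserts; each branch applies its own filter.
from student
insert overwrite table student_low
  select id, name where id < 3
insert overwrite table student_high
  select id, name where id >= 3;
```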

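3. Creating a table and loading data in one query (As Select)

See 7.4.5, Creating Tables, for details.

Create a table from a query result (the rows returned by the query are written into the newly created table)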

hive (default)> create table if not exists student3

as select id, name from student;
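As a variation, the query can carry a filter and the new table can declare its own row format. This is a sketch; student_202109 is a hypothetical table name chosen for the example.

```sql
-- Column names and types are taken from the select list;
-- the row format clause controls how the new table's files are written.
create table if not exists student_202109
row format delimited fields terminated by '\t'
as select id, name from student where month='202109';

-- Inspect the columns of the table that was just created.
describe student_202109;
```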

4. Specifying the data location with location when creating a table

(1). Upload the data to HDFS

hive (default)> dfs -mkdir /student;

hive (default)> dfs -put /export/servers/datas/student.txt /student;

If student.txt already exists in /student, delete it first.

(2). Create the table and specify its location on HDFS

hive (default)> create external table if not exists student5(

id int, name string

)

row format delimited fields terminated by '\t'

location '/student';

(3). Query the data

hive (default)> select * from student5;
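Because student5 is an external table, Hive only records where the data lives and does not own the files. The sketch below shows one way to confirm that; note that it drops student5, so re-run the create statement above if you still need the table.

```sql
-- Shows the table's Location (ending in /student) and its Table Type (EXTERNAL_TABLE).
describe formatted student5;

-- Dropping an external table removes only its metadata ...
drop table student5;

-- ... so the data file is still listed on HDFS afterwards.
dfs -ls /student;
```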

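5. Importing data into a specified Hive table (Import)

Note: import reads data written by export, so complete the export first and then import.

Import all the data.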

hive (default)> import table student8 from '/user/hive/warehouse/db_hive.db/export/student';

Import into a specific partition.

hive (default)> import table student9 partition(month='202109') from '/user/hive/warehouse/db_hive.db/export/student';

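Finish the export step first, then come back to the imports above.

2、Data Export

1. Exporting with Insert

(1). Export the query result to the local filesystem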

hive (default)> insert overwrite local directory '/export/servers/datas/export/student'

select * from student;

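This runs as a MapReduce job.

Note that the export/student directory under /export/servers/datas is created automatically by the export.

The files generated in that directory are the MapReduce output files.

(2). Export the query result to the local filesystem with formatting. Here the field delimiter is set to '\t'.

hive (default)> insert overwrite local directory '/export/servers/datas/export/student1'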

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

select * from student;

Running this creates the export/student1 subdirectory under /export/servers/datas.

(3). Export the query result to HDFS (no local keyword)

hive (default)> insert overwrite directory '/user/root/student2'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

select * from student;

2. Exporting to the local filesystem with a Hadoop command

hive (default)> dfs -get /user/root/student2/000000_0 /export/servers/datas/export/student3.txt;

Note: if student3.txt already exists, the command fails with an error.

Also, the command must be entered on a single line.
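When the query produces more than one output file, fetching them one at a time with -get gets tedious. The sketch below uses -getmerge, which concatenates every file in an HDFS directory into a single local file. The local file name student_all.txt is a hypothetical name chosen for this example.

```sql
-- See which output files the export wrote (there may be several, e.g. 000000_0, 000001_0).
dfs -ls /user/root/student2;

-- Concatenate all files in the directory into one local file (hypothetical file name).
dfs -getmerge /user/root/student2 /export/servers/datas/export/student_all.txt;
```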

3. Exporting with a Hive shell command

Basic syntax: hive -e 'statement' > file, or hive -f script > file

[root@hadoop101 hive]# bin/hive -e 'select * from db_hive.student;' > /export/servers/datas/export/student4.txt;

4. Exporting to HDFS with Export

hive (default)> export table db_hive.student to '/user/hive/warehouse/db_hive.db/export/student';

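export and import are mainly used to migrate Hive tables between two Hadoop clusters.

The export directory holds the table's data, copied to the target HDFS path, together with the table's metadata.

This matters at import time: you do not have to write out the table definition, because it is already in the metadata.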

hive (default)> import table student8 from '/user/hive/warehouse/db_hive.db/export/student';

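Note that student8 does not have to be created beforehand.

No definition was written for student8 here, yet its table structure is reconstructed correctly, and the data files are created correctly on HDFS as well.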
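A quick way to check this is to list the export directory and then describe the imported table. This is a sketch: only the _metadata file name is stated with confidence, and the layout of the data files next to it can vary with the table.

```sql
-- The export directory contains a _metadata file next to the exported data.
dfs -ls /user/hive/warehouse/db_hive.db/export/student;

-- student8 was never created by hand; its columns come from the exported metadata.
describe student8;
```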

Import into a specific partition.

hive (default)> import table student9 partition(month='202109') from '/user/hive/warehouse/db_hive.db/export/student';

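5. Exporting with Sqoop

This is covered in a dedicated later lesson. Sqoop moves data back and forth between relational databases and HDFS.

3、Clearing the data in a table (Truncate)

Note: Truncate can only remove data from managed tables; it cannot remove data from external tables.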

hive (default)> truncate table student;
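On a partitioned managed table, truncate can also be limited to a single partition. A minimal sketch, assuming the partitioned student table and the month='202109' partition created earlier in this article:

```sql
-- Remove the rows of a single partition; other partitions are left untouched.
truncate table student partition (month='202109');

-- Check the result: list the table's partitions and count the rows left in the truncated one.
show partitions student;
select count(*) from student where month='202109';
```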