Sqoop向HDFS中导入查询结果（六）

最新推荐文章于 2022-02-11 14:37:18 发布

RivenDong

最新推荐文章于 2022-02-11 14:37:18 发布

阅读量1.1k

点赞数 1

分类专栏： # Sqoop # HDFS 大数据

本文链接：https://blog.csdn.net/RivenDong/article/details/101516539

版权

大数据同时被 3 个专栏收录

91 篇文章 6 订阅

订阅专栏

HDFS

14 篇文章 1 订阅

订阅专栏

Sqoop

7 篇文章 0 订阅

订阅专栏

文章目录

1. 前言
2. 配置文件
3. 创建相关数据
4. 执行Sqoop
5. 查看结果

1. 前言

除了前面介绍的导入数据的方式之外，Sqoop也支持导入SQL查询的结果集。

–query参数指定SQL语句
–target-dir参数指定目标文件

如果想要并行地导入查询结果，每个Map需要执行一个查询副本，查询必须要有一个$CONDITIONS符号，表示每个Sqoop进程被唯一的条件语句替换，同时必须要用–split-by参数选择一个列来进行划分。

举个栗子：

sqoop import --query 'select a.*, b.* from a join b on (a.id == b.id)  
where $CONDITIONS' --split-by a.id --target-dir /xxx/xxx/xxx

或者

sqoop import --query 'select a.*, b.* from a join b on (a.id == b.id) 
where $CONDITIONS' -m 1 --target-dir /xxx/xxx/xxx

2. 配置文件

import
--connect
jdbc:mysql://master001:3306/test
--username
hadoop1
--password
hadoop
--target-dir
hdfs://mycluster/sqoop1
--delete-target-dir
-m
1
--as-textfile
--null-string
''
--query
"select p.id, p.name, p.age, c.id as cid, date_format(c.create_date,'%Y-%m-%d') from t_user p join t_id_card c on p.id=c.id where p.name is not null and $CONDITIONS"

3. 创建相关数据

首先进入test数据库，创建t_user表:

create table t_user(id int,name varchar(20),age int);
insert into t_user values(1,'lhd',22);
insert into t_user values(2,'ls',22);
insert into t_user values(3,'cjb',22);
insert into t_user values(4,'jl',23);
insert into t_user values(5,'gwh',23);
insert into t_user values(6,'gdh',23);

在这里插入图片描述
创建t_id_card表：

create table t_id_card(id int,p_id int,create_date date);
insert into t_id_card values(001,1,'2019-09-27');
insert into t_id_card values(002,2,'2019-09-28');
insert into t_id_card values(003,3,'2019-09-29');
insert into t_id_card values(004,4,'2019-09-30');
insert into t_id_card values(005,5,'2019-10-01');
insert into t_id_card values(006,6,'2019-10-02');

在这里插入图片描述

4. 执行Sqoop

sqoop --options-file conf3

在这里插入图片描述

5. 查看结果

hadoop fs -ls /sqoop1/
hadoop fs -cat /sqoop1/part-m-00000

在这里插入图片描述

RivenDong

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Sqoop向HDFS中导入查询结果（六）

文章目录1. 前言2. 配置文件3. 创建相关数据4. 执行Sqoop5. 查看结果1. 前言除了前面介绍的导入数据的方式之外，Sqoop也支持导入SQL查询的结果集。–query参数指定SQL语句–target-dir参数指定目标文件如果想要并行地导入查询结果，每个Map需要执行一个查询副本，查询必须要有一个$CONDITIONS符号，表示每个Sqoop进程被唯一的条件语句替换，...
复制链接

扫一扫

专栏目录