1. 前言
除了前面介绍的导入数据的方式之外,Sqoop也支持导入SQL查询的结果集。
–query参数指定SQL语句
–target-dir参数指定目标文件
如果想要并行地导入查询结果,每个Map需要执行一个查询副本,查询必须要有一个$CONDITIONS符号,表示每个Sqoop进程被唯一的条件语句替换,同时必须要用–split-by参数选择一个列来进行划分。
举个栗子:
sqoop import --query 'select a.*, b.* from a join b on (a.id == b.id)
where $CONDITIONS' --split-by a.id --target-dir /xxx/xxx/xxx
或者
sqoop import --query 'select a.*, b.* from a join b on (a.id == b.id)
where $CONDITIONS' -m 1 --target-dir /xxx/xxx/xxx
2. 配置文件
import
--connect
jdbc:mysql://master001:3306/test
--username
hadoop1
--password
hadoop
--target-dir
hdfs://mycluster/sqoop1
--delete-target-dir
-m
1
--as-textfile
--null-string
''
--query
"select p.id, p.name, p.age, c.id as cid, date_format(c.create_date,'%Y-%m-%d') from t_user p join t_id_card c on p.id=c.id where p.name is not null and $CONDITIONS"
3. 创建相关数据
首先进入test数据库,创建t_user表:
create table t_user(id int,name varchar(20),age int);
insert into t_user values(1,'lhd',22);
insert into t_user values(2,'ls',22);
insert into t_user values(3,'cjb',22);
insert into t_user values(4,'jl',23);
insert into t_user values(5,'gwh',23);
insert into t_user values(6,'gdh',23);
创建t_id_card表:
create table t_id_card(id int,p_id int,create_date date);
insert into t_id_card values(001,1,'2019-09-27');
insert into t_id_card values(002,2,'2019-09-28');
insert into t_id_card values(003,3,'2019-09-29');
insert into t_id_card values(004,4,'2019-09-30');
insert into t_id_card values(005,5,'2019-10-01');
insert into t_id_card values(006,6,'2019-10-02');
4. 执行Sqoop
sqoop --options-file conf3
5. 查看结果
hadoop fs -ls /sqoop1/
hadoop fs -cat /sqoop1/part-m-00000