This post briefly walks through the following Sqoop operations:
Import:
RDBMS to HDFS;
RDBMS to Hive;
RDBMS to HBase;
Export:
Hive to RDBMS;
HBase to RDBMS;
MySQL serves as the example RDBMS throughout.
Environment setup
Create a database and table in MySQL and insert some test data:
create database userdb;
use userdb;
create table emp(
id int(11) default null,
name varchar(100) default null,
deg varchar(100) default null,
salary int(11) default null,
dept varchar(10) default null,
create_time timestamp not null default current_timestamp,
update_time timestamp not null default current_timestamp on update current_timestamp,
is_delete bigint(20) default '1'
)
engine = innodb default charset=latin1;
insert into emp(id,name,deg,salary,dept) values (1201,'gopal','manager',50000,'TP'),
(1202,'manisha','Proof reader',50000,'TP'),
(1203,'khalil','php dev',30000,'AC'),
(1204,'prasanth','php dev',30000,'AC'),
(1205,'kranthi','admin',20000,'TP');
I. Import
1. RDBMS to HDFS
--target-dir: the target HDFS path
--delete-target-dir: delete the target path first if it already exists
--fields-terminated-by: the field delimiter for the output files
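As a quick local illustration of what --fields-terminated-by controls: the delimiter chosen at import time is what downstream tools split on (the sample line mirrors a row of the emp table):

```shell
# One line as Sqoop would write it with --fields-terminated-by ','
line='1201,gopal,manager,50000,TP'
# Split on the chosen delimiter to pull out a single field
echo "$line" | cut -d',' -f2   # prints "gopal"
```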
1.1 Full-table import
Goal: import the emp table into /sqoop/import/emp on HDFS.
Run the Sqoop import:
cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/
bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1
Inspect the imported data:
hdfs dfs -cat /sqoop/import/emp/part*
1201,gopal,manager,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
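Because the import lands as plain comma-delimited text, the part files can be processed directly with standard tools. For example, total salary per department from the rows above (a local sketch; /tmp/emp_part is a stand-in for the HDFS file, holding the first five fields):

```shell
# Stand-in copy of the imported part file (first five fields only)
cat <<'EOF' > /tmp/emp_part
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP
EOF
# Sum the salary column (field 4) grouped by department (field 5)
awk -F',' '{sum[$5] += $4} END {for (d in sum) print d, sum[d]}' /tmp/emp_part
```

This prints AC 60000 and TP 120000 (order may vary, since awk's `for (d in sum)` iteration order is unspecified).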
1.2 Import with a where filter
Run the Sqoop import:
cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/
bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--where "id=1202" \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1
Inspect the imported data:
hdfs dfs -cat /sqoop/import/emp/part*
1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1.3 Import with a free-form query
Note the following when importing with --query:
(1) the --table option must be omitted;
(2) the query must contain a where clause;
(3) the where clause must include the literal $CONDITIONS, a placeholder that Sqoop's generated Java code replaces with each mapper's split condition.
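To make the role of $CONDITIONS concrete, here is a sketch of the substitution Sqoop performs: with a split column and two mappers, each mapper runs the query with $CONDITIONS replaced by its own id range. The split boundaries below are illustrative assumptions, not Sqoop's actual output:

```shell
# The query as passed to --query; single quotes keep $CONDITIONS literal
query='select * from emp where 1=1 and $CONDITIONS'
# Simulate the per-mapper substitution Sqoop's generated code performs
for cond in 'id >= 1201 AND id < 1203' 'id >= 1203 AND id <= 1205'; do
  echo "$query" | sed "s/[\$]CONDITIONS/$cond/"
done
```

Each mapper thus fetches a disjoint slice of the table, which is why the placeholder is mandatory even with -m 1.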
Run the Sqoop import:
cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/
bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--query 'select * from emp where 1=1 and $CONDITIONS' \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1
Inspect the imported data:
hdfs dfs -cat /sqoop/import/emp/part*
1201,gopal,manager,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
2. RDBMS to Hive
Copy the Hive execution jar into Sqoop's lib directory:
cp /hadoop/install/hive-1.1.0-cdh5.14.0/lib/hive-exec-1.1.0-cdh5.14.2.jar /hadoop/install/sqoop-1.4.6-cdh5.14.2/lib/
Create the Hive table:
If the Hive table does not exist, Sqoop creates one during the import with the same schema as the source MySQL table. Create it yourself if your situation calls for it; this post lets Sqoop create it.
Run the Sqoop import:
cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/
bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--hive-import \
--hive-table test.emp_hive \
--fields-terminated-by ',' \
-m 1
Inspect the imported data:
select id, name, salary from test.emp_hive;
+-------+-----------+---------+--+
| id | name | salary |
+-------+-----------+---------+--+
| 1201 | gopal | 50000 |
| 1202 | manisha | 50000 |
| 1203 | khalil | 30000 |
| 1204 | prasanth | 30000 |
| 1205 | kranthi | 20000 |
+-------+-----------+---------+--+
3. RDBMS to HBase
Edit sqoop-env.sh:
# add the HBase environment variable
#set the path to where bin/hbase is available
export HBASE_HOME=/hadoop/install/hbase-1.2.0-cdh5.14.2
Run the Sqoop import:
cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/
bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--columns "id,name,deg,salary,dept,create_time,update_time,is_delete" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "emp_hbase" \
--split-by id \
-m 1
Inspect the imported data in the HBase shell:
scan 'emp_hbase'
II. Export
1. Hive to RDBMS
The HDFS file /sqoop/import/emp/part-m-00000 contains data such as:
1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
Create the target table in MySQL:
create table emp_exp(
id int(11) default null,
name varchar(100) default null,
deg varchar(100) default null,
salary int(11) default null,
dept varchar(10) default null,
create_time timestamp not null default current_timestamp,
update_time timestamp not null default current_timestamp on update current_timestamp,
is_delete bigint(20) default '1'
)
engine = innodb default charset=latin1;
Run the Sqoop export:
cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/
bin/sqoop export \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp_exp \
--export-dir /sqoop/import/emp/part-m-00000 \
--input-fields-terminated-by ","
Inspect the exported data:
select id,name,salary from emp_exp;
+------+----------+--------+
| id | name | salary |
+------+----------+--------+
| 1205 | kranthi | 20000 |
| 1201 | gopal | 50000 |
| 1202 | manisha | 50000 |
| 1203 | khalil | 30000 |
| 1204 | prasanth | 30000 |
+------+----------+--------+
2. HBase to RDBMS
Because Sqoop cannot read HFiles directly, the export takes the following path:
HBase -> Hive external table -> Hive internal table -> RDBMS
Create the Hive external table:
create external table test.hbase2mysql
(id int,
name string,
deg string,
salary int,
dept string,
create_time string,
update_time string,
is_delete int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,info:name,info:deg,info:salary,info:dept,info:create_time,info:update_time,info:is_delete")
tblproperties ("hbase.table.name" = "emp_hbase",
"hbase.mapred.output.outputtable" = "hbase2mysql");
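One detail worth checking in the DDL above: hbase.columns.mapping must list exactly one entry per Hive column, with :key mapping the HBase row key. A quick local sanity check on the mapping string:

```shell
# The mapping from the CREATE TABLE above: one entry per Hive column
mapping=':key,info:name,info:deg,info:salary,info:dept,info:create_time,info:update_time,info:is_delete'
# Count the entries; this must equal the number of Hive columns (8 here)
echo "$mapping" | tr ',' '\n' | wc -l
```

A mismatch between the entry count and the column count makes the table creation fail.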
Create the Hive internal table and populate it from the external table:
create table test.hive2mysql as select * from test.hbase2mysql;
Run the Sqoop export:
cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/
bin/sqoop export \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp_exp \
--export-dir /user/hive/warehouse/test.db/hive2mysql \
--input-fields-terminated-by '\001' \
--input-null-string '\\N' \
--input-null-non-string '\\N'
Inspect the exported data:
select id,name,salary from emp_exp;
+------+----------+--------+
| id | name | salary |
+------+----------+--------+
| 1205 | kranthi | 20000 |
| 1201 | gopal | 50000 |
| 1202 | manisha | 50000 |
| 1203 | khalil | 30000 |
| 1204 | prasanth | 30000 |
+------+----------+--------+
Summary:
Sqoop offers far more options than this post can cover.
In practice, run sqoop help import or sqoop help export to see the full list of options.