【Sqoop】Sqoop Import and Export

This article briefly covers the following Sqoop operations:

Import:

    RDBMS to HDFS;
    RDBMS to Hive;
    RDBMS to HBase;

Export:

    Hive to RDBMS;
    HBase to RDBMS;

MySQL is used as the example RDBMS throughout.

Environment Setup

Create a database and table in MySQL and insert some test data:

create database userdb;
use userdb;

create table emp(
id int(11) default null,
name varchar(100) default null,
deg varchar(100) default null,
salary int(11) default null,
dept varchar(10) default null,
create_time timestamp not null default current_timestamp,
update_time timestamp not null default current_timestamp on update current_timestamp,
is_delete bigint(20) default '1'
)
engine = innodb default charset=latin1;

insert into emp(id,name,deg,salary,dept) values (1201,'gopal','manager',50000,'TP'),
(1202,'manisha','Proof reader',50000,'TP'),
(1203,'khalil','php dev',30000,'AC'),
(1204,'prasanth','php dev',30000,'AC'),
(1205,'kranthi','admin',20000,'TP');
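
A quick sanity check that the five test rows are in place:

select id, name, deg, salary, dept from emp;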

I. Import

1. RDBMS to HDFS

--target-dir: the target HDFS directory

--delete-target-dir: delete the target directory first if it already exists

--fields-terminated-by: the field delimiter to use in the output files

1.1 Full-table import

Goal: import the emp table into /sqoop/import/emp on HDFS.

Run the import from the Sqoop command line:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1

Check the imported data:

hdfs dfs -cat /sqoop/import/emp/part*

1201,gopal,manager,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1

1.2 Import with a WHERE condition

Run the import from the Sqoop command line:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--where "id=1202" \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1

Check the imported data:

hdfs dfs -cat /sqoop/import/emp/part*

1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1

1.3 Import with a free-form query

Note the following when importing with --query:

(1) The --table parameter must not be used; it conflicts with --query.

(2) The query must contain a WHERE clause.

(3) The WHERE clause must include the literal token $CONDITIONS; Sqoop replaces it with each mapper's split condition, which is what allows a free-form query to be imported in parallel (see the sketch after this list).
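
To illustrate why $CONDITIONS matters, here is a sketch of the same query import run with two mappers; when -m is greater than 1, Sqoop also requires --split-by so it knows which column to partition on (the exact split ranges Sqoop generates depend on your data):

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--query 'select * from emp where 1=1 and $CONDITIONS' \
--split-by id \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 2

Each mapper substitutes its own range predicate on id for $CONDITIONS, so the two mappers read disjoint slices of the table. Note the single quotes around the query, which keep the shell from expanding $CONDITIONS.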

Run the import from the Sqoop command line:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--query 'select * from emp where 1=1 and $CONDITIONS' \
--delete-target-dir \
--target-dir /sqoop/import/emp \
--fields-terminated-by ',' \
-m 1

Check the imported data:

hdfs dfs -cat /sqoop/import/emp/part*

1201,gopal,manager,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1202,manisha,Proof reader,50000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1

2. RDBMS to Hive

Copy the Hive exec jar into Sqoop's lib directory:

cp /hadoop/install/hive-1.1.0-cdh5.14.2/lib/hive-exec-1.1.0-cdh5.14.2.jar /hadoop/install/sqoop-1.4.6-cdh5.14.2/lib/

Create the Hive table:

If the Hive table does not exist, Sqoop automatically creates one during the import with a schema matching the MySQL table. You can pre-create it yourself if you prefer; this article lets Sqoop create it (a sketch of an equivalent manual DDL follows).
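
For reference, a minimal sketch of what a manually created target table could look like, assuming the delimited-text layout produced by --fields-terminated-by ',' (the column types are a plain mapping of the MySQL schema, not necessarily the exact DDL Sqoop generates):

create table test.emp_hive(
id int,
name string,
deg string,
salary int,
dept string,
create_time string,
update_time string,
is_delete bigint
)
row format delimited fields terminated by ',';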

Run the import from the Sqoop command line:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--hive-import \
--hive-table test.emp_hive \
--fields-terminated-by ',' \
-m 1

Check the imported data in Hive:

select id, name, salary from test.emp_hive;

+-------+-----------+---------+--+
|  id   |   name    | salary  |
+-------+-----------+---------+--+
| 1201  | gopal     | 50000   |
| 1202  | manisha   | 50000   |
| 1203  | khalil    | 30000   |
| 1204  | prasanth  | 30000   |
| 1205  | kranthi   | 20000   |
+-------+-----------+---------+--+

3. RDBMS to HBase

Edit sqoop-env.sh and add the HBase environment variable:

# set the path to where bin/hbase is available
export HBASE_HOME=/hadoop/install/hbase-1.2.0-cdh5.14.2

Run the import from the Sqoop command line:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop import \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp \
--columns "id,name,deg,salary,dept,create_time,update_time,is_delete" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "emp_hbase" \
--split-by id \
-m 1

Check the imported data in the HBase shell:

scan 'emp_hbase'
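
Besides a full scan, you can spot-check a single row by its row key in the HBase shell (id was used as --hbase-row-key, so row key '1201' corresponds to the first test record):

get 'emp_hbase', '1201'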

II. Export

1. Hive to RDBMS

The HDFS file /sqoop/import/emp/part-m-00000 contains data such as:

1203,khalil,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1204,prasanth,php dev,30000,AC,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1
1205,kranthi,admin,20000,TP,2020-02-21 00:15:24.0,2020-02-21 00:15:24.0,1

Create the target table in MySQL:

create table emp_exp(
id int(11) default null,
name varchar(100) default null,
deg varchar(100) default null,
salary int(11) default null,
dept varchar(10) default null,
create_time timestamp not null default current_timestamp,
update_time timestamp not null default current_timestamp on update current_timestamp,
is_delete bigint(20) default '1'
)
engine = innodb default charset=latin1;

Run the export from the Sqoop command line:

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop export \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp_exp \
--export-dir /sqoop/import/emp/part-m-00000 \
--input-fields-terminated-by ","
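
One caveat: sqoop export appends rows using plain INSERT statements, and emp_exp has no unique key, so rerunning the export will produce duplicate rows. If you repeat the experiment, clear the target table first:

truncate table emp_exp;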

Check the exported data in MySQL:

select id,name,salary from emp_exp;
+------+----------+--------+
| id   | name     | salary |
+------+----------+--------+
| 1205 | kranthi  |  20000 |
| 1201 | gopal    |  50000 |
| 1202 | manisha  |  50000 |
| 1203 | khalil   |  30000 |
| 1204 | prasanth |  30000 |
+------+----------+--------+

2. HBase to RDBMS

Sqoop cannot read HBase's HFiles directly, so the export goes through Hive:

HBase -> Hive external table -> Hive managed table -> RDBMS

Create a Hive external table mapped to the HBase table:

create external table test.hbase2mysql
(id int,
 name string,
 deg string,
 salary int,
 dept string,
 create_time string,
 update_time string,
 is_delete int
 )
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,info:name,info:deg,info:salary,info:dept,info:create_time,info:update_time,info:is_delete")
tblproperties ("hbase.table.name" = "emp_hbase",
"hbase.mapred.output.outputtable" = "hbase2mysql");

Create a Hive managed table and populate it from the external table:

create table test.hive2mysql as select * from test.hbase2mysql;
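
A quick check in Hive that the managed table picked up the HBase rows:

select id, name, salary from test.hive2mysql;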

Run the export from the Sqoop command line ('\001' is Hive's default field delimiter, and '\N' is how Hive writes NULL values in text files):

cd /hadoop/install/sqoop-1.4.6-cdh5.14.2/

bin/sqoop export \
--connect jdbc:mysql://node03:3306/userdb \
--username root \
--password 123456 \
--table emp_exp \
--export-dir /user/hive/warehouse/test.db/hive2mysql \
--input-fields-terminated-by '\001' \
--input-null-string '\\N' \
--input-null-non-string '\\N'

Check the exported data in MySQL:

select id,name,salary from emp_exp;
+------+----------+--------+
| id   | name     | salary |
+------+----------+--------+
| 1205 | kranthi  |  20000 |
| 1201 | gopal    |  50000 |
| 1202 | manisha  |  50000 |
| 1203 | khalil   |  30000 |
| 1204 | prasanth |  30000 |
+------+----------+--------+

Summary:

Sqoop has far more options than this article can cover. In practice, run sqoop help import or sqoop help export to see the full list of parameters.
