Sqoop

Introduction

Sqoop is an open-source tool for moving data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL, ...). It can import data from a relational database (Oracle, MySQL, Postgres) into Hadoop, and export data from HDFS back into a relational database.

How it works

Sqoop translates each import or export command into a MapReduce program (mainly by providing custom InputFormat and OutputFormat implementations).
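
For example, a parallel import is split on a key column and each map task reads one range of it. A minimal sketch, assuming a table t with a numeric id column (-P prompts for the password):

# With several mappers, Sqoop first runs a boundary query on the split
# column (roughly SELECT MIN(id), MAX(id) FROM t), then hands each map
# task one sub-range, e.g. WHERE id >= lo AND id < hi.
bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
-P \
--table t \
--split-by id \
--num-mappers 4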

Installation

  1. Download the installation package and unpack it on Linux
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/
  2. Edit the configuration file
[yyx@hadoop01 conf]$ mv sqoop-env-template.sh sqoop-env.sh
[yyx@hadoop01 conf]$ vim sqoop-env.sh 

export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2
export HIVE_HOME=/opt/module/hive
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
export ZOOCFGDIR=/opt/module/zookeeper-3.4.10
export HBASE_HOME=/opt/module/hbase

  3. Copy the JDBC driver
[yyx@hadoop01 mysql-connector-java-5.1.27]$ ll
total 1264
-rw-r--r--. 1 root root  47173 Oct 24  2013 build.xml
-rw-r--r--. 1 root root 222520 Oct 24  2013 CHANGES
-rw-r--r--. 1 root root  18122 Oct 24  2013 COPYING
drwxr-xr-x. 2 root root     71 Mar 24 17:18 docs
-rw-r--r--. 1 root root 872303 Oct 24  2013 mysql-connector-java-5.1.27-bin.jar
-rw-r--r--. 1 root root  61423 Oct 24  2013 README
-rw-r--r--. 1 root root  63674 Oct 24  2013 README.txt
drwxr-xr-x. 7 root root     67 Oct 24  2013 src
[yyx@hadoop01 mysql-connector-java-5.1.27]$ cp mysql-connector-java-5.1.27-bin.jar /opt/module/sqoop/lib/

  4. Verify the installation

[yyx@hadoop01 sqoop]$ bin/sqoop help
Warning: /opt/module/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/module/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
21/03/26 16:47:39 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

Success!
Test the database connection:

[yyx@hadoop01 sqoop]$  bin/sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root --password 000000
Warning: /opt/module/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/module/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
21/03/26 16:48:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
21/03/26 16:48:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/03/26 16:48:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
metastore
mysql
performance_schema
test
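
The WARN line above points at safer password handling. Two standard alternatives (the password-file path is an assumption; the file must contain only the password, with no trailing newline, and should be chmod 400):

# Prompt for the password interactively:
bin/sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root -P

# Or read it from a protected file:
bin/sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root \
--password-file file:///home/yyx/.mysql.pwd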

Basic examples

Importing data

Transferring data from an RDBMS into the big-data cluster (HDFS, Hive, HBase) is an import, done with the import tool.

RDBMS to HDFS

MySQL already has the following table:

mysql> select * from teacher;
+-------------+------+
| name        | sum  |
+-------------+------+
| chenfei     | 2    |
| ddddd       | 1    |
| lihua       | 4    |
| rongzhuqing | 3    |
+-------------+------+
4 rows in set (0.00 sec)

Import the full table into HDFS

Create the target directory first:
[yyx@hadoop01 hadoop-2.7.2]$ hdfs dfs -mkdir -p /sqoop/test/
Then run the import:

bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--table teacher \
--target-dir /sqoop/test \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"

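To check the result, list and print the output files; with --num-mappers 1, a map-only import conventionally writes a single part file:

hdfs dfs -ls /sqoop/test
hdfs dfs -cat /sqoop/test/part-m-00000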

Query import
bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--target-dir /sqoop/test \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "," \
--query 'select name,sum from teacher where sum <= 5 and $CONDITIONS'

--delete-target-dir means: if the target directory already exists, delete it before importing.
$CONDITIONS must appear in the query; Sqoop substitutes its split conditions into that placeholder, and the job fails without it.
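
If the query is wrapped in double quotes instead, the variable must be escaped as \$CONDITIONS so the shell does not expand it:

bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--target-dir /sqoop/test \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "," \
--query "select name,sum from teacher where sum <= 5 and \$CONDITIONS"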

Import specific columns
bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--target-dir /sqoop/test \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns name \
--table teacher

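--columns also accepts a comma-separated list of columns; keep the list free of spaces around the commas:

bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--target-dir /sqoop/test \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns "name,sum" \
--table teacher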

Filter imported data with the --where keyword
bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--target-dir /sqoop/test \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--table teacher \
--where "sum<5"

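--where combines with --table and --columns (but not with --query, which carries its own conditions); for example, importing only the name column of rows with sum<5:

bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--target-dir /sqoop/test \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--table teacher \
--columns name \
--where "sum<5"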

RDBMS to Hive
bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--table teacher \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table countteacher

Note that the Hive table did not need to be created in advance:

0: jdbc:hive2://hadoop01:10000> show tables;
+---------------+--+
|   tab_name    |
+---------------+--+
| countteacher  |
| student       |
| test          |
+---------------+--+
3 rows selected (0.37 seconds)
0: jdbc:hive2://hadoop01:10000> select * from countteacher
0: jdbc:hive2://hadoop01:10000> ;
+--------------------+-------------------+--+
| countteacher.name  | countteacher.sum  |
+--------------------+-------------------+--+
| chenfei            | 2                 |
| ddddd              | 1                 |
| lihua              | 4                 |
| rongzhuqing        | 3                 |
+--------------------+-------------------+--+
4 rows selected (1.559 seconds)

Under the hood this is a two-step process: Sqoop first imports the data into HDFS, then loads it from HDFS into Hive.
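
The same tool can also load into a partitioned Hive table; a sketch, where the partition key dt and its value are assumptions:

bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--table teacher \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table countteacher \
--hive-partition-key dt \
--hive-partition-value "2021-03-26"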

RDBMS to HBase
bin/sqoop import \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--password 000000 \
--table teacher \
--columns "name,sum" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "sum" \
--hbase-table "hbase_teacher" \
--num-mappers 1 \
--split-by name

hbase(main):001:0> scan 'hbase_teacher'
ROW                        COLUMN+CELL                                                                 
 1                         column=info:name, timestamp=1616811676595, value=ddddd                      
 2                         column=info:name, timestamp=1616811676595, value=chenfei                    
 3                         column=info:name, timestamp=1616811676595, value=rongzhuqing                
 4                         column=info:name, timestamp=1616811676595, value=lihua                      
4 row(s) in 0.2690 seconds
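
Two caveats: rows sharing the same --hbase-row-key value overwrite one another, so the key column should be unique; and --hbase-create-table can fail against newer HBase releases, in which case pre-create the table and column family in the hbase shell and drop that flag:

hbase(main):001:0> create 'hbase_teacher', 'info'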

Exporting data

Transferring data from HDFS, Hive, or HBase into an RDBMS is an export, done with the export tool.

Hive to RDBMS

 bin/sqoop export \
--connect jdbc:mysql://hadoop01:3306/company \
--username root \
--password 000000 \
--table teacher \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t"

If the target table does not exist, MySQL will not create it automatically.
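
Create it in MySQL beforehand; a minimal sketch matching the two columns of the teacher data (the column types are assumptions):

mysql> use company;
mysql> create table teacher(name varchar(255), sum int);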

Scripts

Sqoop commands can be packaged in an .opt-format options file.
1. Create an .opt file

$ mkdir opt
$ touch opt/job_HDFS2RDBMS.opt

2. Write the script

$ vi opt/job_HDFS2RDBMS.opt

export
--connect
jdbc:mysql://hadoop01:3306/test
--username
root
--password
000000
--table
teacher
--num-mappers
1
--export-dir
/user/hive/warehouse/staff_hive
--input-fields-terminated-by
"\t"

3. Run it:

 bin/sqoop --options-file opt/job_HDFS2RDBMS.opt
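
Alternatively, the job tool from the help output above stores a command as a named saved job that can be re-run on demand (the job name myjob is an assumption; by default --exec prompts for the password instead of storing it):

bin/sqoop job --create myjob -- export \
--connect jdbc:mysql://hadoop01:3306/test \
--username root \
--table teacher \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t"

bin/sqoop job --list
bin/sqoop job --exec myjob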

Common parameters

[Figures: tables of common options, general arguments, and per-command arguments]
