一、sqoop简介
Sqoop是一款开源的工具,主要用于在Hadoop(Hive)与传统的数据库(mysql、postgresql…)间进行数据的传递,可以将一个关系型数据库(例如 : MySQL ,Oracle ,Postgres等)中的数据导进到Hadoop的HDFS中,也可以将HDFS的数据导进到关系型数据库中。
说明:本测试hadoop是单机伪分布式环境,如果读者想要学习如何搭建伪分布式hadoop环境请看本人这篇文章([Hadoop+Hive+HBase+Kylin 伪分布式安装指南(https://blog.csdn.net/qq_28356739/article/details/88569175)),本文章只介绍如何安装部署sqoop,后续通过sqoop将Oracle数据导入到hadoop环境实验在另外一篇文章,待测试完将会把连接放到此文章。
二、环境配置
三、安装Sqoop
1. 下载,解压到指定目录
下载连接:
点此下载
创建安装目录,通过xshell上传安装包
[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# mkdir sqoop
[root@hadoop hadoop]# cd sqoop/
[root@hadoop sqoop]# ls
sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@hadoop sqoop]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@hadoop sqoop]# ls
sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@hadoop sqoop]# rm -rf *gz
[root@hadoop sqoop]# mv sqoop-1.4.7.bin__hadoop-2.6.0/* .
2、修改配置文件sqoop-env.sh
在sqoop/conf目录下有一个文件sqoop-env-template.sh,把它复制为sqoop-env.sh并修改
[root@hadoop sqoop]# cd conf/
[root@hadoop conf]# cp sqoop-env-template.sh sqoop-env.sh
#Set path to where bin/hadoop is available
[root@hadoop conf]# vim sqoop-env.sh
根据自己情况修改,另外,你还装了Zookeeper的话则最后一句也要配置。
export HADOOP_COMMON_HOME=/hadoop/
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/hadoop/
#set the path to where bin/hbase is available
export HBASE_HOME=/hadoop/hbase/
#Set the path to where bin/hive is available
export HIVE_HOME=/hadoop/hive
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
3. 配置环境变量
我测试用户为root用户,直接修改/etc/profile加入下面内容:
export SQOOP_HOME=/hadoop/sqoop
export PATH=$PATH:${SQOOP_HOME}/bin
export CLASSPATH=$CLASSPATH:${SQOOP_HOME}/lib
然后使环境变量生效
[root@hadoop conf]# source /etc/profile
4. 复制相关依赖包到$SQOOP_HOME/lib
因为我是将Oracle数据导入到hive,所以复制环境数据库所在虚拟机(195.168.1.6)的Oracle的OJDBC包到/hadoop/sqoop/lib下
[oracle@source ~]$ cd $ORACLE_HOME/jdbc/lib
[oracle@source lib]$ pwd
/u01/app/oracle/product/11.2.0/db_1/jdbc/lib
[oracle@source lib]$ ls
ojdbc5dms_g.jar ojdbc5_g.jar ojdbc6dms_g.jar ojdbc6_g.jar simplefan.jar
ojdbc5dms.jar ojdbc5.jar ojdbc6dms.jar ojdbc6.jar
上面是数据库所在虚拟机Oraclejar包位置及信息。将ojdbc包传到hadoop虚拟机
[oracle@source lib]$ scp ojdbc6.jar root@192.168.1.66:/hadoop/sqoop/lib
root@192.168.1.66's password:
ojdbc6.jar 100% 2675KB 2.6MB/s 00:00
5、修改$SQOOP_HOME/bin/configure-sqoop
注释掉HCatalog,Accumulo检查(除非你准备使用HCatalog,Accumulo等HADOOP上的组件)
##Moved to be a runtime check in sqoop.
#if[ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does notexist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root ofyour HCatalog installation.'
#fi
#if[ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does notexist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the rootof your Accumulo installation.'
#fi
#Add HCatalog to dependency list
#if[ -e "${HCAT_HOME}/bin/hcat" ]; then
# TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat-classpath`
# if [ -z "${HIVE_CONF_DIR}" ]; then
# TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
# fi
# SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
#fi
#Add Accumulo to dependency list
#if[ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
# for jn in `$ACCUMULO_HOME/bin/accumuloclasspath | grep file:.*accumulo.*jar |cut -d':' -f2`; do
# SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
# done
# for jn in `$ACCUMULO_HOME/bin/accumuloclasspath | grep file:.*zookeeper.*jar |cut -d':' -f2`; do
# SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
# done
#fi
6、 测试与Oracle的连接
[root@hadoop sqoop]# pwd
/hadoop/sqoop
[root@hadoop sqoop]# sqoop list-databases --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger
Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/18 14:25:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/18 14:25:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consi
der using -P instead.19/03/18 14:25:57 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/18 14:25:57 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/18 14:25:58 INFO manager.OracleManager: Time zone has been set to GMT
SYS
SYSTEM
SCOTT
TEST
ADMRG
OGG
OUTLN
MGMT_VIEW
FLOWS_FILES
MDSYS
ORDSYS
EXFSYS
DBSNMP
WMSYS
APPQOSSYS
APEX_030200
OWBSYS_AUDIT
ORDDATA
CTXSYS
ANONYMOUS
SYSMAN
XDB
ORDPLUGINS
OWBSYS
SI_INFORMTN_SCHEMA
OLAPSYS
ORACLE_OCM
XS$NULL
BI
PM
MDDATA
IX
SH
DIP
OE
APEX_PUBLIC_USER
HR
SPATIAL_CSW_ADMIN_USR
SPATIAL_WFS_ADMIN_USR