Download
sqoop1: 1.4.7
sqoop2: 1.99.7
For the differences between the versions, see "Introduction to Sqoop and the differences between sqoop1 and sqoop2".
We need to download sqoop-1.99.7-bin-hadoop200.tar.gz.
Make sure Hadoop is already installed; if not, see "Installing a Hadoop cluster on CentOS".
Installation and configuration
1. Extract
tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz -C /root/
mv /root/sqoop-1.99.7-bin-hadoop200 /root/sqoop
2. Environment variables
2.1 Edit /etc/profile
export SQOOP_HOME=/root/sqoop
export LOGDIR=$SQOOP_HOME/logs
export PATH=$PATH:$JAVA_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HBASE_HOME/sbin:$HIVE_HOME/bin
export SQOOP_SERVER_EXTRA_LIB=/root/sqoop/server/lib
Apply the changes:
source /etc/profile
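A quick sanity check of the exports (the values below are the ones this guide assumes; adjust if your install path differs):

```shell
# Minimal sanity check mirroring the exports above.
SQOOP_HOME=/root/sqoop
LOGDIR=$SQOOP_HOME/logs
SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/server/lib
echo "SQOOP_HOME=$SQOOP_HOME"
echo "LOGDIR=$LOGDIR"
echo "SQOOP_SERVER_EXTRA_LIB=$SQOOP_SERVER_EXTRA_LIB"
```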
2.2 Edit /root/sqoop/conf/sqoop.properties
# JDBC repository provider configuration
#org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
#org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/db;create=true
#using a MySQL database as the repository here
org.apache.sqoop.repository.jdbc.url=jdbc:mysql://master:3306/sqoop
#org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver
org.apache.sqoop.repository.jdbc.user=root
#remember to change this to your own database password
org.apache.sqoop.repository.jdbc.password=yourpassword
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/root/hadoop-2.10.0/etc/hadoop
org.apache.sqoop.jetty.port=12000
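The Derby-to-MySQL switch above can also be scripted. A minimal sed sketch, run here against a scratch copy (on a real install, point it at /root/sqoop/conf/sqoop.properties instead):

```shell
# DEMO is a throwaway file seeded with the default Derby settings.
DEMO=$(mktemp)
cat > "$DEMO" <<'EOF'
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
EOF
# Swap the repository handler and JDBC driver to their MySQL counterparts.
sed -i \
  -e 's/derby\.DerbyRepositoryHandler/mysql.MySqlRepositoryHandler/' \
  -e 's/org\.apache\.derby\.jdbc\.EmbeddedDriver/com.mysql.jdbc.Driver/' \
  "$DEMO"
grep -E 'handler|driver' "$DEMO"
```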
2.3 Modify Hadoop's yarn-site.xml
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
2.4 Modify core-site.xml, then restart Hadoop
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
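A simple grep can confirm both proxyuser properties made it into the file. The sketch below uses a scratch stand-in for core-site.xml; on a real cluster, point CORE at $HADOOP_HOME/etc/hadoop/core-site.xml:

```shell
# Scratch stand-in for core-site.xml containing the two entries above.
CORE=$(mktemp)
cat > "$CORE" <<'EOF'
<configuration>
  <property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
</configuration>
EOF
# Count the proxyuser entries; there should be exactly two (hosts and groups).
COUNT=$(grep -c 'hadoop.proxyuser.root' "$CORE")
echo "proxyuser entries: $COUNT"
```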
2.5 Install the JDBC driver by placing it in the server/lib/ directory.
Since the repository is MySQL, first confirm the MySQL version:
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.6.46 |
+-----------+
1 row in set (0.01 sec)
Then download the matching MySQL driver from https://repo1.maven.org/maven2/mysql/mysql-connector-java/ and put it in the server/lib/ directory. Since Hive is already installed here, the mysql-connector-java-5.1.46.jar under the Hive directory can simply be copied over:
cp /root/hive-3.1.2/lib/mysql-connector-java-5.1.46.jar /root/sqoop/server/lib
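Before starting the server, it is worth verifying the connector jar actually landed in server/lib. A sketch on a scratch directory (LIB stands in for /root/sqoop/server/lib):

```shell
# Scratch stand-in for the server/lib directory.
LIB=$(mktemp -d)
# Stand-in for the real driver jar copied in the step above.
touch "$LIB/mysql-connector-java-5.1.46.jar"
# Count jars whose names start with mysql-connector; expect at least one.
JARS=$(ls "$LIB" | grep -c '^mysql-connector')
echo "connector jars found: $JARS"
```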
3. Initialization and verification
3.1 Sqoop initialization
[root@master bin]# ./sqoop2-tool upgrade
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
2020-01-28 21:56:19,969 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.
3.2 Sqoop verification
[root@master bin]# ./sqoop2-tool verify
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
2020-01-28 21:56:35,082 INFO [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2020-01-28 21:56:35,089 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
3.2.1 Exception handling
ERROR[main] org.apache.sqoop.repository.common.CommonRepositoryHandler - Can't execute query: CREATE DATABASE IF NOT EXISTS"SQOOP"
Run the following in MySQL:
#this must be re-run every time MySQL restarts
set global sql_mode ='ANSI_QUOTES';
Or, for a permanent fix, add the following to the MySQL configuration file my-default.cnf:
sql_mode=ANSI_QUOTES
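For the permanent fix, the `sql_mode` line belongs under the `[mysqld]` section of the config file, e.g.:

```ini
[mysqld]
sql_mode=ANSI_QUOTES
```

ANSI_QUOTES makes MySQL treat double-quoted names like "SQOOP" as identifiers rather than string literals, which is why the CREATE DATABASE statement in the error above fails without it.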
4. Test access
4.1 Start Sqoop
4.1.1 Start the historyserver process
#run the following command in Hadoop's sbin directory
# /root/hadoop-2.10.0/etc/hadoop is the directory where my Hadoop configuration files live
./mr-jobhistory-daemon.sh --config /root/hadoop-2.10.0/etc/hadoop start historyserver
4.1.2 Start the Sqoop server
[root@master bin]# ./sqoop2-server start
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Starting the Sqoop2 server...
2020-01-28 07:44:00,338 INFO [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2020-01-28 07:44:00,412 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Sqoop2 server started.
4.1.3 Start the Sqoop client
[root@master bin]# ./sqoop-shell
bash: ./sqoop-shell: No such file or directory
[root@master bin]# ./sqoop2-shell
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
4.2 Using Sqoop
Transfer data from MySQL to HDFS with Sqoop.
4.2.1 Connect to the Sqoop server
sqoop:000> set server --host master --port 12000 --webapp sqoop
Server is set successfully
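sqoop2-shell can also read commands from a script file, which is handy for repeating this setup non-interactively. A hedged sketch (batch.sqoop is a hypothetical filename):

```
# batch.sqoop — run with: sqoop2-shell batch.sqoop
set server --host master --port 12000 --webapp sqoop
show version --all
show connector
```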
4.2.2 Check the connectors already registered with the Sqoop server:
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name | Version | Class | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
| kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
| ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
| hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
+------------------------+---------+------------------------------------------------------------+----------------------+
4.2.3 Create the link for importing from MySQL
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: link1_generic_jdbc
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master/sqoop #make sure the sqoop database exists
Username: root
Password: ********* #same as the MySQL password
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose: (enter a space here)
New link was successfully created with validation status OK and name link1_generic_jdbc
4.2.4 Check the link
sqoop:000> show link
+--------------------+------------------------+---------+
| Name | Connector Name | Enabled |
+--------------------+------------------------+---------+
| link1_generic_jdbc | generic-jdbc-connector | true |
+--------------------+------------------------+---------+
4.2.5 Create the link for exporting to HDFS
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: link2_hdfs
HDFS cluster
URI: hdfs://master:9000
Conf directory: /root/hadoop-2.10.0/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name link2_hdfs
4.2.6 View the detailed status of the created links
sqoop:000> show link --all
2 link(s) to show:
link with name link1_generic_jdbc (Enabled: true, Created by root at 1/28/20 8:13 PM, Updated by root at 1/28/20 8:13 PM)
Using Connector generic-jdbc-connector with name {1}
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master/sqoop
Username: root
Password:
Fetch Size:
Connection Properties:
SQL Dialect
Identifier enclose:
link with name link2_hdfs (Enabled: true, Created by root at 1/28/20 8:14 PM, Updated by root at 1/28/20 8:14 PM)
Using Connector hdfs-connector with name {1}
HDFS cluster
URI: hdfs://master:9000
Conf directory: /root/hadoop-2.10.0/etc/hadoop
Additional configs::
4.2.7 Create the job that moves data from MySQL to HDFS:
sqoop:000> create job -f "link1_generic_jdbc" -t "link2_hdfs"
Creating job for links with from name link1_generic_jdbc and to name link2_hdfs
Please fill following values to create new job object
Name: jdbcToHdfs
Database source
#make sure the sqoop database exists
Schema name: sqoop
#the user table holds the data to import into HDFS
Table name: user
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /jdbctohdfs #make sure this directory does not exist on HDFS, or is empty
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name jdbcToHdfs
4.2.8 Check the job status
sqoop:000> show job --all
1 job(s) to show:
Job with name jdbcToHdfs (Enabled: true, Created by root at 1/28/20 8:17 PM, Updated by root at 1/28/20 8:17 PM)
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
From link: link1_generic_jdbc
Database source
Schema name: sqoop
Table name: user
SQL statement:
Column names:
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
To link: link2_hdfs
Target configuration
Override null value:
Null value:
File format: TEXT_FILE
Compression codec: NONE
Custom codec:
Output directory: /jdbctohdfs
Append mode:
4.2.9 Start the job
sqoop:000> start job -n jdbcToHdfs
2020-01-28 21:44:57 PST: FAILED
The run failed; check sqoop.log to analyze the cause.
The jars should be looked up under sqoop/server/lib, but the log shows the lookup happening under sqoop/bin. Without having found which configuration setting controls this, a workaround is to symlink the jars from sqoop/server/lib into sqoop/bin:
[root@master bin]# ln -s /root/sqoop/server/lib/* /root/sqoop/bin
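The same workaround can be rehearsed safely on scratch directories first; SRC and DST below stand in for /root/sqoop/server/lib and /root/sqoop/bin:

```shell
SRC=$(mktemp -d)   # stand-in for /root/sqoop/server/lib
DST=$(mktemp -d)   # stand-in for /root/sqoop/bin
touch "$SRC/example-connector.jar"   # hypothetical jar name
# Symlink every jar from the lib stand-in into the bin stand-in.
ln -s "$SRC"/* "$DST"/
ls -l "$DST"
```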
Checking /root/sqoop/bin confirms the jar symlinks are now in place.
Starting the job again succeeds:
sqoop:000> start job -n jdbcToHdfs
Submission details
Job Name: jdbcToHdfs
Server URL: http://master:12000/sqoop/
Created by: root
Creation date: 2020-01-28 21:58:31 PST
Lastly updated by: root
External ID: job_local292485477_0006
http://localhost:8080/
Source Connector schema: Schema{name= sqoop . website ,columns=[
FixedPoint{name=websiteid,nullable=true,type=FIXED_POINT,byteSize=8,signed=true},
Text{name=companyName,nullable=true,type=TEXT,charSize=null},
Text{name=companyAddr,nullable=true,type=TEXT,charSize=null},
Text{name=companyInfo,nullable=true,type=TEXT,charSize=null},
Text{name=jobInfo,nullable=true,type=TEXT,charSize=null},
Text{name=jobAddr,nullable=true,type=TEXT,charSize=null},
Text{name=jobName,nullable=true,type=TEXT,charSize=null},
Text{name=salary,nullable=true,type=TEXT,charSize=null}]}
2020-01-28 21:58:34 PST: SUCCEEDED
Checking in the web UI confirms the data was uploaded to HDFS successfully.