1. Sqoop Overview
2. Downloading and Extracting Sqoop
3. Directory Layout and Environment Variables
4. Adding the MySQL Driver and Configuring sqoop-env.sh
5. Viewing Sqoop Command Help
1. Sqoop Overview
- Apache Sqoop™ is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
- Sqoop successfully graduated from the Incubator in March 2012 and is now a top-level Apache project.
- The latest stable release is 1.4.7 (download, documentation). The latest cut of Sqoop2 is 1.99.7. Note that 1.99.7 is not compatible with 1.4.7 and is not feature complete; it is not intended for production deployment.
Summary:
- Sqoop is a bridge for exchanging data between Hadoop and relational databases. Under the hood it runs a MapReduce job that reads the data out of the relational database and writes it to the destination (e.g. Hive or HBase).

Q: Can Sqoop import relational-database data into Hive and HBase?

A: Yes. As long as the data lives on HDFS, the transfer works; Hive, for example, stores its data on HDFS.

- Sqoop 1.4.7 is Sqoop1; production deployments generally use Sqoop1.
- Sqoop 1.99.7 is Sqoop2.
2. Downloading and Extracting Sqoop
1. Download URL
- Find the version used here, sqoop-1.4.6-cdh5.7.0, at: http://archive.cloudera.com/cdh5/cdh/5/
2. Download and extract sqoop-1.4.6-cdh5.7.0.tar.gz
1. Download with wget:
wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.7.0.tar.gz
2. Extract:
- tar -xzvf sqoop-1.4.6-cdh5.7.0.tar.gz
Note: check the owner and group of the extracted files.
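The steps above can be sketched end to end as follows; the /home/hadoop/app install directory is an assumption, chosen to match the SQOOP_HOME used later in this guide:

```shell
# Download the Sqoop tarball into the app directory
# (/home/hadoop/app is an assumed layout, matching SQOOP_HOME below)
cd /home/hadoop/app
wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.7.0.tar.gz

# Extract in place
tar -xzvf sqoop-1.4.6-cdh5.7.0.tar.gz

# Ensure the hadoop user and group own the extracted tree
chown -R hadoop:hadoop sqoop-1.4.6-cdh5.7.0
```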
3. Sqoop Directory Layout and Environment Variables
1. Sqoop directory structure:
[hadoop@hadoop004 sqoop-1.4.6-cdh5.7.0]$ ll
total 1880
drwxr-xr-x 2 hadoop hadoop 4096 Sep 12 14:35 bin //launcher scripts
drwxr-xr-x 2 hadoop hadoop 4096 Sep 12 14:35 conf //configuration files
drwxr-xr-x 5 hadoop hadoop 4096 Sep 12 14:35 docs //documentation
drwxr-xr-x 2 hadoop hadoop 4096 Sep 12 14:35 lib //dependency jars
-rw-rw-r-- 1 hadoop hadoop 1035867 Mar 24 2016 sqoop-1.4.6-cdh5.7.0.jar //main Sqoop jar
-rw-rw-r-- 1 hadoop hadoop 656201 Mar 24 2016 sqoop-test-1.4.6-cdh5.7.0.jar //test jar
drwxr-xr-x 7 hadoop hadoop 4096 Mar 24 2016 src //source code
2. Set the environment variables and apply them
1. vi ~/.bash_profile
export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.7.0
export PATH=$SQOOP_HOME/bin:$PATH
2. Reload the profile:
source ~/.bash_profile
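A quick sanity check after sourcing the profile: verify that Sqoop's bin directory actually landed on PATH. This is a minimal sketch, assuming the install path used above:

```shell
# Reproduce the two profile lines, then confirm bin/ resolves from PATH
export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.7.0
export PATH=$SQOOP_HOME/bin:$PATH

case ":$PATH:" in
  *":$SQOOP_HOME/bin:"*) echo "PATH OK" ;;
  *) echo "PATH missing sqoop bin" ;;
esac
```

If this does not print "PATH OK", re-check ~/.bash_profile and run source again.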
4. Adding the MySQL Driver and Configuring sqoop-env.sh
1. Make sure the MySQL JDBC driver jar is present under $SQOOP_HOME/lib:
[hadoop@hadoop004 lib]$ ll mysql-connector-java-5.1.46-bin.jar
-rw-r--r-- 1 hadoop hadoop 1004840 Feb 26 2018 mysql-connector-java-5.1.46-bin.jar
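If the jar is missing, copy it in yourself. The source path below is only an assumption — use wherever your MySQL connector actually lives:

```shell
# Copy the MySQL JDBC driver into Sqoop's lib directory
# (~/software is an assumed download location)
cp ~/software/mysql-connector-java-5.1.46-bin.jar $SQOOP_HOME/lib/
```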
2. Configure sqoop-env.sh
- [hadoop@hadoop000 conf]$ cp sqoop-env-template.sh sqoop-env.sh
In sqoop-env.sh, uncomment the following entries and point them at your own installation paths:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/app/hive
5. Viewing Sqoop Command Help
- sqoop help
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
Note:
The import/export direction in the help text is from Hadoop's point of view:
import: MySQL --> Hadoop
export: Hadoop --> MySQL
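To make that direction concrete, here is a minimal import sketch. Every connection value (host, database, credentials, table, target directory) is a placeholder, not something from this guide:

```shell
# Import one MySQL table into an HDFS directory with a single map task.
# All connection values below are placeholders.
sqoop import \
  --connect jdbc:mysql://localhost:3306/testdb \
  --username root \
  --password 123456 \
  --table emp \
  --target-dir /user/hadoop/emp \
  -m 1
```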
1. More detailed help for the import command:

sqoop import --help

Usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
When importing data from MySQL into HDFS:
1. You need to know the destination directory, and reading out of MySQL requires connection arguments such as the JDBC url and password.
2. Hive arguments:
Argument | Description |
---|---|
--create-hive-table | Fail if the target Hive table exists |
--hive-database | Sets the database name to use when importing to Hive |
--hive-delims-replacement | Replace Hive record delimiter \0x01 and row delimiters (\n, \r) in imported string fields with a user-defined string |
--hive-drop-import-delims | Drop Hive record delimiter \0x01 and row delimiters (\n, \r) from imported string fields |
--hive-home | Override $HIVE_HOME |
--hive-import | Import tables into Hive |
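Putting a few of these Hive arguments together, a sketch of importing a MySQL table straight into Hive — the connection values, database, and table names are all placeholders:

```shell
# Import a MySQL table directly into a Hive table.
# Connection values and table names below are placeholders.
sqoop import \
  --connect jdbc:mysql://localhost:3306/testdb \
  --username root \
  --password 123456 \
  --table emp \
  --hive-import \
  --hive-database default \
  --hive-table emp \
  -m 1
```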