I. What is Sqoop
Sqoop (pronounced like "scoop") is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and it can export data from HDFS back into a relational database.
II. Installing Sqoop
1. Download the Sqoop package: [Sqoop download](http://mirrors.hust.edu.cn/apache/sqoop/1.4.6/)
2. Upload the downloaded package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to the Linux machine.
3. Extract the archive to a target directory, for example:
$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module
4. Edit the configuration file
·In the conf directory under the Sqoop home, rename sqoop-env-template.sh to sqoop-env.sh:
$ mv sqoop-env-template.sh sqoop-env.sh
Then set the following variables in sqoop-env.sh:
# Hadoop installation directory
export HADOOP_COMMON_HOME=/home/admin/modules/hadoop-2.7.2
# MapReduce installation directory (same as the Hadoop install)
export HADOOP_MAPRED_HOME=/home/admin/modules/hadoop-2.7.2
# Hive installation directory
export HIVE_HOME=/home/admin/modules/apache-hive-1.2.2-bin
# ZooKeeper installation directory
export ZOOKEEPER_HOME=/home/admin/modules/zookeeper-3.4.5
# ZooKeeper conf directory
export ZOOCFGDIR=/home/admin/modules/zookeeper-3.4.5/conf
III. Copy the database JDBC driver
Copy the MySQL driver jar mysql-connector-java-5.1.27-bin.jar into Sqoop's lib directory, for example:
$ cp mysql-connector-java-5.1.27-bin.jar /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib
IV. Verify the Sqoop configuration
·Change into the Sqoop installation directory and run:
$ bin/sqoop help
If output like the following appears, Sqoop is configured correctly:
Warning: /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/12/27 20:34:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
·Test whether Sqoop can connect to the MySQL database:
$ bin/sqoop list-databases \
--connect jdbc:mysql://hadoop102:3306 \
--username root \
--password 123456
·If the list of databases on the server is printed, the connection works:
(the same HBase, HCatalog, and Accumulo warnings as above are printed, followed by:)
17/12/27 20:42:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/27 20:42:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/27 20:42:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
company
metastore
mysql
performance_schema
test
V. Sqoop usage examples
1. Import data from MySQL into HDFS
$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--table staff \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
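With --fields-terminated-by "\t", each MySQL row becomes one tab-separated line in a part file under --target-dir. A minimal local sketch of that output format (the sample rows below are made up, not real rows of the staff table):

```shell
# Fabricate two rows in the layout Sqoop writes; part-m-00000 is the file
# name a single-mapper import produces (the data itself is hypothetical).
printf '1\tThomas\tMale\n2\tCatalina\tFemale\n' > part-m-00000
# Quick sanity check: count the rows in the file.
awk -F'\t' 'END { print NR }' part-m-00000
```

On a real cluster you would inspect the actual file with `hdfs dfs -cat /user/company/part-m-00000` instead of a local file.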
2. Import data from MySQL into a Hive table
$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--table staff \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table staff_hive
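With --hive-import, Sqoop first stages the rows in HDFS and then moves them into the table's directory under the Hive warehouse (by default /user/hive/warehouse/&lt;table&gt;). A local sketch of the resulting layout (paths and sample data are illustrative, not taken from a real cluster):

```shell
# Mimic the warehouse directory the import creates for staff_hive
# (everything here is local and hypothetical).
mkdir -p warehouse/staff_hive
printf '1\tThomas\tMale\n' > warehouse/staff_hive/part-m-00000
# The Hive table staff_hive simply reads the files in this directory.
ls warehouse/staff_hive
```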
3. Export data from a Hive table into MySQL
$ bin/sqoop export \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--table staff \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t"
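During an export, Sqoop splits every line under --export-dir on the --input-fields-terminated-by character and maps the fields onto the columns of --table. A rough local illustration of that parsing step (the generated SQL is a simplification of what Sqoop does internally with batched JDBC statements, and the column names id/name/sex are assumptions, not read from the real staff table):

```shell
# One hypothetical line as stored in /user/hive/warehouse/staff_hive.
printf '1\tThomas\tMale\n' > staff_hive.tsv
# Show how the tab-separated fields would map onto an INSERT against staff.
awk -F'\t' '{ printf "INSERT INTO staff (id, name, sex) VALUES (%s, '\''%s'\'', '\''%s'\'');\n", $1, $2, $3 }' staff_hive.tsv
```

Note that the target table must already exist in MySQL with a matching column layout, or the export fails.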