1. Setting up Sqoop
Download the Sqoop package and extract it to the target directory (here /home/hadoop/app/sqoop, alongside Hadoop).
Configure the environment variables:
export SQOOP_HOME=/home/hadoop/app/sqoop
export PATH=${SQOOP_HOME}/bin:$PATH
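To make these settings persistent, they can be appended to the shell profile; a minimal sketch, assuming a bash login shell that reads ~/.bash_profile:
echo 'export SQOOP_HOME=/home/hadoop/app/sqoop' >> ~/.bash_profile
echo 'export PATH=${SQOOP_HOME}/bin:$PATH' >> ~/.bash_profile
source ~/.bash_profile    # reload so sqoop is on PATH in the current shell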
Edit the configuration file. Copy the template first; only the Hadoop and Hive paths need to be set:
[hadoop@hadoop001 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@hadoop001 conf]$ cat sqoop-env.sh
……
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/app/hive
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
Place the following JARs in $SQOOP_HOME/lib to avoid errors later:
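For example, assuming the two JARs have already been downloaded to the current directory:
cp mysql-connector-java-5.1.47.jar $SQOOP_HOME/lib/    # MySQL JDBC driver
cp java-json.jar $SQOOP_HOME/lib/                      # JSON support jar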
[hadoop@hadoop001 lib]$ ll mysql-connector-java-5.1.47.jar
-rw-rw-r-- 1 hadoop hadoop 1007502 Jul 23 09:45 mysql-connector-java-5.1.47.jar
[hadoop@hadoop001 lib]$ ll java-json.jar
-rw-r--r-- 1 hadoop hadoop 84697 Jul 23 14:51 java-json.jar
Sqoop can now be used to import and export data (both run as MapReduce jobs on Hadoop).
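A quick sanity check at this point is to print the installed version (the same version string, 1.4.6-cdh5.15.1, also appears in the help output below):
sqoop version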
2. Sqoop parameters
Run help to view Sqoop's usage information:
[hadoop@hadoop001 lib]$ sqoop help
Warning: /home/hadoop/app/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/app/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/07/23 20:10:03 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.15.1
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table (commonly used)
  help               List available commands
  import             Import a table from a database to HDFS (commonly used)
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
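Connectivity to MySQL can be verified up front with list-databases; a minimal sketch, assuming the same MySQL instance and credentials used in the examples below:
sqoop list-databases \
--connect jdbc:mysql://hadoop001:3306 \
--username root --password 123456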
The import and export functions are the focus of what follows.
sqoop help import lists the import options; the key ones are:
--connect <jdbc-uri>             JDBC connect string for the source database
--username <username>            Authentication username
--password <password>            Authentication password
--columns <col,col,col...>       Columns to import from the table
--delete-target-dir              Delete the target directory before importing
-e,--query <statement>           Import the results of a SQL 'statement' (cannot be used together with --table)
-m,--num-mappers <n>             Use 'n' map tasks to import in parallel
--mapreduce-job-name <name>      Name for the generated MapReduce job
--table <table-name>             Table to read
--target-dir <dir>               HDFS destination directory
--where <where clause>           WHERE clause to apply during the import
--fields-terminated-by <char>    Field separator character for the imported data
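Note that -e/--query requires the literal token $CONDITIONS in the WHERE clause (Sqoop replaces it with each mapper's split predicate), and with more than one mapper a --split-by column must be given. A minimal sketch, reusing the connection details from the examples below (the query and target directory are illustrative):
sqoop import \
--connect jdbc:mysql://hadoop001:3306/ruozedata_d7 \
--username root --password 123456 \
-e 'SELECT tbl_id, tbl_name FROM tbls WHERE tbl_type = "MANAGED_TABLE" AND $CONDITIONS' \
--split-by tbl_id \
--target-dir /user/hadoop/test_query \
--delete-target-dir \
-m 2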
sqoop help export shows the export options in the same way.
Example: MySQL ==> HDFS:
sqoop import \
--connect jdbc:mysql://hadoop001:3306/ruozedata_d7 \
--username root --password 123456 \
--table tbls \
--target-dir /user/hadoop/test01 \
--mapreduce-job-name test01 \
--delete-target-dir \
--fields-terminated-by '$' \
-m 2
[hadoop@hadoop001 lib]$ hdfs dfs -ls /user/hadoop/test01
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2019-07-23 20:30 /user/hadoop/test01/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 373 2019-07-23 20:30 /user/hadoop/test01/part-m-00000
-rw-r--r-- 1 hadoop supergroup 692 2019-07-23 20:30 /user/hadoop/test01/part-m-00001
[hadoop@hadoop001 lib]$ hdfs dfs -text /user/hadoop/test01/part-m-00000
1$1563267428$1$0$hadoop$0$1$ruozedata$MANAGED_TABLE$null$null
8$1563333223$1$0$hadoop$0$8$emp$MANAGED_TABLE$null$null
11$1563417967$1$0$hadoop$0$11$emp_01$MANAGED_TABLE$null$null
14$1563420051$1$0$hadoop$0$14$emp_external$EXTERNAL_TABLE$null$null
16$1563439978$1$0$hadoop$0$16$dual$MANAGED_TABLE$null$null
21$1563682724$11$0$hadoop$0$21$rating_json$MANAGED_TABLE$null$null
Example: MySQL ==> Hive:
sqoop import \
--connect jdbc:mysql://localhost:3306/ruozedata \
--password root \
--username root \
--table product_info \
--hive-database d7_hive \
--hive-table product_info \
--hive-import \
--hive-overwrite \
--delete-target-dir \
--fields-terminated-by '\t' \
--split-by product_id \
-m 2
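The reverse direction (HDFS ==> MySQL) uses sqoop export. A minimal sketch, assuming a pre-created MySQL table (tbls_backup here is hypothetical) whose columns match the files written to /user/hadoop/test01 in the first example; note that export parses its input files according to --input-fields-terminated-by:
# tbls_backup is a hypothetical table that must already exist in MySQL
sqoop export \
--connect jdbc:mysql://hadoop001:3306/ruozedata_d7 \
--username root --password 123456 \
--table tbls_backup \
--export-dir /user/hadoop/test01 \
--input-fields-terminated-by '$' \
-m 1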