Sqoop provides two-way data transfer between relational databases and HDFS (including Hive and HBase).
**Available commands**:
**codegen** Generate code to interact with database records
**create-hive-table** Import a table definition into Hive
**eval** Evaluate a SQL statement and display the results
**export** Export an HDFS directory to a database table
**help** List available commands
**import** Import a table from a database to HDFS
**import-all-tables** Import tables from a database to HDFS
**import-mainframe** Import datasets from a mainframe server to HDFS
**job** Work with saved jobs
**list-databases** List available databases on a server
**list-tables** List available tables in a database
**merge** Merge results of incremental imports
**metastore** Run a standalone Sqoop metastore
**version** Display version information
Usage examples:
1. list-databases (list all databases)
sqoop list-databases --connect jdbc:mysql://ip:port/dbname --username ${DB_USER_NAME} --password ${DB_PASSWORD}
2. list-tables (list all tables)
sqoop list-tables --connect jdbc:mysql://ip:port/dbname --username ${DB_USER_NAME} --password ${DB_PASSWORD}
3. eval (run a SQL statement and display the result)
sqoop eval --connect jdbc:mysql://ip:port/dbname --username ${DB_USER_NAME} --password ${DB_PASSWORD} -e 'select col_1,col_2,col_3 from tblA limit 20'
4. import (import MySQL data into HDFS)
sqoop import \
--connect "jdbc:mysql://${DB_IP}:${DB_PORT}/${DB_NAME}?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull&useUnicode=true&characterEncoding=utf8" \
--username ${DB_USER} \
--password "${DB_PASS}" \
--query 'select COL_1,COL_2,COL_3 from t_pro_material_plan where DATE_FORMAT(create_time,"%Y-%m-%d") > DATE_SUB(CURDATE(),INTERVAL 1 YEAR) and $CONDITIONS ' \
--hive-drop-import-delims \
--null-string '' \
--null-non-string '' \
--fields-terminated-by "\001" \
--lines-terminated-by "\n" \
--target-dir hdfs://${HADOOP_NAMENODE_IP}:8020/user/hive/warehouse/ods.db/ods_${DB_NAME}_db_${DB_TABLE_NAME} \
--delete-target-dir \
-m 5
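One detail in the `--query` form above is worth emphasizing: the literal token `$CONDITIONS` must reach Sqoop unexpanded, because Sqoop replaces it with each mapper's split predicate at runtime. A minimal sketch (plain shell, no Sqoop required; the query text is made up for illustration) of how quoting changes what Sqoop would receive:

```shell
#!/bin/sh
unset CONDITIONS  # make the demonstration deterministic
# Inside double quotes the shell expands $CONDITIONS (here unset, so it
# becomes an empty string) and Sqoop would receive a broken query.
Q_BAD="select id from t where $CONDITIONS"
# Inside single quotes the token passes through literally, as Sqoop expects.
Q_GOOD='select id from t where $CONDITIONS'
echo "$Q_BAD"
echo "$Q_GOOD"
```

This is why the examples in this section keep `$CONDITIONS` inside single quotes (or splice variables in via concatenation when expansion is needed elsewhere in the query).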
Extension: importing MySQL into HBase
a) import MySQL to HBase
sqoop import --connect jdbc:mysql://ip:port/dbname --username ${DB_USER_NAME} --password ${DB_PASSWORD} --hbase-table ${HBASE_TBL_NAME} --hbase-row-key ${FIELD_SELECT_AS_YOUR_ROWKEY} --column-family ${HBASE_CF_NAME} --hbase-create-table --table ${RDBMS_TBL_NAME}
b) import MySQL to HBase (with parallelism, range-based splits, a CAST to convert a source column type, and a --query clause with a where condition)
sqoop import --connect jdbc:mysql://ip:port/dbname --username ${DB_USER_NAME} --password ${DB_PASSWORD} --hbase-table ${HBASE_TBL_NAME} --hbase-row-key ${FIELD_SELECT_AS_YOUR_ROWKEY} --column-family ${HBASE_CF_NAME} --hbase-create-table -m ${PARALLEL_NUMBER} --split-by id --query 'select `id`,`COL2`,CAST(`COL3` AS SIGNED) AS COL3 from '"${RDBMS_TBL_NAME}"' where $CONDITIONS'
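For (b), a rough sketch of what `-m` plus `--split-by id` does conceptually: Sqoop queries `SELECT MIN(id), MAX(id)` on the source, divides that range into one slice per mapper, and substitutes each slice's bounds for `$CONDITIONS`. The boundary values below are made up for illustration; this is the idea, not Sqoop's actual implementation:

```shell
#!/bin/sh
# Hypothetical boundaries Sqoop would fetch via SELECT MIN(id), MAX(id).
MIN_ID=1
MAX_ID=100
MAPPERS=4          # corresponds to -m 4
STEP=$(( (MAX_ID - MIN_ID + 1) / MAPPERS ))
LO=$MIN_ID
SPLITS=""
i=1
while [ "$i" -le "$MAPPERS" ]; do
  # The last mapper absorbs any remainder so the full range is covered.
  if [ "$i" -eq "$MAPPERS" ]; then HI=$MAX_ID; else HI=$(( LO + STEP - 1 )); fi
  SPLITS="${SPLITS}[${LO},${HI}]"
  LO=$(( HI + 1 ))
  i=$(( i + 1 ))
done
echo "$SPLITS"   # [1,25][26,50][51,75][76,100]
```

This also shows why `--split-by` should name a roughly uniformly distributed numeric column: heavily skewed ids give one mapper most of the rows.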
5. export (export Hive data to MySQL)
Wrapped in a shell script:
#!/bin/bash
TABLE_NAME=${HIVE_TABLE}
sqoop eval --connect "jdbc:mysql://${DB_IP}:${DB_PORT}/${DB_NAME}?characterEncoding=utf-8" \
--username "${DB_USER}" \
--password "${DB_PASS}" \
--query "truncate table ${TABLE_NAME}"
sqoop export \
--connect "jdbc:mysql://${DB_IP}:${DB_PORT}/${DB_NAME}?useUnicode=true&characterEncoding=utf-8" \
--username "${DB_USER}" \
--password "${DB_PASS}" \
--table ${TABLE_NAME} -m 1 \
--input-null-string '\\N' \
--input-null-non-string '\\N' \
--export-dir "hdfs://${HADOOP_NAMENODE_IP}:8020/user/hive/warehouse/ads.db/${HIVE_TABLE}" \
--update-mode 'allowinsert' \
--input-fields-terminated-by "\001"
Note:
Long options take a double hyphen (--); if one appears above with a single hyphen, that is a rendering artifact and should be read as a double hyphen. Only short options such as -m and -e use a single hyphen.