Importing from an RDBMS into HDFS
By default Sqoop uses 4 map tasks, and the number of mappers can be customized. You can import specific columns, use the --query argument to import only the rows you need, and specify the field delimiter.
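As a hedged sketch of the column/query features mentioned above (the column names `id` and `account` are hypothetical), a free-form query import could look like this in an options file. Note that Sqoop requires the literal `$CONDITIONS` token in the WHERE clause of a `--query` import, and `--split-by` when more than one mapper is used:

```shell
import
--connect
jdbc:mysql://master:3306/demo
--username
root
--password
123456
--query
'SELECT id, account FROM my_user WHERE id <= 100 AND $CONDITIONS'
--split-by
id
--target-dir
/user/sqoop/my_user_query
--num-mappers
1
```

With `--table` instead of `--query`, a subset of columns can be selected with `--columns "id,account"`.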
Run Sqoop with an options file to import the data:
bin/sqoop --options-file /opt/datas/sqoop_imp.sh
Contents of sqoop_imp.sh:
import
--connect
jdbc:mysql://master:3306/demo
--username
root
--password
123456
--table
my_user
--target-dir
/user/sqoop/my_user_imp
--delete-target-dir
--num-mappers
1
--as-parquetfile
--compress
--compression-codec
org.apache.hadoop.io.compress.SnappyCodec
Exporting from HDFS to an RDBMS
Run Sqoop with an options file to export the data:
bin/sqoop --options-file /opt/datas/sqoop_exp.sh
Contents of sqoop_exp.sh:
export
--connect
jdbc:mysql://master:3306/demo
--username
root
--password
123456
--table
my_user_exp
--export-dir
/user/sqoop/my_user_imp
--num-mappers
1
--direct
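Sqoop export does not create the target table; `my_user_exp` must already exist in MySQL before the export runs. A minimal sketch (the column list is an assumption and must match the exported data):

```shell
# Hypothetical schema -- adjust the columns to match the files being exported.
mysql -h master -u root -p123456 demo -e "
  CREATE TABLE IF NOT EXISTS my_user_exp (
    id INT PRIMARY KEY,
    account VARCHAR(255),
    passwd VARCHAR(255)
  );"
```

With MySQL, `--direct` makes Sqoop use the `mysqlimport` utility instead of JDBC inserts, so that tool must be available on the worker nodes.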
Importing from an RDBMS into a Hive table
A Sqoop Hive import first imports the data into HDFS, then uses LOAD DATA to move it into the Hive table.
Run Sqoop with an options file to import the data:
bin/sqoop --options-file /opt/datas/sqoop_imp_hive.sh
Contents of sqoop_imp_hive.sh:
import
--connect
jdbc:mysql://master:3306/demo
--username
root
--password
123456
--table
my_user
--fields-terminated-by
'\t'
--direct
--num-mappers
1
--hive-import
--hive-database
default
--hive-table
my_user
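The two-step behavior described above can be made concrete: after the HDFS import finishes, Sqoop runs something like the following on its own (a sketch only; the staging path is illustrative, Sqoop stages under the running user's HDFS home directory):

```shell
# Sketch of the second step Sqoop performs internally:
# move the staged HDFS files into the Hive warehouse.
hive -e "LOAD DATA INPATH '/user/root/my_user' OVERWRITE INTO TABLE default.my_user;"
```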
Exporting a Hive table to an RDBMS
Run Sqoop with an options file to export the data:
bin/sqoop --options-file /opt/datas/sqoop_exp_hive.sh
Contents of sqoop_exp_hive.sh:
export
--connect
jdbc:mysql://master:3306/demo
--username
root
--password
123456
--table
my_user_hive_exp
--input-fields-terminated-by
'\t'
--num-mappers
1
--export-dir
/user/hive/warehouse/my_user
When exporting a Hive table to an RDBMS, if the table uses Hive's default field delimiter rather than \t, pass \001 to --input-fields-terminated-by instead of \t.
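Hive's default field delimiter is the non-printing \001 (Ctrl-A) character. A quick stand-alone sketch of what it looks like in a data file:

```shell
# Write one record using the \001 delimiter, then make it visible with cat -v,
# which renders the control character as ^A.
printf '1\001alice\n' > /tmp/hive_row.txt
cat -v /tmp/hive_row.txt   # shows: 1^Aalice
```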
Incremental import
For an incremental import, the existing target directory must not be deleted.
Incremental import arguments:
--check-column <column>        Source column to check for incremental change
--incremental <import-type>    Define an incremental import of type 'append' or 'lastmodified'
--last-value <value>           Last imported value in the incremental check column
The check column designates one column, usually the primary key, as the basis for the incremental import.
The import mode is either 'append' or 'lastmodified'.
Rows are selected based on the last imported value: --last-value is the highest value of the check column from the previous run, or, in lastmodified mode, the timestamp of the last imported record.
Run the command:
bin/sqoop import \
--connect jdbc:mysql://master:3306/sqoop \
--username root \
--password 123456 \
--direct \
--table tohdfs \
--target-dir /sqoop \
--num-mappers 1 \
--fields-terminated-by '\t' \
--check-column id \
--incremental append \
--last-value 5
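Conceptually, append mode only pulls rows whose check column exceeds --last-value. A minimal stand-alone sketch of that filter (the sample rows are made up):

```shell
# Simulate append-mode filtering: keep only rows whose id column (field 1)
# is greater than the last imported value.
last_value=5
printf '4\ta\n5\tb\n6\tc\n7\td\n' \
  | awk -F'\t' -v lv="$last_value" '$1 > lv'   # prints the rows with id 6 and 7
```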
Using sqoop job
A Sqoop invocation can be saved as a job, which makes it easy to run the incremental import on a schedule.
Run the command:
bin/sqoop job --help
Create a sqoop job:
bin/sqoop job \
--create stu_info \
-- \
import \
--connect jdbc:mysql://master:3306/sqoop \
--username root \
--password 123456 \
--direct \
--table tohdfs \
--target-dir /sqoop \
--num-mappers 1 \
--fields-terminated-by '\t' \
--check-column id \
--incremental append \
--last-value 8
(With --incremental lastmodified, --last-value is a timestamp instead, e.g. --last-value '2017-05-14 15:00:30'; specify only one --last-value per job.)
Show the saved job definition:
bin/sqoop job --show stu_info
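sqoop job also supports --list, --exec, and --delete; combined with cron, the saved job gives the scheduled incremental import mentioned above (the crontab schedule and install path below are only assumptions):

```shell
bin/sqoop job --list             # list saved jobs
bin/sqoop job --exec stu_info    # run the saved incremental import
bin/sqoop job --delete stu_info  # remove the job

# Example crontab entry: run the job every day at 01:00 (path is an assumption)
# 0 1 * * * /opt/modules/sqoop/bin/sqoop job --exec stu_info
```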