An Introduction to Basic Sqoop Usage

jsjw18

Category: sqoop. Tags: hive, hadoop, sqoop

Article link: https://blog.csdn.net/victor_ww/article/details/41084255

Note: all of the examples below use MySQL.

Importing RDBMS data into Hive

sqoop import --connect jdbc:mysql://172.17.210.180/dc_scheduler_client --username dc_scheduler_cli --password dc_scheduler_cli --table t_class --split-by id -m 2 --verbose --hive-import --create-hive-table --hive-table dc_test.t_class1 --fields-terminated-by '\t' --bindir /root/tmp --outdir /root/tmp --null-string '\\N' --null-non-string '\\N'
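import: run an import job
connect: the JDBC connection string
username: the MySQL user name
password: the MySQL password
table: the source table in MySQL
split-by: the column used to split the work among map tasks; used together with -m (the number of map tasks)
verbose: print detailed logs
hive-import: import the data into Hive
create-hive-table: create the Hive table from the source table's structure; the job fails if the Hive table already exists
fields-terminated-by: the field delimiter for the table data in Hive
bindir: the directory for the class files and jar compiled from the Java code that Sqoop generates
outdir: the directory for the Java source code that Sqoop generates
null-string: the string written in place of a null value in a string column of the source table
null-non-string: the string written in place of a null value in a non-string column of the source table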
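
To make the interaction between split-by and -m more concrete, the following is a minimal sketch that approximates how an integer key range is divided among map tasks. It is not Sqoop's actual splitter code: the function name split_ranges and the bounds 1 and 100 are assumptions chosen for illustration. Sqoop obtains the real bounds by querying the minimum and maximum of the split column, and its exact boundary arithmetic may differ.

```shell
#!/usr/bin/env bash
# Illustration only: approximates how an integer --split-by range
# is divided into contiguous ranges, one per map task (-m).
# Assumes (max - min + 1) >= mappers; the values used below are hypothetical.
split_ranges() {
  local min=$1 max=$2 mappers=$3
  local size=$(( (max - min + 1) / mappers ))
  local lo=$min hi i
  for (( i = 1; i <= mappers; i++ )); do
    if (( i == mappers )); then
      hi=$max                   # the last split absorbs any remainder
    else
      hi=$(( lo + size - 1 ))
    fi
    echo "map $i: id >= $lo AND id <= $hi"
    lo=$(( hi + 1 ))
  done
}

# Hypothetical bounds: MIN(id) = 1, MAX(id) = 100, with -m 2
split_ranges 1 100 2
```

With an evenly distributed id column each map task reads roughly the same number of rows. A heavily skewed split column leads to unbalanced tasks, so it is worth choosing one that is evenly distributed.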

Exporting Hive data to an RDBMS

sqoop export --export-dir /hive/warehouse/dc_test.db/t_class1 --connect jdbc:mysql://172.17.210.180/dc_scheduler_client --username dc_scheduler_cli --password dc_scheduler_cli --update-key id --update-mode allowinsert --table t_class --input-fields-terminated-by '\t' -m 1 --bindir /root/tmp --outdir /root/tmp --input-null-string '\\N' --input-null-non-string '\\N'

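1. export: export data from Hive (the files under --export-dir) to the RDBMS
2. update-key: the column used to match existing rows when updating
3. update-mode: the update mode (updateonly: only update existing rows; allowinsert: also insert rows that do not yet exist in the target table)
4. table: the target table
5. input-fields-terminated-by: the field delimiter of the data in Hive
6. input-null-string: for string columns, the string in the input data that is interpreted as NULL
7. input-null-non-string: for non-string columns, the string in the input data that is interpreted as NULL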
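
As a rough illustration of the two null options, the sketch below mimics how the two-character token \N in a tab-delimited export file is read as NULL. It only imitates the behaviour with awk and does not invoke Sqoop. The function name show_nulls and the sample rows (id, name, score) are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: imitates the effect of
#   --input-null-string '\\N' --input-null-non-string '\\N'
# by showing which fields of a tab-delimited file would be read as NULL.
show_nulls() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
       {
         for (i = 1; i <= NF; i++)
           if ($i == "\\N") $i = "NULL"   # "\\N" in awk is the literal token \N
         $1 = $1                          # force the record to be rebuilt with OFS
         print
       }'
}

# Hypothetical sample rows in Hive text format: id <TAB> name <TAB> score
printf '1\tAlice\t90\n2\t\\N\t\\N\n' | show_nulls
```

Hive's text format writes NULL as \N by default, which is why the same token appears in both the import and the export commands. The backslash is doubled on the Sqoop command line because the value is embedded in generated Java code.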