Sqoop Import and Export Command Parameters Explained

卷曲的葡萄藤

Note: reposted from https://blog.csdn.net/wtzhm/article/details/81810159

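1. Importing from a relational database into HDFS

sqoop import \

--connect <jdbc-uri>: the JDBC connection string; here, the address of the MySQL server

--username <username>: database user name

--password <password>: database password

--target-dir <dir>: the HDFS directory to import into

--table <table-name>: the relational database table to import

--as-textfile: import as plain text files (the default)

--as-avrodatafile: import as Avro data files

--as-sequencefile: import as SequenceFiles

--as-parquetfile: import as Parquet files

--columns <col1,col2...>: which columns of the source table to import

--delete-target-dir: delete the target directory if it already exists

-m <n>: number of mappers to run in parallel

-e|--query <statement>: import the result of this SQL statement instead of a whole table

--where <where-clause>: a WHERE condition that filters the rows imported from the table given by --table

-z|--compress: enable compression

--compression-codec <codec>: the compression codec to use; the default is gzip

--null-string <null-string>: for string columns, the string written to HDFS when the value is NULL

--null-non-string <null-string>: for non-string columns, the string written to HDFS when the value is NULL

Note: unless --delete-target-dir is given, the specified HDFS directory must not already exist, because Sqoop uses it as the output directory of the MapReduce job.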

eg: importing a subset of a table's rows
sqoop import \
--connect jdbc:mysql://mysqlhost:3306/userdb \
--username root \
--password root \
--table usertable \
--where "name='tom' and age=15" \
--target-dir /user/test \
-m 1
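
The -e|--query option can take the place of --table and --where. The following is a minimal sketch rather than a definitive recipe: it reuses the host, database and credentials from the example above, and the output directory /user/test_query and the helper names run and query_import are made up for illustration. Sqoop requires the literal token $CONDITIONS in the WHERE clause of a free-form query, and the query is single-quoted so the shell does not expand it. The script only prints the command unless RUN_SQOOP=1 is set.

```shell
#!/bin/sh
# Dry-run helper (illustrative): prints the command unless RUN_SQOOP=1 is set.
run() {
  if [ "${RUN_SQOOP:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Import a query result instead of a whole table.
# Host, credentials and target directory are assumptions for illustration.
query_import() {
  run sqoop import \
    --connect jdbc:mysql://mysqlhost:3306/userdb \
    --username root \
    --password root \
    --query 'SELECT name, age FROM usertable WHERE age = 15 AND $CONDITIONS' \
    --target-dir /user/test_query \
    --delete-target-dir \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --compress \
    -m 1
}

query_import
```

A single mapper (-m 1) is used here because a free-form query run with several mappers also needs a split column, which this article does not cover.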

2. Importing from a relational database into a Hive table

sqoop import \
--connect jdbc:mysql://hadoop-all-01:3306/hadoop \
--username hive \
--password hive \
--hive-import \
--create-hive-table \
--table employee \
--hive-overwrite \
--fields-terminated-by "," \
--lines-terminated-by "\n" \
--hive-table hadoop.employee

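--hive-import: import the data into Hive

--table: the relational database table to import into Hive. The HDFS path Sqoop imports into must not already contain a directory with this name, otherwise the job reports that it already exists. The reason is that Sqoop first imports the data into HDFS and then loads it into Hive with LOAD DATA.

--hive-overwrite: if the Hive table already exists, overwrite its data. Normally the table does not exist yet; with --create-hive-table, the import fails if the table is already there.

--hive-table: which Hive database and table to import into. If the database is omitted, or the option is not given at all, the data goes into the default database, and the table name defaults to the name of the RDBMS table.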
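
As a hedged sketch of how these options might combine on a repeat load: assuming hadoop.employee already exists in Hive, the command below drops --create-hive-table and keeps --hive-overwrite so that the existing data is replaced. It reuses the connection settings from the example above; --delete-target-dir is added on the assumption that a staging directory named after the table may be left in HDFS by an earlier failed run. The helper names run and hive_reload are illustrative, and nothing is executed unless RUN_SQOOP=1 is set.

```shell
#!/bin/sh
# Dry-run helper (illustrative): prints the command unless RUN_SQOOP=1 is set.
run() {
  if [ "${RUN_SQOOP:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Reload an existing Hive table, overwriting its current data.
# Connection settings are copied from the example above; the rest is an assumption.
hive_reload() {
  run sqoop import \
    --connect jdbc:mysql://hadoop-all-01:3306/hadoop \
    --username hive \
    --password hive \
    --table employee \
    --hive-import \
    --hive-overwrite \
    --hive-table hadoop.employee \
    --delete-target-dir \
    --fields-terminated-by "," \
    -m 1
}

hive_reload
```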

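3. Exporting data from HDFS to a relational database

Syntax:

sqoop export \

--connect <jdbc-uri>: the JDBC connection string

--username <username>: database user name

--password <password>: database password

--driver <class-name>: the JDBC driver class

--columns <col1,col2,...>: which columns of the table to export to

--direct: use direct mode for the export, which is faster

--export-dir <dir>: the HDFS directory to export from

-m <n>: number of mappers

--table <table-name>: the database table to export to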



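Input parsing arguments:

--input-enclosed-by <char>: the character that encloses input fields

--input-escaped-by <char>: the escape character for input fields

--input-fields-terminated-by <char>: the field separator of the HDFS files

--input-lines-terminated-by <char>: the line terminator of the HDFS files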



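Output formatting arguments:

--enclosed-by <char>: the field-enclosing character

--escaped-by <char>: the escape character

--fields-terminated-by <char>: the separator between fields

--lines-terminated-by <char>: the end-of-line character

--mysql-delimiters: use MySQL's default delimiters, fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '

--input-null-string <null-string>: for string columns, the string in the input that is interpreted as NULL. Hive writes NULL as \N, so passing '\\N' exports those fields as NULL rather than as the literal text \N.

--input-null-non-string <null-string>: the same for non-string columns, so that \N in the Hive data is exported as NULL rather than as the literal text \N.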


**eg:**

sqoop export \
--connect jdbc:mysql://hadoop-all-01:3306/hadoop \
--username hive \
--password hive \
--driver com.mysql.jdbc.Driver \
--export-dir /user/hive/warehouse/hadoop.db/m_d \
--columns "id,country,city,phone" \
--table director \
--num-mappers 2 \
--direct \
--input-fields-terminated-by ',' \
--input-null-string '\\N' \
--input-null-non-string '\\N'

4. Exporting Hive data to an RDBMS table

This is really the same as exporting HDFS data to an RDBMS table, because the Hive data is stored on HDFS.
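
To illustrate, here is a minimal sketch that exports a Hive table by pointing --export-dir at its directory on HDFS. Several things are assumptions: that the table hadoop.employee is stored as text with Hive's default field delimiter (Ctrl-A, written '\001'), that it lives under the warehouse path shown, and that a MySQL table with the hypothetical name employee_export already exists with matching columns. The helper names run and hive_export are illustrative, and the script only prints the command unless RUN_SQOOP=1 is set.

```shell
#!/bin/sh
# Dry-run helper (illustrative): prints the command unless RUN_SQOOP=1 is set.
run() {
  if [ "${RUN_SQOOP:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Export the files behind a Hive table to an RDBMS table.
# Warehouse path and target table name are assumptions for illustration.
hive_export() {
  run sqoop export \
    --connect jdbc:mysql://hadoop-all-01:3306/hadoop \
    --username hive \
    --password hive \
    --table employee_export \
    --export-dir /user/hive/warehouse/hadoop.db/employee \
    --input-fields-terminated-by '\001' \
    --input-null-string '\\N' \
    --input-null-non-string '\\N' \
    -m 1
}

hive_export
```

If the Hive table was created with an explicit delimiter, for example the "," used in the import example in section 2, pass that delimiter to --input-fields-terminated-by instead.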
