1 Installing Sqoop
First, Sqoop can be downloaded from http://sqoop.apache.org/
Click Download on that page, then click the release link to download the tarball.
After downloading Sqoop, upload the tarball to your Linux machine, extract it to a directory of your choice, edit the configuration file, add Sqoop to the environment variables in /etc/profile, and put the JDBC driver jar for your database into the lib directory.
Because the shipped env file is only a template, copy it first
Edit sqoop-env.sh
Add Sqoop to the environment variables
Add the JDBC driver jar to the lib directory
Run source /etc/profile to reload the configuration
Then run sqoop version; if the version is printed, the installation succeeded.
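A minimal sketch of these steps, assuming Sqoop was extracted to /opt/sqoop and MySQL is the source database (the paths and the connector jar name are placeholders, adjust them to your environment):
cd /opt/sqoop/conf
cp sqoop-env-template.sh sqoop-env.sh            # then set HADOOP_COMMON_HOME, HADOOP_MAPRED_HOME, HIVE_HOME inside it
echo 'export SQOOP_HOME=/opt/sqoop' >> /etc/profile
echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> /etc/profile
cp mysql-connector-java-5.1.47.jar /opt/sqoop/lib/   # MySQL JDBC driver
source /etc/profile
sqoop version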
2 Introduction to Sqoop
Sqoop is an Apache tool for transferring data between Hadoop and relational database servers.
It has two core functions:
Import (migrating data in)
Export (migrating data out)
Importing data: loading data from MySQL, Oracle, etc. into Hadoop storage systems such as HDFS, Hive, and HBase
Exporting data: moving data from the Hadoop file system out to relational databases such as MySQL. At its core, Sqoop is just a command-line tool; compared with HDFS or Hive, there is no deep theory behind it.
The Sqoop tool: essentially migrates data, and it does so by translating the Sqoop migration command into a MapReduce program
The Hive tool: essentially runs computation; it relies on HDFS to store data and translates SQL into MapReduce programs
How Sqoop works: it translates an import or export command into a MapReduce program; the generated MapReduce job mainly customizes the InputFormat and OutputFormat
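Because each job is compiled around a generated per-table record class, you can inspect that generated code directly with the codegen tool; a sketch, reusing the placeholder connection and table from the examples below:
sqoop codegen --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 --table trade_detail --outdir /tmp/sqoop_codegen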
3 Using Sqoop
Common parameters:
--connect <jdbc-uri>    JDBC connection string
--connection-manager <class-name>    connection manager class
--driver <class-name>    JDBC driver class
--hadoop-mapred-home <dir>    override $HADOOP_MAPRED_HOME
--help    print usage information
-P    read the password from the console
--password <password>    password
--username <username>    username
--verbose    print progress information
--connection-param-file <filename>    optional connection parameters file
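Before running any import, a quick way to confirm that the connection arguments above are correct is to list the tables in the source database (a sketch using the same placeholder host, database, and credentials as the examples below):
sqoop list-tables --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root -P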
Type 1: import data from the database into HDFS
sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 --table trade_detail --columns 'id, account, income, expenses'
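When no --target-dir is given, the data normally lands under the current HDFS user's home directory, in a directory named after the table, as part-m-* files. Assuming the job runs as root, you could inspect it with:
hdfs dfs -cat /user/root/trade_detail/part-m-00000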
Specify the output path and the field delimiter
sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 --table trade_detail --target-dir '/sqoop/td' --fields-terminated-by '\t'
Specify the number of map tasks with -m
sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 --table trade_detail --target-dir '/sqoop/td1' --fields-terminated-by '\t' -m 2
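With -m 2 the output directory should contain two part files (part-m-00000 and part-m-00001), which you can confirm with:
hdfs dfs -ls /sqoop/td1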
Add a WHERE condition. Note: the condition must be enclosed in quotes
sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 --table trade_detail --where 'id>3' --target-dir '/sqoop/td2'
Add a query statement (use \ to break the statement across lines)
sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 \
--query 'SELECT * FROM trade_detail where id > 2 AND $CONDITIONS' --split-by trade_detail.id --target-dir '/sqoop/td3'
Note: when using --query, the WHERE clause must end with AND $CONDITIONS. Quoting also matters: if the value of --query is wrapped in double quotes, the dollar sign must be escaped, i.e. \$CONDITIONS. If the number of map tasks is set to 1 (-m 1), --split-by ${tablename.column} can be omitted; otherwise it must be provided.
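For example, the same query wrapped in double quotes needs the dollar sign escaped (a sketch; /sqoop/td4 is just a placeholder output path):
sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 \
--query "SELECT * FROM trade_detail where id > 2 AND \$CONDITIONS" --split-by trade_detail.id --target-dir '/sqoop/td4'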
Type 2: export data from HDFS to the database (do not forget to specify the delimiter)
sqoop export --connect jdbc:mysql://192.168.8.120:3306/bigdata30 --username root --password 123456 --export-dir '/td3' --table td_bak -m 1 --fields-terminated-by ','
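Note that sqoop export does not create the target table: td_bak must already exist in MySQL with a schema matching the exported fields. A hypothetical way to create it for illustration (the column names and types are assumptions, match them to your actual data):
mysql -h 192.168.8.120 -u root -p -e "CREATE TABLE bigdata30.td_bak (id INT, account VARCHAR(50), income DOUBLE, expenses DOUBLE)"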
Run sqoop help to see which commands Sqoop supports.
If you are not sure how to use a specific command, you can view its usage with sqoop help command. For example, sqoop help import prints the following:
Common arguments:
   --connect <jdbc-uri>                                       Specify JDBC connect string
   --connection-manager <class-name>                          Specify connection manager class name
   --connection-param-file <properties-file>                  Specify connection parameters file
   --driver <class-name>                                      Manually specify JDBC driver class to use
   --hadoop-home <hdir>                                       Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                                 Override $HADOOP_MAPRED_HOME_ARG
   --help                                                     Print usage instructions
   --metadata-transaction-isolation-level <isolationlevel>    Defines the transaction isolation level for metadata queries. For more details check java.sql.Connection javadoc or the JDBC specification
   --oracle-escaping-disabled <boolean>                       Disable the escaping mechanism of the Oracle/OraOop connection managers
   -P                                                         Read password from console
   --password <password>                                      Set authentication password
   --password-alias <password-alias>                          Credential provider password alias
   --password-file <password-file>                            Set authentication password file path
   --relaxed-isolation                                        Use read-uncommitted isolation for imports
   --skip-dist-cache                                          Skip copying jars to distributed cache
   --temporary-rootdir <rootdir>                              Defines the temporary root directory for the import
   --throw-on-error                                           Rethrow a RuntimeException on error occurred during the job
   --username <username>                                      Set authentication username
   --verbose                                                  Print more information while working

Import control arguments:
   --append                                                   Imports data in append mode
   --as-avrodatafile                                          Imports data to Avro data files
   --as-parquetfile                                           Imports data to Parquet files
   --as-sequencefile                                          Imports data to SequenceFiles
   --as-textfile                                              Imports data as plain text (default)
   --autoreset-to-one-mapper                                  Reset the number of mappers to one mapper if no split key available
   --boundary-query <statement>                               Set boundary query for retrieving max and min value of the primary key
   --columns <col,col,col...>                                 Columns to import from table
   --compression-codec <codec>                                Compression codec to use for import
   --delete-target-dir                                        Imports data in delete mode
   --direct                                                   Use direct import fast path
   --direct-split-size <n>                                    Split the input stream every 'n' bytes when importing in direct mode
   -e,--query <statement>                                     Import results of SQL 'statement'
   --fetch-size <n>                                           Set number 'n' of rows to fetch from the database when more rows are needed
   --inline-lob-limit <n>                                     Set the maximum size for an inline LOB
   -m,--num-mappers <n>                                       Use 'n' map tasks to import in parallel
   --mapreduce-job-name <name>                                Set name for generated mapreduce job
   --merge-key <column>                                       Key column to use to join results
   --split-by <column-name>                                   Column of the table used to split work units
   --split-limit <size>                                       Upper Limit of rows per split for split columns of Date/Time/Timestamp and integer types. For date or timestamp fields it is calculated in seconds. split-limit should be greater than 0
   --table <table-name>                                       Table to read
   --target-dir <dir>                                         HDFS plain table destination
   --validate                                                 Validate the copy using the configured validator
   --validation-failurehandler <validation-failurehandler>    Fully qualified class name for ValidationFailureHandler
   --validation-threshold <validation-threshold>              Fully qualified class name for ValidationThreshold
   --validator <validator>                                    Fully qualified class name for the Validator
   --warehouse-dir <dir>                                      HDFS parent for table destination
   --where <where clause>                                     WHERE clause to use during import
   -z,--compress                                              Enable compression

Incremental import arguments:
   --check-column <column>        Source column to check for incremental change
   --incremental <import-type>    Define an incremental import of type 'append' or 'lastmodified'
   --last-value <value>           Last imported value in the incremental check column

Output line formatting arguments:
   --enclosed-by <char>               Sets a required field enclosing character
   --escaped-by <char>                Sets the escape character
   --fields-terminated-by <char>      Sets the field separator character
   --lines-terminated-by <char>       Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
   --optionally-enclosed-by <char>    Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by <char>               Sets a required field encloser
   --input-escaped-by <char>                Sets the input escape character
   --input-fields-terminated-by <char>      Sets the input field separator
   --input-lines-terminated-by <char>       Sets the input end-of-line char
   --input-optionally-enclosed-by <char>    Sets a field enclosing character

Hive arguments:
   --create-hive-table                          Fail if the target hive table exists
   --external-table-dir <hdfs path>             Sets where the external table is in HDFS
   --hive-database <database-name>              Sets the database name to use when importing to hive
   --hive-delims-replacement <arg>              Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string
   --hive-drop-import-delims                    Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
   --hive-home <dir>                            Override $HIVE_HOME
   --hive-import                                Import tables into Hive (Uses Hive's default delimiters if none are set.)
   --hive-overwrite                             Overwrite existing data in the Hive table
   --hive-partition-key <partition-key>         Sets the partition key to use when importing to hive
   --hive-partition-value <partition-value>     Sets the partition value to use when importing to hive
   --hive-table <table-name>                    Sets the table name to use when importing to hive
   --map-column-hive <arg>                      Override mapping for specific column to hive types.

HBase arguments:
   --column-family <family>    Sets the target column family for the import
   --hbase-bulkload            Enables HBase bulk loading
   --hbase-create-table        If specified, create missing HBase tables
   --hbase-row-key <col>       Specifies which input column to use as the row key
   --hbase-table <table>       Import to <table> in HBase

HCatalog arguments:
   --hcatalog-database <arg>                         HCatalog database name
   --hcatalog-home <hdir>                            Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>         Sets the partition keys to use when importing to hive
   --hcatalog-partition-values <partition-value>     Sets the partition values to use when importing to hive
   --hcatalog-table <arg>                            HCatalog table name
   --hive-home <dir>                                 Override $HIVE_HOME
   --hive-partition-key <partition-key>              Sets the partition key to use when importing to hive
   --hive-partition-value <partition-value>          Sets the partition value to use when importing to hive
   --map-column-hive <arg>                           Override mapping for specific column to hive types.

HCatalog import specific options:
   --create-hcatalog-table             Create HCatalog before import
   --drop-and-create-hcatalog-table    Drop and Create HCatalog before import
   --hcatalog-storage-stanza <arg>     HCatalog storage stanza for table creation

Accumulo arguments:
   --accumulo-batch-size <size>           Batch size in bytes
   --accumulo-column-family <family>      Sets the target column family for the import
   --accumulo-create-table                If specified, create missing Accumulo tables
   --accumulo-instance <instance>         Accumulo instance name.
   --accumulo-max-latency <latency>       Max write latency in milliseconds
   --accumulo-password <password>         Accumulo password.
   --accumulo-row-key <col>               Specifies which input column to use as the row key
   --accumulo-table <table>               Import to <table> in Accumulo
   --accumulo-user <user>                 Accumulo user name.
   --accumulo-visibility <vis>            Visibility token to be applied to all rows imported
   --accumulo-zookeepers <zookeepers>     Comma-separated list of zookeepers (host:port)

Code generation arguments:
   --bindir <dir>                             Output directory for compiled objects
   --class-name <name>                        Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
   --escape-mapping-column-names <boolean>    Disable special characters escaping in column names
   --input-null-non-string <null-str>         Input null non-string representation
   --input-null-string <null-str>             Input null string representation
   --jar-file <file>                          Disable code generation; use specified jar
   --map-column-java <arg>                    Override mapping for specific columns to java types
   --null-non-string <null-str>               Null non-string representation
   --null-string <null-str>                   Null string representation
   --outdir <dir>                             Output directory for generated code
   --package-name <name>                      Put auto-generated classes in this package

Generic Hadoop command-line arguments:
(must precede any tool-specific arguments)
Generic options supported are
   -conf <configuration file>                      specify an application configuration file
   -D <property=value>                             use value for given property
   -fs <file:///|hdfs://namenode:port>             specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
   -jt <local|resourcemanager:port>                specify a ResourceManager
   -files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
   -libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
   -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
command [genericOptions] [commandOptions]

At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied after a '--' on the command line.