Syncing Data with Sqoop

1. Download

http://archive.apache.org/dist/sqoop/1.4.7/

2. Installation

tar zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

cp mysql-connector-java-5.1.46.jar  sqoop-1.4.7.bin__hadoop-2.6.0/lib/
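
To make the bin/sqoop commands below easier to run, Sqoop can optionally be added to the shell environment. A minimal sketch, assuming the tarball was extracted under ${HOME}/tools (the layout used elsewhere in this document):

# assumed install location; adjust to where the tarball was actually extracted
export SQOOP_HOME=${HOME}/tools/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=${PATH}:${SQOOP_HOME}/bin

# quick sanity check
sqoop version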

3. Environment Configuration
1) Configure dependencies
Sqoop requires a Hadoop 2.x environment, so add the corresponding settings to its environment configuration (in practice, setting HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME here did not seem to take effect; an alternative using shell exports is shown after the config below).
conf]$ cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh

# Set Hadoop-specific environment variables here.
export HADOOP_HOME=${HOME}/tools/hadoop-2.6.0-cdh5.16.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

#Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=

#Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
#export HIVE_HOME=

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
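
If the settings in sqoop-env.sh do not take effect (as noted above), an alternative is to export the variables directly in the shell before running Sqoop; these are the same exports used again in section 5, and the CDH path is the one assumed throughout this document:

export HADOOP_COMMON_HOME=${HOME}/tools/hadoop-2.6.0-cdh5.16.2
export HADOOP_MAPRED_HOME=${HOME}/tools/hadoop-2.6.0-cdh5.16.2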

4. Common Commands
https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

[work@host01-client sqoop-1.4.7.bin__hadoop-2.6.0]$ bin/sqoop help
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/work/tools/temp/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
22/08/20 09:03:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

5. Exporting Data from MySQL

1) List databases
bin/sqoop list-databases  \
--connect jdbc:mysql://host02:3306/miniapp_data?useUnicode=true \
--username data \
--password 123456 
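
To keep the password off the command line, -P (prompt for the password) or --password-file can be used instead; both are listed in the sqoop import --help output in section 6. A sketch with a hypothetical local password file (written without a trailing newline, which would otherwise become part of the password):

echo -n '123456' > ${HOME}/.sqoop_mysql_pwd
chmod 400 ${HOME}/.sqoop_mysql_pwd

bin/sqoop list-databases \
--connect jdbc:mysql://host02:3306/miniapp_data?useUnicode=true \
--username data \
--password-file file://${HOME}/.sqoop_mysql_pwd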

2) List tables in a database
bin/sqoop list-tables  \
--connect jdbc:mysql://host02:3306/miniapp_data?useUnicode=true \
--username data \
--password 123456 

3) Import MySQL data into HDFS
export HADOOP_COMMON_HOME=${HOME}/tools/hadoop-2.6.0-cdh5.16.2
export HADOOP_MAPRED_HOME=${HOME}/tools/hadoop-2.6.0-cdh5.16.2

bin/sqoop import \
--mapreduce-job-name sqoop_mysql_2_hdfs \
--connect jdbc:mysql://host02:3306/miniapp_data?useUnicode=true \
--username data \
--password 123456 \
--table monitor_alarm_record_log \
--columns id,alarm_key \
--where 'id < 200' \
--hadoop-mapred-home ${HOME}/tools/hadoop-2.6.0-cdh5.16.2 \
--hadoop-home ${HOME}/tools/hadoop-2.6.0-cdh5.16.2 \
--target-dir hdfs://hadoop01:8020/data/sqoop_data/monitor_alarm_record_log \
--delete-target-dir \
--fields-terminated-by '\t' \
--num-mappers 2 \
--split-by id
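
After the job finishes, the result can be checked directly on HDFS; the path below is simply the target directory from the command above, and part-m-00000 is the usual name of the first map output file:

hdfs dfs -ls /data/sqoop_data/monitor_alarm_record_log
hdfs dfs -cat /data/sqoop_data/monitor_alarm_record_log/part-m-00000 | head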


4) Import with compression
(1) Copy the LZO jar into $SQOOP_HOME/lib: sqoop-1.4.7.bin__hadoop-2.6.0]$ cp hadoop-lzo-0.4.20.jar lib/
(2) Import command:
bin/sqoop import \
--mapreduce-job-name sqoop_mysql_2_hdfs \
--connect jdbc:mysql://host02:3306/miniapp_data?useUnicode=true \
--username data \
--password 123456 \
--table monitor_alarm_record_log \
--columns id,alarm_key \
--where 'id < 200' \
--boundary-query 'SELECT MIN(`id`)+10, MAX(`id`)-10 FROM `monitor_alarm_record_log` WHERE ( id < 200 )' \
--target-dir hdfs://hadoop01:8020/data/sqoop_data/monitor_alarm_record_log \
--delete-target-dir \
--fields-terminated-by '\t' \
--num-mappers 2 \
--split-by id \
--compress \
--compression-codec lzop \
--null-non-string '\\N' \
--null-string '\\N' 
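
Note that plain .lzo files are not splittable by MapReduce until an index is built for them. If hadoop-lzo is available on the cluster, its DistributedLzoIndexer can build the index; the jar path below is an assumption and should point at wherever hadoop-lzo-0.4.20.jar actually lives:

hadoop jar /path/to/hadoop-lzo-0.4.20.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer \
  /data/sqoop_data/monitor_alarm_record_log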


5) Import with a free-form query (--query):
bin/sqoop import \
--mapreduce-job-name sqoop_mysql_2_hdfs \
--connect jdbc:mysql://host02:3306/miniapp_data?useUnicode=true \
--username data \
--password 123456 \
--query 'select id, alarm_key from monitor_alarm_record_log where id < 200 and $CONDITIONS ' \
--target-dir hdfs://hadoop01:8020/data/sqoop_data/monitor_alarm_record_log \
--delete-target-dir \
--fields-terminated-by '\t' \
--num-mappers 1 \
--split-by id \
--compress \
--compression-codec lzop \
--null-non-string '\\N' \
--null-string '\\N' 
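
Beyond these one-off imports, Sqoop also supports incremental imports via --incremental, --check-column and --last-value (listed under "Incremental import arguments" in section 6). A minimal sketch in append mode, assuming rows with id up to 199 have already been imported; --delete-target-dir must not be combined with append mode (see note 2 below):

bin/sqoop import \
--mapreduce-job-name sqoop_mysql_2_hdfs_incr \
--connect jdbc:mysql://host02:3306/miniapp_data?useUnicode=true \
--username data \
--password 123456 \
--table monitor_alarm_record_log \
--columns id,alarm_key \
--incremental append \
--check-column id \
--last-value 199 \
--target-dir hdfs://hadoop01:8020/data/sqoop_data/monitor_alarm_record_log \
--fields-terminated-by '\t' \
--num-mappers 1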


Notes:
1) Whether --hadoop-mapred-home ${HOME}/tools/hadoop-2.6.0-cdh5.16.2 and --hadoop-home ${HOME}/tools/hadoop-2.6.0-cdh5.16.2 actually take effect still needs further verification.
2) --append and --delete-target-dir cannot be used together.
3) --autoreset-to-one-mapper and --split-by cannot be used together.

6. Other Commands

sqoop-1.4.7.bin__hadoop-2.6.0]$ bin/sqoop import --help
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                                       Specify JDBC connect string
   --connection-manager <class-name>                          Specify connection manager class name
   --connection-param-file <properties-file>                  Specify connection parameters file
   --driver <class-name>                                      Manually specify JDBC driver class to use
   --hadoop-home <hdir>                                       Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                                 Override $HADOOP_MAPRED_HOME_ARG
   --help                                                     Print usage instructions
   --metadata-transaction-isolation-level <isolationlevel>    Defines the transaction isolation level for metadata queries. For more details check java.sql.Connection javadoc or the JDBC specificaiton
   --oracle-escaping-disabled <boolean>                       Disable the escaping mechanism of the Oracle/OraOop connection managers
  
   -P                                                         Read password from console
   --password <password>                                      Set authentication password
   --password-alias <password-alias>                          Credential provider  password alias
   --password-file <password-file>                            Set authentication password file path
   --relaxed-isolation                                        Use read-uncommitted isolation for imports
   --skip-dist-cache                                          Skip copying jars to distributed cache
   --temporary-rootdir <rootdir>                              Defines the temporary root directory for the import
   --throw-on-error                                           Rethrow a RuntimeException on error occurred during the job
   --username <username>                                      Set authentication username
   --verbose                                                  Print more information while working

Import control arguments:
   --append                                                   Imports data in append mode
   --as-avrodatafile                                          Imports data to Avro data files
   --as-parquetfile                                           Imports data to Parquet  files
   --as-sequencefile                                          Imports data to SequenceFiles
   --as-textfile                                              Imports data as plain text
                                                              (default)
   --autoreset-to-one-mapper                                  Reset the
                                                              number of
                                                              mappers to
                                                              one mapper
                                                              if no split
                                                              key
                                                              available
   --boundary-query <statement>                               Set boundary query for retrieving max and min value of the primary key
   --columns <col,col,col...>                                 Columns to import from table
   --compression-codec <codec>                                Compression codec to use for import
   --delete-target-dir                                        Imports data in delete mode
   --direct                                                   Use direct import fast path
   --direct-split-size <n>                                    Split the
                                                              input stream
                                                              every 'n'
                                                              bytes when
                                                              importing in
                                                              direct mode
-e,--query <statement>                                        Import results of SQL 'statement'
   --fetch-size <n>                                           Set number 'n' of rows to fetch
                                                              from the database when more rows are needed Set the maximum size for an inline LOB
-m,--num-mappers <n>                                          Use 'n' map tasks to import in parallel
   --mapreduce-job-name <name>                                Set name for generated mapreduce job Key column to use to join results
   --split-by <column-name>                                   Column of the table used to  split work units
   --split-limit <size>                                       Upper Limit of rows per split for split columns of Date/Time/Timestamp and integer types. For date or timestamp fields it is calculated in seconds. split-limit should be greater than 0
   --table <table-name>                                       Table to read HDFS plain table destination
   --validate                                                 Validate the copy using the configured validator
   --validation-failurehandler <validation-failurehandler>    Fully qualified class name for ValidationFailureHandler
   --validation-threshold <validation-threshold>              Fully qualified class name
                                                              for ValidationThreshold
   --validator <validator>                                    Fully qualified class name for the Validator
   --warehouse-dir <dir>                                      HDFS parent for table destination
   --where <where clause>                                     WHERE clause to use during import
-z,--compress                                                 Enable compression

Incremental import arguments:
   --check-column <column>        Source column to check for incremental  change
   --incremental <import-type>    Define an incremental import of type 'append' or 'lastmodified'
   --last-value <value>           Last imported value in the incremental check column

Output line formatting arguments:
   --enclosed-by <char>               Sets a required field enclosing character
   --escaped-by <char>                Sets the escape character
   --fields-terminated-by <char>      Sets the field separator character
   --lines-terminated-by <char>       Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \ optionally-enclosed-by: '
   --optionally-enclosed-by <char>    Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by <char>               Sets a required field encloser
   --input-escaped-by <char>                Sets the input escape character
   --input-fields-terminated-by <char>      Sets the input field separator
   --input-lines-terminated-by <char>       Sets the input end-of-line char
   --input-optionally-enclosed-by <char>    Sets a field enclosing character

Hive arguments:
   --create-hive-table                         Fail if the target hive
                                               table exists
   --external-table-dir <hdfs path>            Sets where the external
                                               table is in HDFS
   --hive-database <database-name>             Sets the database name to
                                               use when importing to hive
   --hive-delims-replacement <arg>             Replace Hive record \0x01
                                               and row delimiters (\n\r)
                                               from imported string fields
                                               with user-defined string
   --hive-drop-import-delims                   Drop Hive record \0x01 and
                                               row delimiters (\n\r) from
                                               imported string fields
   --hive-home <dir>                           Override $HIVE_HOME
   --hive-import                               Import tables into Hive
                                               (Uses Hive's default
                                               delimiters if none are
                                               set.)
   --hive-overwrite                            Overwrite existing data in
                                               the Hive table
   --hive-partition-key <partition-key>        Sets the partition key to
                                               use when importing to hive
   --hive-partition-value <partition-value>    Sets the partition value to
                                               use when importing to hive
   --hive-table <table-name>                   Sets the table name to use
                                               when importing to hive
   --map-column-hive <arg>                     Override mapping for
                                               specific column to hive
                                               types.

HBase arguments:
   --column-family <family>    Sets the target column family for the
                               import
   --hbase-bulkload            Enables HBase bulk loading
   --hbase-create-table        If specified, create missing HBase tables
   --hbase-row-key <col>       Specifies which input column to use as the row key
   --hbase-table <table>       Import to <table> in HBase

HCatalog arguments:
   --hcatalog-database <arg>                        HCatalog database name
   --hcatalog-home <hdir>                           Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>        Sets the partition keys to use when
                                                    importing to hive
   --hcatalog-partition-values <partition-value>    Sets the partition values to use when importing to hive
   --hcatalog-table <arg>                           HCatalog table name
   --hive-home <dir>                                Override $HIVE_HOME
   --hive-partition-key <partition-key>             Sets the partition key to use when importing to hive
   --hive-partition-value <partition-value>         Sets the partition value to use when importing to hive
   --map-column-hive <arg>                          Override mapping for specific column to hive types.

HCatalog import specific options:
   --create-hcatalog-table             Create HCatalog before import
   --drop-and-create-hcatalog-table    Drop and Create HCatalog before
                                       import
   --hcatalog-storage-stanza <arg>     HCatalog storage stanza for table
                                       creation

Accumulo arguments:
   --accumulo-batch-size <size>          Batch size in bytes
   --accumulo-column-family <family>     Sets the target column family for
                                         the import
   --accumulo-create-table               If specified, create missing
                                         Accumulo tables
   --accumulo-instance <instance>        Accumulo instance name.
   --accumulo-max-latency <latency>      Max write latency in milliseconds
   --accumulo-password <password>        Accumulo password.
   --accumulo-row-key <col>              Specifies which input column to use as the row key
   --accumulo-table <table>              Import to <table> in Accumulo
   --accumulo-user <user>                Accumulo user name.
   --accumulo-visibility <vis>           Visibility token to be applied to all rows imported
   --accumulo-zookeepers <zookeepers>    Comma-separated list of
                                         zookeepers (host:port)

Code generation arguments:
   --bindir <dir>                             Output directory for
                                              compiled objects
   --class-name <name>                        Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
   --escape-mapping-column-names <boolean>    Disable special characters
                                              escaping in column names
   --input-null-non-string <null-str>         Input null non-string representation
   --input-null-string <null-str>             Input null string representation
   --jar-file <file>                          Disable code generation; use specified jar
   --map-column-java <arg>                    Override mapping for specific columns to java types
   --null-non-string <null-str>               Null non-string representation
   --null-string <null-str>                   Null string representation
   --outdir <dir>                             Output directory for  generated code
   --package-name <name>                      Put auto-generated classes in this package

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied
after a '--' on the command line.
