Getting Started with Sqoop

I. Introduction

Typical Sqoop scenarios:
1) The data lives in an RDBMS and you want to process it with Hive.
2) The analysis is done and the results are still in Hive; how do you get them into an RDBMS?
   The final statistics are usually rendered as reports, so Hive results (served e.g. through HiveServer2) need to be exported to an RDBMS.

Solution: this can be written as a plain MapReduce job, but the pattern is common enough to abstract into a general tool. Such a tool needs to know:
RDBMS: url, driver, db, table, user, password
HDFS: path
Hive: database, table, partition

Sqoop = SQL to Hadoop:
RDBMS <==> Hadoop

Two major release lines:
1.4.*  Sqoop1 (used here)
1.99.* Sqoop2

Responsibilities:
Import and export data between an RDBMS and Hadoop.
Under the hood it runs MapReduce jobs: map-only, with no reduce phase.

Direction is named with Hadoop as the reference point:
Import: RDBMS ==> Hadoop

Export: Hadoop ==> RDBMS

II. Deployment

1. Download

[pxj@pxj31 /home/pxj/app]$sudo wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.16.2.tar.gz


2. Extract

[pxj@pxj31 /home/pxj/app]$tar -zxvf sqoop-1.4.6-cdh5.16.2.tar.gz

3. Configure environment variables

export SQOOP_HOME=/home/pxj/app/sqoop-1.4.6-cdh5.16.2
export PATH=$SQOOP_HOME/bin:$PATH

4. Configuration file

[pxj@pxj31 /home/pxj/app/sqoop-1.4.6-cdh5.16.2/conf]$cp sqoop-env-template.sh sqoop-env.sh

export HADOOP_COMMON_HOME=/home/pxj/app/hadoop
export HADOOP_MAPRED_HOME=/home/pxj/app/hadoop
export HADOOP_HOME=/home/pxj/app/hadoop
export HADOOP_INSTALL=/home/pxj/app/hadoop
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
# hive
export HIVE_HOME=/home/pxj/app/hive-1.1.0-cdh5.16.2
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=${HIVE_HOME}/bin:$PATH
# sqoop
export SQOOP_HOME=/home/pxj/app/sqoop-1.4.6-cdh5.16.2
export PATH=$SQOOP_HOME/bin:$PATH



5. JDBC driver and Hive dependencies

[pxj@pxj31 /home/pxj/app/hive-1.1.0-cdh5.16.2/lib]$cp mysql-connector-java-5.1.27-bin.jar /home/pxj/app/sqoop-1.4.6-cdh5.16.2/lib
cp $HIVE_HOME/lib/hive-common-1.1.0-cdh5.*.jar $SQOOP_HOME/lib/
cp $HIVE_HOME/lib/hive-shims*.jar $SQOOP_HOME/lib/

III. Basic usage

1. List all databases

[pxj@pxj31 /home/pxj/app/sqoop-1.4.6-cdh5.16.2/conf]$sqoop list-databases \
> --connect jdbc:mysql://pxj31:3306 \
> --password lenovo \
> --username root
Warning: /home/pxj/app/sqoop-1.4.6-cdh5.16.2/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/pxj/app/sqoop-1.4.6-cdh5.16.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/pxj/app/sqoop-1.4.6-cdh5.16.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/pxj/app/sqoop-1.4.6-cdh5.16.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/12/22 17:03:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.2
19/12/22 17:03:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/12/22 17:03:34 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
mysql
performance_schema
pxj31_hive
pxjdata
sys


2. List all tables

[pxj@pxj31 /home/pxj]$sqoop list-tables \
> --connect jdbc:mysql://pxj31:3306/pxjdata \
> --password lenovo \
> --username root
19/12/22 17:34:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.2
19/12/22 17:34:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/12/22 17:34:33 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
a
a1
b
dept
e
emp
rzdata
salgrade

IV. Importing table data into HDFS

How sqoop import works:

Sqoop fetches the metadata (schema, table, fields, field types) from the relational database, turns the import into a map-only MapReduce job, and launches several map tasks, each reading one slice of the data, so the copy proceeds in parallel.

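How the split column's range becomes per-mapper queries can be sketched roughly as follows; this is a simplified illustration, not Sqoop's exact algorithm, and `split_col` is a placeholder name:

```shell
# Simplified sketch: divide the split column's [min, max] range into one
# closed interval per map task; each mapper then SELECTs only its slice.
min=1; max=100; mappers=4
step=$(( (max - min + mappers - 1) / mappers ))   # ceiling division
lo=$min
while [ "$lo" -le "$max" ]; do
  hi=$(( lo + step - 1 ))
  if [ "$hi" -gt "$max" ]; then hi=$max; fi
  echo "mapper query: WHERE split_col BETWEEN $lo AND $hi"
  lo=$(( hi + 1 ))
done
```

With --split-by 'GRADE' below, Sqoop first runs a bounding query (SELECT MIN(GRADE), MAX(GRADE)) and performs an analogous range split across the mappers.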
sqoop import \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--delete-target-dir \
--table salgrade \
--split-by 'GRADE'
A fuller example, selecting specific columns, naming the job and target directory, and using a tab delimiter:

sqoop import \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo --username root \
--table emp  \
--delete-target-dir --mapreduce-job-name FromMySQL2HDFS \
--columns "EMPNO,ENAME,JOB,SAL,COMM" \
--target-dir EMP_COLUMN_QUERY \
--fields-terminated-by '\t' \
--null-string '' \
--null-non-string '0' \
-m 1

Does --direct speed things up?

From the Sqoop user guide:
For performance, each writer will commit the current transaction approximately every 32 MB of exported data. You can control this by specifying the following argument before any tool-specific arguments: -D sqoop.mysql.export.checkpoint.bytes=size, where size is a value in bytes. Set size to 0 to disable intermediate checkpoints, but individual files being exported will continue to be committed independently of one another.

Sometimes you need to export large data with Sqoop to a live MySQL cluster that is under a high load serving random queries from the users of your application. While data consistency issues during the export can be easily solved with a staging table, there is still a problem with the performance impact caused by the heavy export.

First off, the resources of MySQL dedicated to the import process can affect the performance of the live product, both on the master and on the slaves. Second, even if the servers can handle the import with no significant performance impact (mysqlimport should be relatively “cheap”), importing big tables can cause serious replication lag in the cluster risking data inconsistency.

With -D sqoop.mysql.export.sleep.ms=time, where time is a value in milliseconds, you can let the server relax between checkpoints and the replicas catch up by pausing the export process after transferring the number of bytes specified in sqoop.mysql.export.checkpoint.bytes. Experiment with different settings of these two parameters to achieve an export pace that doesn’t endanger the stability of your MySQL cluster.
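As a concrete shape (the byte and millisecond values below are illustrative, and the table/path are just examples), both throttling properties are generic -D arguments placed before the tool-specific flags:

```shell
# Sketch: throttle a --direct export by committing roughly every 16 MB
# and pausing 500 ms between checkpoints (both values are examples).
sqoop export \
  -D sqoop.mysql.export.checkpoint.bytes=16777216 \
  -D sqoop.mysql.export.sleep.ms=500 \
  --connect jdbc:mysql://pxj31:3306/pxjdata \
  --username root -P \
  --direct \
  --table emp_import \
  --export-dir /user/pxj/emp
```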

--direct: direct mode, which uses mysqldump (for MySQL) to speed up the transfer. Direct mode only supports transferring all columns; if --columns is given, --direct has no effect.

mysqldump must be available on every node that runs map tasks; copy it if needed: scp /usr/bin/mysqldump cm-slave1:/usr/bin/
With --direct, the --null-string and --null-non-string options are ignored.

Incremental import on a check column

sqoop import \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo --username root \
--table emp  \
--mapreduce-job-name FromMySQL2HDFS \
--target-dir EMP_APPEND \
--fields-terminated-by '\t' \
--null-string '' \
--incremental append \
--check-column EMPNO \
--last-value 7788 \
--null-non-string '0' \
-m 1

Only rows whose EMPNO is greater than the last value (7788) are appended:
[pxj@pxj31 /home/pxj/app/sqoop-1.4.6-cdh5.16.2]$hadoop fs  -text EMP_APPEND/*
19/12/22 22:34:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
7839    KING    PRESIDENT   0   1981-11-17 00:00:00.0   5000.00 0   10
7844    TURNER  SALESMAN    7698    1981-09-08 00:00:00.0   1500.00 0.00    30
7876    ADAMS   CLERK   7788    1983-01-12 00:00:00.0   1100.00 0   20
7900    JAMES   CLERK   7698    1981-12-03 00:00:00.0   950.00  0   30
7902    FORD    ANALYST 7566    1981-12-03 00:00:00.0   3000.00 0   20
7934    MILLER  CLERK   7782    1982-01-23 00:00:00.0   1300.00 0   10
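Each incremental run needs a fresh --last-value; `sqoop job` can persist it automatically, or a simple wrapper can track the high-water mark in a file (the path and values below are illustrative):

```shell
# Hypothetical bookkeeping for repeated incremental imports: read the
# previous high-water mark, use it as --last-value, then record the new one.
state_file=/tmp/emp_empno.last_value             # illustrative location
last_value=$(cat "$state_file" 2>/dev/null || echo 0)
echo "next run: --incremental append --check-column EMPNO --last-value $last_value"
new_max=7934    # in practice: the max EMPNO among the newly imported rows
echo "$new_max" > "$state_file"
```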

eval

Purpose: evaluate a SQL statement and display the results. In effect, eval works like a small database client.

1. Query data

[pxj@pxj31 /home/pxj/app/sqoop-1.4.6-cdh5.16.2]$sqoop eval --connect jdbc:mysql://pxj31:3306/pxjdata --username root --password lenovo --query "select * from emp"
19/12/22 22:44:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.2
19/12/22 22:44:19 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/12/22 22:44:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
----------------------------------------------------------------------------------------------
| empno | ename      | job       | mgr   | hiredate            | sal       | comm      | deptno | 
----------------------------------------------------------------------------------------------
| 7499  | ALLEN      | SALESMAN  | 7698  | 1981-02-20 00:00:00.0 | 1600.00   | 300.00    | 30  | 
| 7521  | WARD       | SALESMAN  | 7698  | 1981-02-22 00:00:00.0 | 1250.00   | 500.00    | 30  | 
| 7566  | JONES      | MANAGER   | 7839  | 1981-04-02 00:00:00.0 | 2975.00   | (null)    | 20  | 
| 7654  | MARTIN     | SALESMAN  | 7698  | 1981-09-28 00:00:00.0 | 1250.00   | 1400.00   | 30  | 
| 7698  | BLAKE      | MANAGER   | 7839  | 1981-05-01 00:00:00.0 | 2850.00   | (null)    | 30  | 
| 7782  | CLARK      | MANAGER   | 7839  | 1981-06-09 00:00:00.0 | 2450.00   | (null)    | 10  | 
| 7788  | SCOTT      | ANALYST   | 7566  | 1982-12-09 00:00:00.0 | 3000.00   | (null)    | 20  | 
| 7839  | KING       | PRESIDENT | (null) | 1981-11-17 00:00:00.0 | 5000.00   | (null)    | 10  | 
| 7844  | TURNER     | SALESMAN  | 7698  | 1981-09-08 00:00:00.0 | 1500.00   | 0.00      | 30  | 
| 7876  | ADAMS      | CLERK     | 7788  | 1983-01-12 00:00:00.0 | 1100.00   | (null)    | 20  | 
| 7900  | JAMES      | CLERK     | 7698  | 1981-12-03 00:00:00.0 | 950.00    | (null)    | 30  | 
| 7902  | FORD       | ANALYST   | 7566  | 1981-12-03 00:00:00.0 | 3000.00   | (null)    | 20  | 
| 7934  | MILLER     | CLERK     | 7782  | 1982-01-23 00:00:00.0 | 1300.00   | (null)    | 10  | 
| 7369  | SMITH      | CLERK     | 7902  | 1980-12-17 00:00:00.0 | 800.00    | (null)    | 20  | 
----------------------------------------------------------------------------------------------

2. Insert data

[pxj@pxj31 /home/pxj/app/sqoop-1.4.6-cdh5.16.2]$sqoop eval \
> --connect jdbc:mysql://pxj31:3306/pxjdata \
> --password lenovo --username root \
> --query "insert into dept values (25,'RD', 'BEIJING')"
19/12/22 22:47:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.2
19/12/22 22:47:44 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/12/22 22:47:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/12/22 22:47:45 INFO tool.EvalSqlTool: 1 row(s) updated.

Importing with multiple mappers (-m 2) fails when the table has no primary key and no --split-by is given:

[pxj@pxj31 /home/pxj/app/sqoop-1.4.6-cdh5.16.2]$sqoop import \
> --connect jdbc:mysql://pxj31:3306/pxjdata \
> --password lenovo \
> --username root \
> --target-dir EMP_OPTIONS_FILE \
> --delete-target-dir \
> --table emp \
> -m 2
19/12/22 22:49:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.2
19/12/22 22:49:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/12/22 22:49:57 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/12/22 22:49:57 INFO tool.CodeGenTool: Beginning code generation
19/12/22 22:49:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/22 22:49:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/22 22:49:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/pxj/app/hadoop
Note: /tmp/sqoop-pxj/compile/f6c8c98a7395bb019ffccd77130cd406/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/12/22 22:50:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pxj/compile/f6c8c98a7395bb019ffccd77130cd406/emp.jar
19/12/22 22:50:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/22 22:50:01 INFO tool.ImportTool: Destination directory EMP_OPTIONS_FILE is not present, hence not deleting.
19/12/22 22:50:01 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/12/22 22:50:01 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/12/22 22:50:01 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/12/22 22:50:01 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/12/22 22:50:01 ERROR tool.ImportTool: Import failed: No primary key could be found for table emp. Please specify one with --split-by or perform a sequential import with '-m 1'.

V. Sqoop options files

Frequently reused options can be stored in a file (one tool name, flag, or value per line) and passed with --options-file. Example emp.opt:

import
--connect
jdbc:mysql://pxj31:3306/pxjdata
--password
lenovo
--username
root
--target-dir
EMP_OPTIONS_FILE2
--delete-target-dir
--table
emp
-m
2
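One way to create such a file is a heredoc; the essential rule is one token per line (this generated example uses a shortened flag list):

```shell
# Generate a minimal options file: tool name first, then each flag and
# its value on separate lines.
cat > /tmp/emp.opt <<'EOF'
import
--connect
jdbc:mysql://pxj31:3306/pxjdata
--username
root
--table
emp
EOF
wc -l < /tmp/emp.opt   # 7 lines, one token each
```

It is then run with `sqoop --options-file /tmp/emp.opt`; note the flag is --options-file, which is why the `-option-file` attempt below is rejected.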
[pxj@pxj31 /home/pxj/app/sqoop-1.4.6-cdh5.16.2/datass]$sqoop -option-file emp.opt 
No such sqoop tool: -option-file. See 'sqoop help'.
[pxj@pxj31 /home/pxj/app/sqoop-1.4.6-cdh5.16.2/datass]$sqoop --options-file emp.opt 
19/12/22 23:00:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.2
19/12/22 23:00:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/12/22 23:00:22 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/12/22 23:00:22 INFO tool.CodeGenTool: Beginning code generation
19/12/22 23:00:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/22 23:00:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/22 23:00:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/pxj/app/hadoop
Note: /tmp/sqoop-pxj/compile/64e84271b36510b7d414686c3039ec1f/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/12/22 23:00:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pxj/compile/64e84271b36510b7d414686c3039ec1f/emp.jar
19/12/22 23:00:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/22 23:00:24 INFO tool.ImportTool: Destination directory EMP_OPTIONS_FILE2 is not present, hence not deleting.
19/12/22 23:00:24 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/12/22 23:00:24 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/12/22 23:00:24 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/12/22 23:00:24 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/12/22 23:00:24 ERROR tool.ImportTool: Import failed: No primary key could be found for table emp. Please specify one with --split-by or perform a sequential import with '-m 1'.

VI. Exporting from HDFS to MySQL

How sqoop export works:

Sqoop reads the schema and metadata of the target table, matches its columns against the fields in the Hadoop data, and runs multiple map-only tasks in parallel to write the HDFS data into the relational database.

Exporting HDFS data to MySQL:

sqoop export \
-Dsqoop.export.records.per.statement=10 \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--table emp_import \
--export-dir /user/pxj/emp \
--fields-terminated-by '\t' \
-m 1
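The records-per-statement setting above controls how many rows go into each multi-row INSERT; the grouping idea can be illustrated with plain files (file names here are made up):

```shell
# Illustration only: batch 5 records into groups of 2, the way
# sqoop.export.records.per.statement groups rows into one INSERT statement.
rm -f /tmp/rows.txt /tmp/batch_*        # clean up any previous run
printf '%s\n' r1 r2 r3 r4 r5 > /tmp/rows.txt
split -l 2 /tmp/rows.txt /tmp/batch_    # -> batch_aa batch_ab batch_ac
ls /tmp/batch_* | wc -l                 # 3 batches
```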

VII. Importing from MySQL into Hive

sqoop import \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--table emp \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database default \
--hive-table emp_import \
--fields-terminated-by '\t' \
-m 1
19/12/23 00:06:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.16.2
19/12/23 00:06:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/12/23 00:06:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/12/23 00:06:11 INFO tool.CodeGenTool: Beginning code generation
19/12/23 00:06:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/23 00:06:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/23 00:06:11 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/pxj/app/hadoop
Note: /tmp/sqoop-pxj/compile/2f52a0857d0fdbbbaa1a7f0b4eeaf93d/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/12/23 00:06:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pxj/compile/2f52a0857d0fdbbbaa1a7f0b4eeaf93d/emp.jar
19/12/23 00:06:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/23 00:06:14 INFO tool.ImportTool: Destination directory emp deleted.
19/12/23 00:06:14 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/12/23 00:06:14 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/12/23 00:06:14 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/12/23 00:06:14 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/12/23 00:06:14 INFO mapreduce.ImportJobBase: Beginning import of emp
19/12/23 00:06:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/12/23 00:06:14 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/12/23 00:06:14 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/12/23 00:06:32 INFO db.DBInputFormat: Using read commited transaction isolation
19/12/23 00:06:32 INFO mapreduce.JobSubmitter: number of splits:1
19/12/23 00:06:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1577030460083_0002
19/12/23 00:06:33 INFO impl.YarnClientImpl: Submitted application application_1577030460083_0002
19/12/23 00:06:33 INFO mapreduce.Job: The url to track the job: http://pxj31:38088/proxy/application_1577030460083_0002/
19/12/23 00:06:33 INFO mapreduce.Job: Running job: job_1577030460083_0002
19/12/23 00:06:43 INFO mapreduce.Job: Job job_1577030460083_0002 running in uber mode : false
19/12/23 00:06:43 INFO mapreduce.Job:  map 0% reduce 0%
19/12/23 00:06:51 INFO mapreduce.Job:  map 100% reduce 0%
19/12/23 00:06:52 INFO mapreduce.Job: Job job_1577030460083_0002 completed successfully
19/12/23 00:06:52 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=180758
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=871
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6379
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=6379
        Total vcore-milliseconds taken by all map tasks=6379
        Total megabyte-milliseconds taken by all map tasks=6532096
    Map-Reduce Framework
        Map input records=14
        Map output records=14
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=70
        CPU time spent (ms)=1160
        Physical memory (bytes) snapshot=180879360
        Virtual memory (bytes) snapshot=2780069888
        Total committed heap usage (bytes)=146276352
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=871
19/12/23 00:06:52 INFO mapreduce.ImportJobBase: Transferred 871 bytes in 38.2976 seconds (22.7429 bytes/sec)
19/12/23 00:06:52 INFO mapreduce.ImportJobBase: Retrieved 14 records.
19/12/23 00:06:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
19/12/23 00:06:53 WARN hive.TableDefWriter: Column empno had to be cast to a less precise type in Hive
19/12/23 00:06:53 WARN hive.TableDefWriter: Column mgr had to be cast to a less precise type in Hive
19/12/23 00:06:53 WARN hive.TableDefWriter: Column hiredate had to be cast to a less precise type in Hive
19/12/23 00:06:53 WARN hive.TableDefWriter: Column sal had to be cast to a less precise type in Hive
19/12/23 00:06:53 WARN hive.TableDefWriter: Column comm had to be cast to a less precise type in Hive
19/12/23 00:06:53 WARN hive.TableDefWriter: Column deptno had to be cast to a less precise type in Hive
19/12/23 00:06:53 INFO hive.HiveImport: Loading uploaded data into Hive
0: jdbc:hive2://pxj31:10086> select * from emp_import;
INFO  : Compiling command(queryId=pxj_20191226012121_e44f806c-e8ab-4f37-a1bb-cf9df50fb2d8): select * from emp_import
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:emp_import.empno, type:double, comment:null), FieldSchema(name:emp_import.ename, type:string, comment:null), FieldSchema(name:emp_import.job, type:string, comment:null), FieldSchema(name:emp_import.mgr, type:double, comment:null), FieldSchema(name:emp_import.hiredate, type:string, comment:null), FieldSchema(name:emp_import.sal, type:double, comment:null), FieldSchema(name:emp_import.comm, type:double, comment:null), FieldSchema(name:emp_import.deptno, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191226012121_e44f806c-e8ab-4f37-a1bb-cf9df50fb2d8); Time taken: 0.352 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191226012121_e44f806c-e8ab-4f37-a1bb-cf9df50fb2d8): select * from emp_import
INFO  : Completed executing command(queryId=pxj_20191226012121_e44f806c-e8ab-4f37-a1bb-cf9df50fb2d8); Time taken: 0.001 seconds
INFO  : OK
+-------------------+-------------------+-----------------+-----------------+------------------------+-----------------+------------------+--------------------+--+
| emp_import.empno  | emp_import.ename  | emp_import.job  | emp_import.mgr  |  emp_import.hiredate   | emp_import.sal  | emp_import.comm  | emp_import.deptno  |
+-------------------+-------------------+-----------------+-----------------+------------------------+-----------------+------------------+--------------------+--+
| 7499.0            | ALLEN             | SALESMAN        | 7698.0          | 1981-02-20 00:00:00.0  | 1600.0          | 300.0            | 30.0               |
| 7521.0            | WARD              | SALESMAN        | 7698.0          | 1981-02-22 00:00:00.0  | 1250.0          | 500.0            | 30.0               |
| 7566.0            | JONES             | MANAGER         | 7839.0          | 1981-04-02 00:00:00.0  | 2975.0          | NULL             | 20.0               |
| 7654.0            | MARTIN            | SALESMAN        | 7698.0          | 1981-09-28 00:00:00.0  | 1250.0          | 1400.0           | 30.0               |
| 7698.0            | BLAKE             | MANAGER         | 7839.0          | 1981-05-01 00:00:00.0  | 2850.0          | NULL             | 30.0               |
| 7782.0            | CLARK             | MANAGER         | 7839.0          | 1981-06-09 00:00:00.0  | 2450.0          | NULL             | 10.0               |
| 7788.0            | SCOTT             | ANALYST         | 7566.0          | 1982-12-09 00:00:00.0  | 3000.0          | NULL             | 20.0               |
| 7839.0            | KING              | PRESIDENT       | NULL            | 1981-11-17 00:00:00.0  | 5000.0          | NULL             | 10.0               |
| 7844.0            | TURNER            | SALESMAN        | 7698.0          | 1981-09-08 00:00:00.0  | 1500.0          | 0.0              | 30.0               |
| 7876.0            | ADAMS             | CLERK           | 7788.0          | 1983-01-12 00:00:00.0  | 1100.0          | NULL             | 20.0               |
| 7900.0            | JAMES             | CLERK           | 7698.0          | 1981-12-03 00:00:00.0  | 950.0           | NULL             | 30.0               |
| 7902.0            | FORD              | ANALYST         | 7566.0          | 1981-12-03 00:00:00.0  | 3000.0          | NULL             | 20.0               |
| 7934.0            | MILLER            | CLERK           | 7782.0          | 1982-01-23 00:00:00.0  | 1300.0          | NULL             | 10.0               |
| 7369.0            | SMITH             | CLERK           | 7902.0          | 1980-12-17 00:00:00.0  | 800.0           | NULL             | 20.0               |
+-------------------+-------------------+-----------------+-----------------+------------------------+-----------------+------------------+--------------------+--+
14 rows selected (0.51 seconds)
Importing into a partitioned Hive table (database test, partition pt='2019-12-30'):
sqoop import \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--table emp \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database test \
--hive-table emp_import_partition \
--hive-partition-key 'pt' \
--hive-partition-value '2019-12-30' \
--fields-terminated-by '\t' \
-m 1
0: jdbc:hive2://pxj31:10086> select * from emp_import_partition;
INFO  : Compiling command(queryId=pxj_20191226012525_a29e48d8-c9f1-4dcc-9e9f-7294c754a37a): select * from emp_import_partition
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:emp_import_partition.empno, type:double, comment:null), FieldSchema(name:emp_import_partition.ename, type:string, comment:null), FieldSchema(name:emp_import_partition.job, type:string, comment:null), FieldSchema(name:emp_import_partition.mgr, type:double, comment:null), FieldSchema(name:emp_import_partition.hiredate, type:string, comment:null), FieldSchema(name:emp_import_partition.sal, type:double, comment:null), FieldSchema(name:emp_import_partition.comm, type:double, comment:null), FieldSchema(name:emp_import_partition.deptno, type:double, comment:null), FieldSchema(name:emp_import_partition.pt, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=pxj_20191226012525_a29e48d8-c9f1-4dcc-9e9f-7294c754a37a); Time taken: 0.312 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=pxj_20191226012525_a29e48d8-c9f1-4dcc-9e9f-7294c754a37a): select * from emp_import_partition
INFO  : Completed executing command(queryId=pxj_20191226012525_a29e48d8-c9f1-4dcc-9e9f-7294c754a37a); Time taken: 0.0 seconds
INFO  : OK
+-----------------------------+-----------------------------+---------------------------+---------------------------+--------------------------------+---------------------------+----------------------------+------------------------------+--------------------------+--+
| emp_import_partition.empno  | emp_import_partition.ename  | emp_import_partition.job  | emp_import_partition.mgr  | emp_import_partition.hiredate  | emp_import_partition.sal  | emp_import_partition.comm  | emp_import_partition.deptno  | emp_import_partition.pt  |
+-----------------------------+-----------------------------+---------------------------+---------------------------+--------------------------------+---------------------------+----------------------------+------------------------------+--------------------------+--+
| 7499.0                      | ALLEN                       | SALESMAN                  | 7698.0                    | 1981-02-20 00:00:00.0          | 1600.0                    | 300.0                      | 30.0                         | 2019-12-30               |
| 7521.0                      | WARD                        | SALESMAN                  | 7698.0                    | 1981-02-22 00:00:00.0          | 1250.0                    | 500.0                      | 30.0                         | 2019-12-30               |
| 7566.0                      | JONES                       | MANAGER                   | 7839.0                    | 1981-04-02 00:00:00.0          | 2975.0                    | NULL                       | 20.0                         | 2019-12-30               |
| 7654.0                      | MARTIN                      | SALESMAN                  | 7698.0                    | 1981-09-28 00:00:00.0          | 1250.0                    | 1400.0                     | 30.0                         | 2019-12-30               |
| 7698.0                      | BLAKE                       | MANAGER                   | 7839.0                    | 1981-05-01 00:00:00.0          | 2850.0                    | NULL                       | 30.0                         | 2019-12-30               |
| 7782.0                      | CLARK                       | MANAGER                   | 7839.0                    | 1981-06-09 00:00:00.0          | 2450.0                    | NULL                       | 10.0                         | 2019-12-30               |
| 7788.0                      | SCOTT                       | ANALYST                   | 7566.0                    | 1982-12-09 00:00:00.0          | 3000.0                    | NULL                       | 20.0                         | 2019-12-30               |
| 7839.0                      | KING                        | PRESIDENT                 | NULL                      | 1981-11-17 00:00:00.0          | 5000.0                    | NULL                       | 10.0                         | 2019-12-30               |
| 7844.0                      | TURNER                      | SALESMAN                  | 7698.0                    | 1981-09-08 00:00:00.0          | 1500.0                    | 0.0                        | 30.0                         | 2019-12-30               |
| 7876.0                      | ADAMS                       | CLERK                     | 7788.0                    | 1983-01-12 00:00:00.0          | 1100.0                    | NULL                       | 20.0                         | 2019-12-30               |
| 7900.0                      | JAMES                       | CLERK                     | 7698.0                    | 1981-12-03 00:00:00.0          | 950.0                     | NULL                       | 30.0                         | 2019-12-30               |
| 7902.0                      | FORD                        | ANALYST                   | 7566.0                    | 1981-12-03 00:00:00.0          | 3000.0                    | NULL                       | 20.0                         | 2019-12-30               |
| 7934.0                      | MILLER                      | CLERK                     | 7782.0                    | 1982-01-23 00:00:00.0          | 1300.0                    | NULL                       | 10.0                         | 2019-12-30               |
| 7369.0                      | SMITH                       | CLERK                     | 7902.0                    | 1980-12-17 00:00:00.0          | 800.0                     | NULL                       | 20.0                         | 2019-12-30               |
+-----------------------------+-----------------------------+---------------------------+---------------------------+--------------------------------+---------------------------+----------------------------+------------------------------+--------------------------+--+
14 rows selected (0.356 seconds)
Exporting data from Hive to MySQL

Note: `sqoop export` requires the target MySQL table (dept_demo here) to already exist; Sqoop does not create it for you.
sqoop export \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--table dept_demo \
--export-dir /user/hive/warehouse/dept \
--fields-terminated-by '\t' \
-m 1
Creating a saved Sqoop job (an import from the specified database and table)

The job definition is stored by Sqoop and can be run later with `sqoop job --exec ruozedata-sqoop-job1`, listed with `sqoop job --list`, or inspected with `sqoop job --show ruozedata-sqoop-job1`:
sqoop job --create ruozedata-sqoop-job1 -- \
import --connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--table emp \
--delete-target-dir

ETL example

Requirement: the emp and dept tables live in MySQL. Extract them into Hive, run the statistics there, then write the result back to MySQL.

Hive: create two tables, emp_etl and dept_etl
Statistics: select e.empno, e.ename, e.deptno, d.dname from emp_etl e join dept_etl d on e.deptno=d.deptno;
MySQL: create a result table to receive the output (named etl_result in the DDL below)
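To make the expected output of the statistics step concrete, the equi-join Hive computes can be sketched in plain Python. The sample values below are taken from the emp/dept data used throughout this post; this is an illustration of the join semantics, not part of the actual pipeline:

```python
# Minimal sketch of the equi-join Hive performs:
#   select e.empno, e.ename, e.deptno, d.dname
#   from emp_etl e join dept_etl d on e.deptno = d.deptno
emp = [
    (7499, "ALLEN", 30),
    (7839, "KING", 10),
    (7902, "FORD", 20),
]
dept = {10: "ACCOUNTING", 20: "RESEARCH", 30: "SALES"}

# Inner join: keep only emp rows whose deptno has a match in dept
result = [(empno, ename, deptno, dept[deptno])
          for (empno, ename, deptno) in emp
          if deptno in dept]

for row in result:
    print(row)
```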

1. Create the tables

CREATE TABLE emp_etl(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
CREATE TABLE dept_etl(
deptno int,
dname string,
loc string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Create the result table in Hive
CREATE TABLE result_etl(
empno int,
ename string,
deptno int,
dname string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
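All three DDLs declare `FIELDS TERMINATED BY '\t'`, so each row lands in HDFS as one tab-separated text line in column order. A small sketch of parsing one such line back into typed fields (the sample line is hypothetical, built from the emp_etl schema above):

```python
# A hypothetical emp_etl row as it would sit in HDFS:
# tab-separated fields, in the DDL's column order
line = "7499\tALLEN\tSALESMAN\t7698\t1981-02-20\t1600.0\t300.0\t30"

# One converter per column: empno, ename, job, mgr,
# hiredate, sal, comm, deptno
types = [int, str, str, int, str, float, float, int]

fields = line.rstrip("\n").split("\t")
row = [t(v) for t, v in zip(types, fields)]
print(row)
```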

2. Import into Hive

sqoop import \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--table emp \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database default \
--hive-table emp_etl \
--fields-terminated-by '\t' \
-m 1
sqoop import \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--table dept \
--hive-overwrite \
--delete-target-dir \
--hive-import --hive-database default \
--hive-table dept_etl \
--fields-terminated-by '\t' \
-m 1

3. Query and insert the result

insert overwrite table result_etl select e.empno, e.ename, e.deptno, d.dname from emp_etl e join dept_etl d on e.deptno=d.deptno;   
hive (default)> select * from result_etl;
OK
result_etl.empno    result_etl.ename    result_etl.deptno   result_etl.dname
7499    ALLEN   30  SALES
7521    WARD    30  SALES
7566    JONES   20  RESEARCH
7654    MARTIN  30  SALES
7698    BLAKE   30  SALES
7782    CLARK   10  ACCOUNTING
7788    SCOTT   20  RESEARCH
7839    KING    10  ACCOUNTING
7844    TURNER  30  SALES
7876    ADAMS   20  RESEARCH
7900    JAMES   30  SALES
7902    FORD    20  RESEARCH
7934    MILLER  10  ACCOUNTING
7369    SMITH   20  RESEARCH
Time taken: 0.055 seconds, Fetched: 14 row(s)

4. Export to MySQL

create table etl_result(
empno int,
ename varchar(10),
deptno int,
dname varchar(20)
);
sqoop export \
--connect jdbc:mysql://pxj31:3306/pxjdata \
--password lenovo \
--username root \
--table etl_result \
--export-dir /user/hive/warehouse/result_etl \
--fields-terminated-by '\t' \
-m 1
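Conceptually, `sqoop export` reads each tab-delimited line under `--export-dir` and turns it into an INSERT against the target table. A hedged sketch of that mapping, with an in-memory sqlite3 database standing in for MySQL purely for illustration (the two sample lines are taken from the result data above):

```python
import sqlite3

# Sketch of what `sqoop export` does conceptually: read tab-delimited
# lines from the export directory and INSERT them into the target table.
# sqlite3 stands in for MySQL here; the real job writes over JDBC.
lines = [
    "7499\tALLEN\t30\tSALES",
    "7934\tMILLER\t10\tACCOUNTING",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table etl_result(empno int, ename text, deptno int, dname text)"
)
for line in lines:
    empno, ename, deptno, dname = line.split("\t")
    conn.execute(
        "insert into etl_result values (?, ?, ?, ?)",
        (int(empno), ename, int(deptno), dname),
    )

rows = conn.execute("select * from etl_result order by empno").fetchall()
print(rows)
```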
select * from  etl_result;
7499    ALLEN   30  SALES
7521    WARD    30  SALES
7566    JONES   20  RESEARCH
7654    MARTIN  30  SALES
7698    BLAKE   30  SALES
7782    CLARK   10  ACCOUNTING
7788    SCOTT   20  RESEARCH
7839    KING    10  ACCOUNTING
7844    TURNER  30  SALES
7876    ADAMS   20  RESEARCH
7900    JAMES   30  SALES
7902    FORD    20  RESEARCH
7934    MILLER  10  ACCOUNTING
7369    SMITH   20  RESEARCH