Sqoop Summary
Overview
Sqoop is an Apache tool for transferring data between Hadoop and relational database servers.
Import: load data from a relational database such as MySQL or Oracle into Hadoop storage systems such as HDFS, Hive, or HBase.
Export: move data from the Hadoop file system out to a relational database such as MySQL.
How It Works
Sqoop translates each import or export command into a MapReduce program.
In the generated MapReduce job, the work is done mainly by customizing the InputFormat and OutputFormat.
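As a quick illustration (connection values assumed from the examples later in this document), Sqoop's codegen tool emits the record class that the generated MapReduce job is built around:
sqoop codegen --connect jdbc:mysql://localhost:3306/mysql --username it --password it --table user
# The generated user.java implements Sqoop's record interface (DBWritable),
# which the import job's InputFormat/OutputFormat use to read and write rows.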
Sqoop Installation
Installing Sqoop requires an existing Java and Hadoop environment.
1-1) Download and Extract
Software download:
Link: http://pan.baidu.com/s/1ge3wD1p  Password: 0dgs (contact the author if the download is unavailable).
[root@hadoop1 sqoop]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[root@hadoop1 sqoop]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop
[root@hadoop1 sqoop]# cd conf/
1-2) Edit the Configuration File
[root@hadoop1 conf]# cp sqoop-env-template.sh sqoop-env.sh
[root@hadoop1 conf]# vi sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop-2.6.4
export HADOOP_MAPRED_HOME=/usr/local/hadoop-2.6.4
export HIVE_HOME=/usr/local/hive
1-3) Add the MySQL JDBC Driver
[root@hadoop1 lib]# cp /usr/local/hive/lib/mysql-connector-java-5.1.39.jar /usr/local/sqoop/lib/
1-4) Verify the Installation
[root@hadoop1 conf]# cd ../bin/
[root@hadoop1 bin]# ./sqoop-version
Warning: /usr/local/sqoop/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/09/28 16:05:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
The warnings above can be ignored.
Importing Data with Sqoop
Sqoop supports importing and exporting data between HDFS and MySQL; the examples below use MySQL's built-in tables and the Hive metastore's TBLS table.
The following syntax is used to import data into HDFS:
$ sqoop import (generic-args) (import-args)
Table Data
The examples use tables from MySQL's built-in mysql database, such as user and db.
Import into the Default HDFS Directory
[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/mysql --username it --password it --table user --m 1
********
Map-Reduce Framework
Map input records=11
Map output records=11
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=72
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=59621376
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1386
16/09/28 16:36:15 INFO mapreduce.ImportJobBase: Transferred 1.3535 KB in 26.3647 seconds (52.5704 bytes/sec)
16/09/28 16:36:15 INFO mapreduce.ImportJobBase: Retrieved 11 records.
--connect                  the JDBC connection string
--username                 the connection username
--password                 the connection password
--table                    the table to operate on
--m                        the number of map tasks to use
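Passing --password on the command line is insecure (Sqoop itself warns about this); a minimal sketch of the alternatives: -P prompts interactively, and --password-file reads from a protected file (the file name and path here are assumed for illustration):
./sqoop import --connect jdbc:mysql://localhost:3306/mysql --username it -P --table user --m 1
# Or store the password in a file with restricted permissions (no trailing newline):
echo -n "it" > mysql.pwd
hadoop fs -put mysql.pwd /user/root/mysql.pwd
./sqoop import --connect jdbc:mysql://localhost:3306/mysql --username it --password-file /user/root/mysql.pwd --table user --m 1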
[root@hadoop1 bin]# hadoop fs -ls /user/root/user
Found 2 items
-rw-r--r-- 3 root supergroup 0 2016-09-28 16:36 /user/root/user/_SUCCESS
-rw-r--r-- 3 root supergroup 1386 2016-09-28 16:36 /user/root/user/part-m-00000
[root@hadoop1 bin]# hadoop fs -cat /user/root/user/part-m-00000
localhost,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
hadoop1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
127.0.0.1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
::1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
localhost,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0,mysql_native_password,null,N
hadoop1,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0,mysql_native_password,null,N
%,myuser,*FABE5482D5AADF36D028AC443D117BE1180B9725,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
%,root,*81F5E21E35407D884A6CD4A731AEBFB6AF209E1B,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
%,it,*EDD9BF22E6B82D19548E3F1FE5DCE4EE104BA3A7,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
%,itoffice,*FE2B2BCDE5445749A0C82A89B738BD0BD6B1F407,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
localhost,it,*EDD9BF22E6B82D19548E3F1FE5DCE4EE104BA3A7,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
Import into a Specified HDFS Directory
The --target-dir option sets the HDFS output directory; the data imported below is comma-separated (Sqoop's default field delimiter).
[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/mysql --username it --password it --target-dir /sqoop --table user --m 1
************
Map-Reduce Framework
Map input records=11
Map output records=11
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=76
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=56848384
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1386
16/09/28 16:55:35 INFO mapreduce.ImportJobBase: Transferred 1.3535 KB in 18.283 seconds (75.8079 bytes/sec)
16/09/28 16:55:35 INFO mapreduce.ImportJobBase: Retrieved 11 records.
[root@hadoop1 testData]# hadoop fs -ls /sqoop
Found 2 items
-rw-r--r-- 3 root supergroup 0 2016-09-28 16:55 /sqoop/_SUCCESS
-rw-r--r-- 3 root supergroup 1386 2016-09-28 16:55 /sqoop/part-m-00000
[root@hadoop1 testData]# hadoop fs -cat /sqoop/part-m-00000
localhost,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
hadoop1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
127.0.0.1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
::1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
localhost,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0,mysql_native_password,null,N
hadoop1,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0,mysql_native_password,null,N
%,myuser,*FABE5482D5AADF36D028AC443D117BE1180B9725,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
%,root,*81F5E21E35407D884A6CD4A731AEBFB6AF209E1B,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
%,it,*EDD9BF22E6B82D19548E3F1FE5DCE4EE104BA3A7,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
%,itoffice,*FE2B2BCDE5445749A0C82A89B738BD0BD6B1F407,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
localhost,it,*EDD9BF22E6B82D19548E3F1FE5DCE4EE104BA3A7,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0,mysql_native_password,,N
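Note that a repeated import fails if the target directory already exists; a minimal sketch using --delete-target-dir (same connection values as above) to overwrite it:
./sqoop import --connect jdbc:mysql://localhost:3306/mysql --username it --password it --target-dir /sqoop --delete-target-dir --table user --m 1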
Import a Relational Table into Hive
Import the Hive metastore's TBLS table into Hive:
[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --table TBLS --hive-import --m 1
****************
16/09/28 17:43:22 INFO tool.CodeGenTool: Beginning code generation
16/09/28 17:43:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/09/28 17:43:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
********************
16/09/28 17:47:02 INFO hive.HiveImport: Time taken: 18.294 seconds
16/09/28 17:47:04 INFO hive.HiveImport: Loading data to table default.tbls
16/09/28 17:47:10 INFO hive.HiveImport: Table default.tbls stats: [numFiles=1, totalSize=710]
16/09/28 17:47:10 INFO hive.HiveImport: OK
16/09/28 17:47:10 INFO hive.HiveImport: Time taken: 7.94 seconds
16/09/28 17:47:12 INFO hive.HiveImport: Hive import complete.
16/09/28 17:47:12 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
The log lines above show where the data was stored (the Hive table default.tbls).
hive> show databases;
OK
default
Time taken: 0.418 seconds, Fetched: 5 row(s)
hive> use default;
OK
Time taken: 0.363 seconds
hive> show tables;
OK
tbls
Time taken: 0.104 seconds, Fetched: 1 row(s)
hive> select * from tbls;
OK
1 1474767895 2 0 root 0 1 hive MANAGED_TABLE null null
2 1474769518 2 0 root 0 2 t_test MANAGED_TABLE null null
6 1474773899 2 0 root 0 6 t_partition MANAGED_TABLE null null
7 1474773977 2 0 root 0 7 t_part MANAGED_TABLE null null
8 1474804219 2 0 root 0 10 td_ext_a EXTERNAL_TABLE null null
9 1474805273 2 0 root 0 11 td_ext_b EXTERNAL_TABLE null null
10 1474805448 2 0 root 0 12 td_ext_c MANAGED_TABLE null null
11 1474805587 2 0 root 0 13 stu__buck MANAGED_TABLE null null
16 1475100139 6 0 root 0 16 hive MANAGED_TABLE null null
21 1475101361 11 0 root 0 21 hive1 MANAGED_TABLE null null
26 1475101790 16 0 root 0 26 hive2 MANAGED_TABLE null null
27 1475101831 16 0 root 0 27 hive3 MANAGED_TABLE null null
Time taken: 6.308 seconds, Fetched: 12 row(s)
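By default --hive-import creates a Hive table with the source table's name in the default database; a minimal sketch (option values assumed) naming the target table explicitly and replacing its contents:
./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --table TBLS --hive-import --hive-table default.tbls --hive-overwrite --m 1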
Importing Data with a WHERE Condition
Equivalent SQL: select * from TBLS where TBL_NAME = 't_part'
1-1) Import the Data
[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --where "TBL_NAME ='t_part'" --target-dir /wherequery --table TBLS --m 1
******************
16/09/28 21:51:51 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
16/09/28 21:51:52 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
16/09/28 21:51:52 INFO db.DBRecordReader: Executing query: SELECT `TBL_ID`, `CREATE_TIME`, `DB_ID`, `LAST_ACCESS_TIME`, `OWNER`, `RETENTION`, `SD_ID`, `TBL_NAME`, `TBL_TYPE`, `VIEW_EXPANDED_TEXT`, `VIEW_ORIGINAL_TEXT` FROM `TBLS` AS `TBLS` WHERE ( TBL_NAME ='t_part' ) AND ( 1=1 ) AND ( 1=1 )
******************
16/09/28 21:51:56 INFO mapreduce.ImportJobBase: Transferred 57 bytes in 34.7512 seconds (1.6402 bytes/sec)
16/09/28 21:51:56 INFO mapreduce.ImportJobBase: Retrieved 1 records.
1-2) View the Data
[root@hadoop1 testData]# hadoop fs -cat /wherequery/part-m-00000
7,1474773977,2,0,root,0,7,t_part,MANAGED_TABLE,null,null
--where                    the filter condition to apply
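--where also combines with --columns to import just a projection of the table; a minimal sketch (target directory assumed for illustration):
./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --table TBLS --columns "TBL_ID,TBL_NAME,TBL_TYPE" --where "TBL_TYPE = 'MANAGED_TABLE'" --target-dir /where_managed --m 1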
Importing Data with a Free-form Query
1-1) Import the Data
[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --target-dir /select --query 'select TBL_ID,TBL_NAME,TBL_TYPE from TBLS WHERE TBL_ID>2 and $CONDITIONS' --split-by TBL_ID --fields-terminated-by '\t' --m 1
**************
16/09/28 22:24:33 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
16/09/28 22:24:33 INFO db.DBRecordReader: Executing query: select TBL_ID,TBL_NAME,TBL_TYPE from TBLS WHERE TBL_ID>2 and ( 1=1 ) AND ( 1=1 )
16/09/28 22:24:33 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
**************
16/09/28 22:24:37 INFO mapreduce.ImportJobBase: Transferred 269 bytes in 25.5587 seconds (10.5248 bytes/sec)
16/09/28 22:24:37 INFO mapreduce.ImportJobBase: Retrieved 11 records.
--query                    import the result of a free-form query instead of a whole table (the query must contain $CONDITIONS)
--split-by                 the column used to split the data among mappers
--fields-terminated-by     the field delimiter for the output data
1-2) View the Data
[root@hadoop1 testData]# hadoop fs -cat /select/part-m-00000
6 t_partition MANAGED_TABLE
7 t_part MANAGED_TABLE
8 td_ext_a EXTERNAL_TABLE
9 td_ext_b EXTERNAL_TABLE
10 td_ext_c MANAGED_TABLE
11 stu__buck MANAGED_TABLE
16 hive MANAGED_TABLE
21 hive1 MANAGED_TABLE
26 hive2 MANAGED_TABLE
27 hive3 MANAGED_TABLE
31 tbls MANAGED_TABLE
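With more than one mapper, Sqoop replaces $CONDITIONS with range predicates over the --split-by column so that each mapper reads a disjoint slice; a minimal sketch of the same query run in parallel (target directory assumed):
./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --target-dir /select4 --query 'select TBL_ID,TBL_NAME,TBL_TYPE from TBLS WHERE TBL_ID>2 and $CONDITIONS' --split-by TBL_ID --fields-terminated-by '\t' --m 4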
Incremental Import
Incremental import loads data in batches, so a large table does not have to be re-imported in full each time. The command must add --incremental, --check-column, and --last-value, defined as follows:
--incremental <mode>                        the increment mode (append or lastmodified)
--check-column <column name>                the column used to decide which rows are new
--last-value <last check column value>      import rows whose check column value is greater than this
[root@hadoop1 bin]# ./sqoop import --connect jdbc:mysql://localhost:3306/hive --username it --password it --table TBLS --m 1 --incremental append --check-column TBL_ID --last-value 10
**************
16/09/28 22:38:54 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`TBL_ID`) FROM `TBLS`
16/09/28 22:38:54 INFO tool.ImportTool: Incremental import based on column `TBL_ID`
16/09/28 22:38:54 INFO tool.ImportTool: Lower bound value: 10
16/09/28 22:38:54 INFO tool.ImportTool: Upper bound value: 31
16/09/28 22:38:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
*************
16/09/28 22:39:15 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
16/09/28 22:39:15 INFO db.DBRecordReader: Executing query: SELECT `TBL_ID`, `CREATE_TIME`, `DB_ID`, `LAST_ACCESS_TIME`, `OWNER`, `RETENTION`, `SD_ID`, `TBL_NAME`, `TBL_TYPE`, `VIEW_EXPANDED_TEXT`, `VIEW_ORIGINAL_TEXT` FROM `TBLS` AS `TBLS` WHERE ( `TBL_ID` > 10 AND `TBL_ID` <= 31 ) AND ( 1=1 ) AND ( 1=1 )
**************
16/09/28 22:39:19 INFO tool.ImportTool: --incremental append
16/09/28 22:39:19 INFO tool.ImportTool: --check-column TBL_ID
16/09/28 22:39:19 INFO tool.ImportTool: --last-value 31
16/09/28 22:39:19 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
[root@hadoop1 testData]# hadoop fs -cat /user/root/TBLS/part-m-00000
11,1474805587,2,0,root,0,13,stu__buck,MANAGED_TABLE,null,null
16,1475100139,6,0,root,0,16,hive,MANAGED_TABLE,null,null
21,1475101361,11,0,root,0,21,hive1,MANAGED_TABLE,null,null
26,1475101790,16,0,root,0,26,hive2,MANAGED_TABLE,null,null
27,1475101831,16,0,root,0,27,hive3,MANAGED_TABLE,null,null
31,1475110018,1,0,root,0,31,tbls,MANAGED_TABLE,null,null
--incremental lastmodified --check-column created --last-value '2012-02-01 11:00:00'
This imports only rows whose created value is greater than '2012-02-01 11:00:00'.
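As the log above suggests ("Consider saving this with 'sqoop job --create'"), an incremental import is usually saved as a Sqoop job so that the metastore tracks --last-value between runs; a minimal sketch (job name assumed):
./sqoop job --create tbls_incr -- import --connect jdbc:mysql://localhost:3306/hive --username it --password it --table TBLS --m 1 --incremental append --check-column TBL_ID --last-value 10
./sqoop job --exec tbls_incr
# Each execution imports only the new rows and advances the stored last-value.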
Exporting Data with Sqoop
Create a sqoop database in MySQL and add the following table structure to it:
SET FOREIGN_KEY_CHECKS=0;
-- ----------------------------
-- Table structure for TBLS
-- ----------------------------
DROP TABLE IF EXISTS `TBLS`;
CREATE TABLE `TBLS` (
`TBL_ID` bigint(20) NOT NULL,
`CREATE_TIME` int(11) NOT NULL,
`DB_ID` bigint(20) DEFAULT NULL,
`LAST_ACCESS_TIME` int(11) NOT NULL,
`OWNER` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`RETENTION` int(11) NOT NULL,
`SD_ID` bigint(20) DEFAULT NULL,
`TBL_NAME` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`TBL_TYPE` varchar(128) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`VIEW_EXPANDED_TEXT` mediumtext,
`VIEW_ORIGINAL_TEXT` mediumtext,
PRIMARY KEY (`TBL_ID`),
UNIQUE KEY `UNIQUETABLE` (`TBL_NAME`,`DB_ID`),
KEY `TBLS_N50` (`DB_ID`),
KEY `TBLS_N49` (`SD_ID`),
CONSTRAINT `TBLS_FK1` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`),
CONSTRAINT `TBLS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
1-1) Export the Data
[root@hadoop1 bin]# ./sqoop export --connect jdbc:mysql://localhost:3306/sqoop --username it --password it --table TBLS --export-dir /user/root/TBLS/part-m-00000
************
*****************
16/09/28 23:07:19 INFO mapreduce.ExportJobBase: Transferred 2.3008 KB in 21.7299 seconds (108.422 bytes/sec)
16/09/28 23:07:19 INFO mapreduce.ExportJobBase: Exported 6 records.
1-2) Verify the Data
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| azkaban |
| hive |
| mysql |
| performance_schema |
| sqoop |
+--------------------+
6 rows in set (0.08 sec)
mysql> use sqoop
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+-----------------+
| Tables_in_sqoop |
+-----------------+
| TBLS |
+-----------------+
1 row in set (0.00 sec)
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+
| 11 | 1474805587 | 2 | 0 | root | 0 | 13 | stu__buck | MANAGED_TABLE | NULL | NULL |
| 16 | 1475100139 | 6 | 0 | root | 0 | 16 | hive | MANAGED_TABLE | NULL | NULL |
| 21 | 1475101361 | 11 | 0 | root | 0 | 21 | hive1 | MANAGED_TABLE | NULL | NULL |
| 26 | 1475101790 | 16 | 0 | root | 0 | 26 | hive2 | MANAGED_TABLE | NULL | NULL |
| 27 | 1475101831 | 16 | 0 | root | 0 | 27 | hive3 | MANAGED_TABLE | NULL | NULL |
| 31 | 1475110018 | 1 | 0 | root | 0 | 31 | tbls | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+
6 rows in set (0.00 sec)
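By default, export generates INSERT statements and fails on duplicate primary keys; to re-run an export against existing rows, a minimal sketch using update mode (key column taken from the DDL above):
./sqoop export --connect jdbc:mysql://localhost:3306/sqoop --username it --password it --table TBLS --export-dir /user/root/TBLS/part-m-00000 --update-key TBL_ID --update-mode allowinsert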
Common Sqoop Commands for MySQL
1-1) List the Databases in MySQL
[root@hadoop1 ~]# sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456
information_schema
azkaban
hive
mysql
performance_schema
Xiaoxu
list-databases: lists the databases on the server
1-2) List the Tables in a Database
[root@hadoop1 ~]# sqoop list-tables --connect jdbc:mysql://localhost:3306/xiaoxu --username root --password 123456
1-3) Copy a MySQL Table's Structure into a Hive Table
sqoop create-hive-table --connect jdbc:mysql://hadoop1:3306/xiaoxu --table sp_address_code --username root --password 123456 --hive-table sp_address_code
1-4) Import MySQL Data into Hive
sqoop import --connect jdbc:mysql://localhost:3306/xiaoxu --username root --password 123456 --table sp_address_code --hive-import --m 1
1-5) Export Hive Table Data into MySQL
[root@hadoop1 start-single]# sqoop export --connect jdbc:mysql://localhost:3306/sp_address --username root --password 123456 --table sp_address --export-dir /user/hive/warehouse/xiaoxu.db/sp_address -m 20
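Hive's default field delimiter is \001 (Ctrl-A), so when exporting files written by Hive you usually need to tell Sqoop the input delimiter; a minimal sketch of the command above with that option added:
sqoop export --connect jdbc:mysql://localhost:3306/sp_address --username root --password 123456 --table sp_address --export-dir /user/hive/warehouse/xiaoxu.db/sp_address --input-fields-terminated-by '\001' -m 20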
Sqoop Jobs
A Sqoop job saves a predefined import or export task so that it can be executed by name whenever needed.
The Sqoop job syntax is:
$ sqoop job (generic-args) (job-args)
[-- [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args)
[-- [subtool-name] (subtool-args)]
1-1) Create a Job
[root@hadoop1 bin]# ./sqoop job --create person_job -- import --connect jdbc:mysql://localhost:3306/sqoop --username it --password it --table TEST -m 1
*************
16/09/29 01:06:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/09/29 01:06:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Note the space between -- and the import keyword in the command above; it is required.
1-2) Verify the Job
[root@hadoop1 bin]# ./sqoop job --list
***************
16/09/29 01:07:23 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Available jobs:
person_job
[root@hadoop1 bin]# ./sqoop job --show person_job
16/09/29 01:10:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
Job: person_job
Tool: import
Options:
----------------------------
verbose = false
db.connect.string = jdbc:mysql://localhost:3306/sqoop
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
db.require.password = true
hdfs.append.dir = false
db.table = TEST
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
db.username = it
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
codegen.compile.dir = /tmp/sqoop-root/compile/d96fb0fc73fdb5ad13067fdcd914d3a1
direct.import = false
hive.fail.table.exists = false
db.batch = false
At the password prompt above, enter the MySQL login password.
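To skip the prompt entirely, the job metastore can be allowed to store the password; a minimal sketch of the property, set in $SQOOP_HOME/conf/sqoop-site.xml (path assumed):
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
</property>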
1-3) Execute the Job
[root@hadoop1 bin]# ./sqoop job --exec person_job
16/09/29 01:19:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/09/29 01:19:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
**************
16/09/29 01:20:07 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
16/09/29 01:20:07 INFO db.DBRecordReader: Executing query: SELECT `TBL_ID`, `CREATE_TIME`, `DB_ID`, `LAST_ACCESS_TIME`, `OWNER`, `RETENTION`, `SD_ID`, `TBL_NAME`, `TBL_TYPE`, `VIEW_EXPANDED_TEXT`, `VIEW_ORIGINAL_TEXT` FROM `TEST` AS `TEST` WHERE ( 1=1 ) AND ( 1=1 )
*************
16/09/29 01:20:11 INFO output.FileOutputCommitter: Saved output of task 'attempt_local698116558_0001_m_000000_0' to hdfs://hadoop1:9000/user/root/TEST/_temporary/0/task_local698116558_0001_m_000000
16/09/29 01:20:11 INFO mapred.LocalJobRunner: map
*************
16/09/29 01:20:12 INFO mapreduce.ImportJobBase: Transferred 353 bytes in 36.2109 seconds (9.7484 bytes/sec)
16/09/29 01:20:12 INFO mapreduce.ImportJobBase: Retrieved 6 records.
At the password prompt above, enter the MySQL login password.
1-4) View the Data
[root@hadoop1 ~]# hadoop fs -cat /user/root/TEST/part-m-00000
11,1474805587,2,0,root,0,13,stu__buck,MANAGED_TABLE,null,null
16,1475100139,6,0,root,0,16,hive,MANAGED_TABLE,null,null
21,1475101361,11,0,root,0,21,hive1,MANAGED_TABLE,null,null
26,1475101790,16,0,root,0,26,hive2,MANAGED_TABLE,null,null
27,1475101831,16,0,root,0,27,hive3,MANAGED_TABLE,null,null
31,1475110018,1,0,root,0,31,tbls,MANAGED_TABLE,null,null