import
- Sqoop subcommands
[yao@master ~]$ sqoop help
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
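Before importing anything, the `list-databases` and `list-tables` subcommands above are handy for checking what a given MySQL account can actually see. A minimal sketch, reusing the host and account from the examples later in this document (substitute your own); the commands are built as arrays so they can be inspected before being run on a node with Sqoop installed:

```shell
# Build the commands as arrays so they can be printed/inspected first.
# Host and credentials follow the rest of this document; adjust as needed.
list_dbs=(sqoop list-databases
  --connect jdbc:mysql://master:3306
  --username hadoop -P)          # -P prompts for the password on the console

list_tables=(sqoop list-tables
  --connect jdbc:mysql://master:3306/mysql
  --username hadoop -P)

echo "${list_dbs[@]}"            # inspect; run "${list_dbs[@]}" on the cluster
echo "${list_tables[@]}"
```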
Importing from MySQL to HDFS
- Parameters
Parameter | Meaning |
---|---|
--connect < jdbc-uri > | JDBC connection string |
--connection-manager < class-name > | Connection manager class to use |
--driver < class-name > | JDBC driver class to use |
--help | Print usage help |
-P | Read the password from the console |
-m < n > | Number of map tasks to use for the copy; the default is 4 |
--password < password > | Password |
--username < username > | Account name |
--table < tablename > | MySQL table name |
--fields-terminated-by < separator > | Field separator for rows in the output files |
--target-dir < dir > | Target HDFS directory |
--where < condition > | Filter condition applied during the import |
--verbose | Print detailed run information to the console |
--connection-param-file < filename > | Optional file holding extra database connection parameters |
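As a sketch of combining `--where` with `--target-dir` from the table above (the filter column `help_keyword_id` and the target path are illustrative assumptions, not taken from the runs below):

```shell
# Illustrative sketch: import only the rows matching a WHERE condition.
# The column name and target path are assumptions; adjust for your schema.
cmd=(sqoop import
  --connect jdbc:mysql://master:3306/mysql
  --username hadoop --password hadoop
  --table help_keyword
  --where "help_keyword_id < 100"     # only rows matching this condition
  --target-dir /mysqltohdfs/help_keyword_first100
  -m 1)
echo "${cmd[@]}"   # inspect; run "${cmd[@]}" on a node with Sqoop and a reachable cluster
```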
Import the table help_keyword from the MySQL database named mysql into HDFS
- Import to the default path
[yao@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword -m 1
//When no target directory is specified, the default path is /user/<username>/<table name> (here /user/yao/help_keyword)
[yao@master ~]$ hadoop fs -ls /user/yao/help_keyword
Found 2 items
-rw-r--r-- 1 yao supergroup 0 2019-03-26 16:40 /user/yao/help_keyword/_SUCCESS
-rw-r--r-- 1 yao supergroup 9748 2019-03-26 16:40 /user/yao/help_keyword/part-m-00000
[yao@master ~]$ hadoop fs -cat /user/yao/help_keyword/part-m-00000
0,(JSON
1,->
2,->>
3,<>
4,ACCOUNT
5,ACTION
6,ADD
7,AES_DECRYPT
8,AES_ENCRYPT
9,AFTER
10,AGAINST
......
- Specify the target directory and field separator during the import
//Import the table help_keyword from the MySQL database named mysql into HDFS, with target directory /mysqltohdfs/help_keyword and field separator "\t"
[yao@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword --target-dir '/mysqltohdfs/help_keyword' --fields-terminated-by '\t' -m 1
//Check the target directory
[yao@master ~]$ hadoop fs -cat /mysqltohdfs/help_keyword/part-m-00000|head -10
0 (JSON
1 ->
2 ->>
3 <>
4 ACCOUNT
5 ACTION
6 ADD
7 AES_DECRYPT
8 AES_ENCRYPT
9 AFTER
Importing from MySQL to Hive
- Parameters
Parameter | Meaning |
---|---|
--hive-home < dir > | Override the Hive installation directory |
--hive-import | Import into Hive (using Hive's default delimiters unless overridden) |
--hive-overwrite | Overwrite data that already exists in the Hive table |
--create-hive-table | Create a Hive table matching the relational table's schema; if the table already exists, the job fails |
--hive-table < table-name > | Target Hive table; a new table can be created |
--fields-terminated-by | Field separator (Hive's default is \u0001) |
--lines-terminated-by | Separator between records; the default is a newline, but a custom string can be set |
--delete-target-dir | Before each run, delete the directory given by --target-dir if it exists |
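`--hive-overwrite` and `--delete-target-dir` from the table above combine naturally into a re-runnable import. A sketch, reusing the connection details and the Hive table created in the example below (re-running the command below as-is would fail, since `--create-hive-table` errors on an existing table):

```shell
# Sketch: a re-runnable MySQL -> Hive import.
# --delete-target-dir removes the intermediate HDFS dir left by a previous run;
# --hive-overwrite replaces the rows already in the Hive table.
cmd=(sqoop import
  --connect jdbc:mysql://master:3306/mysql
  --username hadoop --password hadoop
  --table help_keyword
  --hive-import
  --hive-table yr.new_help_keyword
  --hive-overwrite
  --delete-target-dir
  --fields-terminated-by '\t'
  -m 1)
echo "${cmd[@]}"   # inspect; run "${cmd[@]}" on a node with Sqoop and Hive configured
```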
- Import the table help_keyword from the MySQL database named mysql into Hive, specifying the field and line separators
[yao@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword --hive-import -m 1 --create-hive-table --hive-table yr.new_help_keyword --fields-terminated-by '\t' --lines-terminated-by '\n'
hive> use yr;
OK
Time taken: 0.092 seconds
hive> show tables;
OK
new_help_keyword
stu
stu2
stu_buck
stu_mess
stu_mess_part
stu_mess_part_buck
stu_message
student
student2
student3
student5
student_buck
students
Time taken: 0.413 seconds, Fetched: 14 row(s)
hive> select * from new_help_keyword limit 5;
OK
0 (JSON
1 ->
2 ->>
3 <>
4 ACCOUNT
Time taken: 0.727 seconds, Fetched: 5 row(s)
//In the default warehouse directory
[yao@master ~]$ hadoop fs -ls /user/hive/warehouse/yr.db/new_help_keyword
Found 1 items
-rwxr-xr-x 1 yao supergroup 9748 2019-03-26 17:08 /user/hive/warehouse/yr.db/new_help_keyword/part-m-00000
[yao@master ~]$ hadoop fs -cat /user/hive/warehouse/yr.db/new_help_keyword/part-m-00000|head -10
0 (JSON
1 ->
2 ->>
3 <>
4 ACCOUNT
5 ACTION
6 ADD
7 AES_DECRYPT
8 AES_ENCRYPT
9 AFTER
export
- Parameters
Parameter | Meaning |
---|---|
--direct | Use direct export mode (faster) |
--export-dir < dir > | HDFS directory holding the data to export |
-m,--num-mappers < n > | Use n map tasks to export in parallel |
--table < table-name > | Name of the target table |
--update-key < col-name > | Column(s) used to match rows for updates; separate multiple names with commas |
--update-mode < mode > | Update mode; the default is updateonly, can be set to allowinsert |
--input-null-string < null-string > | String to interpret as null for string-typed columns |
--input-null-non-string < null-string > | String to interpret as null for non-string-typed columns |
--staging-table < staging-table-name > | Staging table where data is held before being written to the target table |
--clear-staging-table | Clear the staging table before the export |
--batch | Use batch mode |
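The `--update-key`/`--update-mode` pair in the table above turns an export into an upsert. A hedged sketch against the sqoopstudents table created in the next section, assuming id is the match column:

```shell
# Sketch: upsert-style export. Rows whose id already exists in MySQL are
# UPDATEd; with allowinsert, new ids are INSERTed instead of being dropped.
# Table/column names are assumptions taken from the example below.
cmd=(sqoop export
  --connect jdbc:mysql://master:3306/sqoopdb
  --username hadoop --password hadoop
  --table sqoopstudents
  --export-dir /data/student
  --input-fields-terminated-by '\t'
  --update-key id
  --update-mode allowinsert)
echo "${cmd[@]}"   # inspect; run "${cmd[@]}" on a node with Sqoop installed
```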
Exporting from HDFS to MySQL
1. Prepare the environment in MySQL (database and table)
mysql> create database sqoopdb default character set utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.10 sec)
mysql> use sqoopdb;
Database changed
mysql> create table sqoopstudents(id int,name varchar(20),age int);
Query OK, 0 rows affected (0.04 sec)
2. Locate the data on HDFS
3. Run the export
[yao@master ~]$ sqoop export --connect jdbc:mysql://master:3306/sqoopdb --username hadoop --password hadoop --table sqoopstudents --export-dir '/data/student/000000_0' --fields-terminated-by '\t'
//Log in to MySQL and check
mysql> use sqoopdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+-------------------+
| Tables_in_sqoopdb |
+-------------------+
| sqoopstudents |
+-------------------+
1 row in set (0.00 sec)
mysql> select * from sqoopstudents;
+------+-------+------+
| id | name | age |
+------+-------+------+
| 1007 | leo | 22 |
| 1005 | bob | 24 |
| 1006 | mark | 23 |
| 1014 | JAMES | 24 |
| 1011 | JACK | 23 |
+------+-------+------+
5 rows in set (0.00 sec)
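If the HDFS files encode missing values as a sentinel string such as \N (Hive's default), the `--input-null-string`/`--input-null-non-string` options from the parameter table above tell the export to write real SQL NULLs. A sketch reusing the export above; the sentinel is an assumption about how the input files were produced:

```shell
# Sketch: treat the literal string \N in the input files as SQL NULL,
# for both string and non-string columns.
cmd=(sqoop export
  --connect jdbc:mysql://master:3306/sqoopdb
  --username hadoop --password hadoop
  --table sqoopstudents
  --export-dir /data/student
  --input-fields-terminated-by '\t'
  --input-null-string '\\N'
  --input-null-non-string '\\N')
echo "${cmd[@]}"   # inspect; run "${cmd[@]}" on a node with Sqoop installed
```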
Exporting from Hive to MySQL
//Prepare the database and table in MySQL
mysql> use sqoopdb;
Database changed
mysql> create table sqoopstudent(id int,name varchar(20));
Query OK, 0 rows affected (0.07 sec)
//Locate the data in Hive (i.e. its files on HDFS)
[yao@master ~]$ hadoop fs -ls /user/hive/warehouse/yr_test.db/studenta
Found 1 items
-rwxr-xr-x 1 yao supergroup 90 2019-03-22 10:42 /user/hive/warehouse/yr_test.db/studenta/studenta.txt
//Run the export
[yao@master ~]$ sqoop export --connect jdbc:mysql://master:3306/sqoopdb --username hadoop --password hadoop --table sqoopstudent --export-dir '/user/hive/warehouse/yr_test.db/studenta/studenta.txt' --fields-terminated-by '\t' --input-fields-terminated-by '\t'
//Log in to MySQL and check
mysql> use sqoopdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+-------------------+
| Tables_in_sqoopdb |
+-------------------+
| sqoopstudent |
| sqoopstudents |
+-------------------+
2 rows in set (0.00 sec)
mysql> select * from sqoopstudent;
+-------+-------+
| id | name |
+-------+-------+
| 10001 | shiny |
| 10002 | mark |
| 10009 | jack |
| 10014 | eva |
| 10003 | angel |
| 10005 | ella |
| 10018 | judy |
| 10020 | cendy |
+-------+-------+
8 rows in set (0.00 sec)
Note: in an export, the column types of the target table must match the data being exported; otherwise the export will fail.