import
- Sqoop subcommands
[yao@master ~]$ sqoop help
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
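Before importing anything, the `list-databases` and `list-tables` subcommands above are handy for checking what a given MySQL account can actually see. A minimal sketch, reusing the host and account from the examples later in this document (substitute your own); the commands are built as arrays so they can be inspected before being run on a node with Sqoop installed:

```shell
# Build the commands as arrays so they can be printed/inspected first.
# Host and credentials follow the rest of this document; adjust as needed.
list_dbs=(sqoop list-databases
  --connect jdbc:mysql://master:3306
  --username hadoop -P)          # -P prompts for the password on the console

list_tables=(sqoop list-tables
  --connect jdbc:mysql://master:3306/mysql
  --username hadoop -P)

echo "${list_dbs[@]}"            # inspect; run "${list_dbs[@]}" on the cluster
echo "${list_tables[@]}"
```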
Importing from MySQL to HDFS
- Parameters
Parameter | Meaning |
---|---|
--connect < jdbc-uri > | JDBC connection string |
--connection-manager < class-name > | Connection manager class to use |
--driver < class-name > | JDBC driver class to use |
--help | Print usage help |
-P | Read the password from the console |
-m < n > | Number of map tasks to use for the copy; the default is 4 |
--password < password > | Password |
--username < username > | Account name |
--table < tablename > | MySQL table name |
--fields-terminated-by < separator > | Field separator for rows in the output files |
--target-dir < dir > | Target HDFS directory |
--where < condition > | Filter condition applied during the import |
--verbose | Print detailed run information to the console |
--connection-param-file < filename > | Optional file holding extra database connection parameters |
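As a sketch of combining `--where` with `--target-dir` from the table above (the filter column `help_keyword_id` and the target path are illustrative assumptions, not taken from the runs below):

```shell
# Illustrative sketch: import only the rows matching a WHERE condition.
# The column name and target path are assumptions; adjust for your schema.
cmd=(sqoop import
  --connect jdbc:mysql://master:3306/mysql
  --username hadoop --password hadoop
  --table help_keyword
  --where "help_keyword_id < 100"     # only rows matching this condition
  --target-dir /mysqltohdfs/help_keyword_first100
  -m 1)
echo "${cmd[@]}"   # inspect; run "${cmd[@]}" on a node with Sqoop and a reachable cluster
```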
Import the table help_keyword from the MySQL database named mysql into HDFS
- Import to the default path
[yao@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword -m 1
//When no target directory is specified, the default path is /user/<username>/<table name> (here /user/yao/help_keyword)
[yao@master ~]$ hadoop fs -ls /user/yao/help_keyword
Found 2 items
-rw-r--r-- 1 yao supergroup 0 2019-03-26 16:40 /user/yao/help_keyword/_SUCCESS
-rw-r--r-- 1 yao supergroup 9748 2019-03-26 16:40 /user/yao/help_keyword/part-m-00000
[yao@master ~]$ hadoop fs -cat /user/yao/help_keyword/part-m-00000
0,(JSON
1,->
2,->>
3,<>
4,ACCOUNT
5,ACTION
6,ADD
7,AES_DECRYPT
8,AES_ENCRYPT
9,AFTER
10,AGAINST
......
- Specify the target directory and field separator during the import
//Import the table help_keyword from the MySQL database named mysql into HDFS, with target directory /mysqltohdfs/help_keyword and field separator "\t"
[yao@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword --target-dir '/mysqltohdfs/help_keyword' --fields-terminated-by '\t' -m 1
//Check the target directory
[yao@master ~]$ hadoop fs -cat /mysqltohdfs/help_keyword/part-m-00000|head -10
0 (JSON
1 ->
2 ->>
3 <>
4 ACCOUNT
5 ACTION
6 ADD
7 AES_DECRYPT
8 AES_ENCRYPT
9 AFTER
Importing from MySQL to Hive
- Parameters
Parameter | Meaning |
---|---|
--hive-home < dir > | Override the Hive installation directory |
--hive-import | Import into Hive (using Hive's default delimiters unless overridden) |
--hive-overwrite | Overwrite data that already exists in the Hive table |
--create-hive-table | Create a Hive table matching the relational table's schema; if the table already exists, the job fails |
--hive-table < table-name > | Target Hive table; a new table can be created |
--fields-terminated-by | Field separator (Hive's default is \u0001) |
--lines-terminated-by | Separator between records; the default is a newline, but a custom string can be set |
--delete-target-dir | Before each run, delete the directory given by --target-dir if it exists |
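`--hive-overwrite` and `--delete-target-dir` from the table above combine naturally into a re-runnable import. A sketch, reusing the connection details and the Hive table created in the example below (re-running the command below as-is would fail, since `--create-hive-table` errors on an existing table):

```shell
# Sketch: a re-runnable MySQL -> Hive import.
# --delete-target-dir removes the intermediate HDFS dir left by a previous run;
# --hive-overwrite replaces the rows already in the Hive table.
cmd=(sqoop import
  --connect jdbc:mysql://master:3306/mysql
  --username hadoop --password hadoop
  --table help_keyword
  --hive-import
  --hive-table yr.new_help_keyword
  --hive-overwrite
  --delete-target-dir
  --fields-terminated-by '\t'
  -m 1)
echo "${cmd[@]}"   # inspect; run "${cmd[@]}" on a node with Sqoop and Hive configured
```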
- Import the table help_keyword from the MySQL database named mysql into Hive, specifying the field and line separators
[yao@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword --hive-import -m 1 --create-hive-table --hive-table yr.new_help_keyword --fields-terminated-by '\t' --lines-terminated-by '\n'
hive> use yr;
OK
Time taken: 0.092 seconds
hive> show tables;
OK
new_help_keyword
stu
stu2
stu_buck
stu_mess
stu_mess_part
stu_mess_part_buck
stu_message
student
student2
student3
student5
student_buck
students
Time taken: 0.413 seconds, Fetched: 14 row(s)
hive> select * from new_help_keyword limit 5;
OK
0 (JSON
1 ->
2 ->>
3 <>
4 ACCOUNT
Time taken: 0.727 seconds, Fetched: 5 row(s)
//In the default warehouse directory
[yao@master ~]$ hadoop fs -ls /user/hive/warehouse/yr.db/new_help_keyword
Found 1 items
-rwxr-xr-x 1 yao supergroup 9748 2019-03-26 17:08 /user/hive/warehouse/yr.db/new_help_keyword/part-m-00000
[yao@master ~]$ hadoop fs -cat /user/hive/warehouse/yr.db/new_help_keyword/part-m-00000|head -10
0 (JSON
1 ->
2 ->>
3 <>
4 ACCOUNT
5 ACTION
6 ADD
7 AES_DECRYPT
8 AES_ENCRYPT
9 AFTER
export
- Parameters
Parameter | Meaning |
---|---|
--direct | Use direct export mode (faster) |
--export-dir < dir > | HDFS directory holding the data to export |
-m,--num-mappers < n > | Use n map tasks to export in parallel |
--table < table-name > | Name of the target table |
--update-key < col-name > | Column(s) used to match rows for updates; separate multiple names with commas |
--update-mode < mode > | Update mode; the default is updateonly, can be set to allowinsert |
--input-null-string < null-string > | String to interpret as null for string-typed columns |
--input-null-non-string < null-string > | String to interpret as null for non-string-typed columns |
--staging-table < staging-table-name > | Staging table where data is held before being written to the target table |
--clear-staging-table | Clear the staging table before the export |
--batch | Use batch mode |
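The `--update-key`/`--update-mode` pair in the table above turns an export into an upsert. A hedged sketch against the sqoopstudents table created in the next section, assuming id is the match column:

```shell
# Sketch: upsert-style export. Rows whose id already exists in MySQL are
# UPDATEd; with allowinsert, new ids are INSERTed instead of being dropped.
# Table/column names are assumptions taken from the example below.
cmd=(sqoop export
  --connect jdbc:mysql://master:3306/sqoopdb
  --username hadoop --password hadoop
  --table sqoopstudents
  --export-dir /data/student
  --input-fields-terminated-by '\t'
  --update-key id
  --update-mode allowinsert)
echo "${cmd[@]}"   # inspect; run "${cmd[@]}" on a node with Sqoop installed
```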
Exporting from HDFS to MySQL
1. Prepare the environment in MySQL (database and table)
mysql> create database sqoopdb default character set utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.10 sec)
mysql> use sqoopdb;
Database changed
mysql> create table sqoopstudents(id int,name varchar(20),age int);
Query OK, 0 rows affected (0.04 sec)
2. Locate the data on HDFS
3. Run the export
[yao@master ~]$ sqoop export --connect jdbc:mysql://master:3306/sqoopdb --username hadoop --password hadoop --table sqoopstudents --export-dir '/data/student/000000_0' --fields-terminated-by '\t'
//Log in to MySQL and check
mysql> use sqoopdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+-------------------+
| Tables_in_sqoopdb |
+-------------------+
| sqoopstudents |
+-------------------+
1 row in set (0.00 sec)
mysql> select * from sqoopstudents;
+------+-------+------+
| id | name | age |
+------+-------+------+
| 1007 | leo | 22 |
| 1005 | bob | 24 |
| 1006 | mark | 23 |
| 1014 | JAMES | 24 |
| 1011 | JACK | 23 |
+------+-------+------+
5 rows in set (0.00 sec)
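If the HDFS files encode missing values as a sentinel string such as \N (Hive's default), the `--input-null-string`/`--input-null-non-string` options from the parameter table above tell the export to write real SQL NULLs. A sketch reusing the export above; the sentinel is an assumption about how the input files were produced:

```shell
# Sketch: treat the literal string \N in the input files as SQL NULL,
# for both string and non-string columns.
cmd=(sqoop export
  --connect jdbc:mysql://master:3306/sqoopdb
  --username hadoop --password hadoop
  --table sqoopstudents
  --export-dir /data/student
  --input-fields-terminated-by '\t'
  --input-null-string '\\N'
  --input-null-non-string '\\N')
echo "${cmd[@]}"   # inspect; run "${cmd[@]}" on a node with Sqoop installed
```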
Exporting from Hive to MySQL
//Prepare the database and table in MySQL
mysql> use sqoopdb;
Database changed
mysql> create table sqoopstudent(id int,name varchar(20));
Query OK, 0 rows affected (0.07 sec)
//Locate the data in Hive (i.e. its files on HDFS)
[yao@master ~]$ hadoop fs -ls /user/hive/warehouse/yr_test.db/studenta
Found 1 items
-rwxr-xr-x 1 yao supergroup 90 2019-03-22 10:42 /user/hive/warehouse/yr_test.db/studenta/studenta.txt
//Run the export
[yao@master ~]$ sqoop export --connect jdbc:mysql://master:3306/sqoopdb --username hadoop --password hadoop --table sqoopstudent --export-dir '/user/hive/warehouse/yr_test.db/studenta/studenta.txt' --fields-terminated-by '\t' --input-fields-terminated-by '\t'
//Log in to MySQL and check
mysql> use sqoopdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+-------------------+
| Tables_in_sqoopdb |
+-------------------+
| sqoopstudent |
| sqoopstudents |
+-------------------+
2 rows in set (0.00 sec)
mysql> select * from sqoopstudent;
+-------+-------+
| id | name |
+-------+-------+
| 10001 | shiny |
| 10002 | mark |
| 10009 | jack |
| 10014 | eva |
| 10003 | angel |
| 10005 | ella |
| 10018 | judy |
| 10020 | cendy |
+-------+-------+
8 rows in set (0.00 sec)
Note: in an export, the column types of the target table must match the data being exported; otherwise the export will fail.