Sqoop Import and Export

import

Sqoop commands

[ww@master ~]$ sqoop help

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
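
Before running an import, the list-databases and list-tables commands shown above can be used to verify connectivity. A minimal sketch, assuming the same MySQL connection settings used in the examples below:

[ww@master ~]$ sqoop list-databases --connect jdbc:mysql://master:3306 --username hadoop --password hadoop
[ww@master ~]$ sqoop list-tables --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop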

Importing from MySQL to HDFS

Parameter                                Meaning
--connect <jdbc-uri>                     JDBC connection string
--connection-manager <class-name>        Connection manager class to use
--driver <class-name>                    JDBC driver class to use
--help                                   Print usage help
-P                                       Read the password from the console
-m <n>                                   Number of map tasks to use for the copy; defaults to 4 if not specified
--password <password>                    Password
--username <username>                    Username
--table <tablename>                      MySQL table name
--fields-terminated-by <separator>       Field separator used in the output files
--target-dir <dir>                       Target HDFS directory
--where <condition>                      WHERE condition applied to the import query
--verbose                                Print verbose runtime information to the console
--connection-param-file <filename>       Optional file containing extra database connection parameters
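
Several of these options can be combined. A minimal sketch (the WHERE condition and target directory here are illustrative, and -P prompts for the password instead of passing it on the command line):

[ww@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop -P --table help_keyword --where 'help_keyword_id < 100' --target-dir /tmp/help_keyword_subset -m 1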

Import the help_keyword table from MySQL's mysql database into HDFS
Import to the default path

[ww@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword -m 1

//If no target directory is specified, the default path is /user/ww/<name of the imported table>

[ww@master ~]$ hadoop fs -cat /user/ww/help_keyword/part-m-00000
0,(JSON
1,->
2,->>
3,<>
4,ACCOUNT
5,ACTION
6,ADD
7,AES_DECRYPT
... (output omitted) ...
689,WORK
690,WRAPPER
691,WRITE
692,X
693,X509
694,XA
695,Y
696,YEAR
697,YEAR_MONTH
698,

Specify the target directory and field delimiter during the import

//Import the help_keyword table from MySQL's mysql database into HDFS, with target directory /mysqltohdfs/help_keyword and field delimiter "\t"
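The import command this comment refers to can be sketched as follows, assuming the same connection settings as above:
[ww@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword -m 1 --target-dir /mysqltohdfs/help_keyword --fields-terminated-by '\t'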
[ww@master ~]$ hadoop fs -cat /mysqltohdfs/help_keyword/part-m-00000|head -10
0       (JSON
1       ->
2       ->>
3       <>
4       ACCOUNT
5       ACTION
6       ADD
7       AES_DECRYPT
8       AES_ENCRYPT
9       AFTER

Importing from MySQL to Hive

  • Parameters
Parameter                                Meaning
--hive-home <dir>                        Override the Hive installation directory
--hive-import                            Import into Hive using Hive's default delimiters
--hive-overwrite                         Overwrite any existing data in the Hive table
--create-hive-table                      Create a Hive table matching the schema of the source table; if the table already exists, the job fails
--hive-table <table-name>                Hive table to import into; a new table can be created
--fields-terminated-by                   Field delimiter (Hive's default is \u0001)
--lines-terminated-by                    Record delimiter; defaults to newline, but another string can be set
--delete-target-dir                      Delete the directory given by --target-dir before each import run
  • Import the help_keyword table from MySQL's mysql database into Hive, specifying the field and line delimiters

[ww@master ~]$ sqoop import --connect jdbc:mysql://master:3306/mysql --username hadoop --password hadoop --table help_keyword --hive-import -m 1 --create-hive-table --hive-table yw.new_help_keyword --fields-terminated-by '\t' --lines-terminated-by '\n' --delete-target-dir

hive> use yw;
OK
Time taken: 0.101 seconds
hive> show tables;
OK
new_help_keyword
stu
student
Time taken: 0.078 seconds, Fetched: 3 row(s)
Inspect new_help_keyword
hive> select * from new_help_keyword limit 5;
OK
0       (JSON
1       ->
2       ->>
3       <>
4       ACCOUNT
Time taken: 0.95 seconds, Fetched: 5 row(s)
//Under the default warehouse directory
[ww@master ~]$  hadoop fs -ls /user/hive/warehouse/yw.db/new_help_keyword
Found 1 items
-rwxr-xr-x   1 ww supergroup       9748 2019-05-18 11:28 /user/hive/warehouse/yw.db/new_help_keyword/part-m-00000
View the first ten lines
[ww@master ~]$ hadoop fs -cat /user/hive/warehouse/yw.db/new_help_keyword/part-m-00000 | head -10
0       (JSON
1       ->
2       ->>
3       <>
4       ACCOUNT
5       ACTION
6       ADD
7       AES_DECRYPT
8       AES_ENCRYPT
9       AFTER

export

  • Parameters
Parameter                                Meaning
--direct                                 Use direct export mode (faster)
--export-dir <dir>                       HDFS directory containing the data to export
-m,--num-mappers <n>                     Export in parallel using n map tasks
--table <table-name>                     Name of the target table
--update-key <col-name>                  Column(s) used as the update key; separate multiple columns with commas
--update-mode <mode>                     Update mode; the default is updateonly, and it can be set to allowinsert
--input-null-string <null-string>        String in the input to be interpreted as null for string columns
--input-null-non-string <null-string>    String in the input to be interpreted as null for non-string columns
--staging-table <staging-table-name>     Staging table where data is held temporarily before being written to the target table
--clear-staging-table                    Clear the staging table before the export
--batch                                  Use batch mode for the underlying statements
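
For example, an update-style export can be sketched as follows (the key column here is illustrative and should correspond to a primary or unique key in MySQL; the connection settings and data match the HDFS export example below):

[ww@master ~]$ sqoop export --connect jdbc:mysql://master:3306/sqoopdb --username hadoop --password hadoop --table sqoopstudent --export-dir '/data/stus.txt' --fields-terminated-by '\t' --update-key name --update-mode allowinsert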

Exporting from HDFS to MySQL

1. Prepare the data on HDFS

[ww@master data]$ vi stus.txt
zhangsan        23      beijing
wangwu  26      shanghai
xiaoming        34      nanjing
Upload to HDFS under the /data directory
[ww@master data]$ hadoop fs -put stus.txt /data
[ww@master ~]$ hadoop fs -ls /data
Found 1 items
-rw-r--r--   1 ww supergroup  59 2019-05-18 12:44 /data/stus.txt

2. Prepare the environment in MySQL (database and table)

mysql> create database sqoopdb default character set utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.10 sec)
mysql> use sqoopdb;
Database changed
mysql> create table sqoopstudent(name varchar(20),age int,loction varchar(20)); 
Query OK, 0 rows affected (0.07 sec)

3. Confirm the data's location on HDFS

[ww@master ~]$ hadoop fs -ls /data
Found 1 items
-rw-r--r--   1 ww supergroup  59 2019-05-18 12:44 /data/stus.txt

4. Run the export

[ww@master data]$ sqoop export --connect jdbc:mysql://master:3306/sqoopdb --username hadoop --password hadoop --table sqoopstudent --export-dir '/data/stus.txt' --fields-terminated-by '\t'
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/05/18 12:57:39 INFO tool.CodeGenTool: Beginning code generation
Sat May 18 12:57:40 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/05/18 12:57:41 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sqoopstudent` AS t LIMIT 1
19/05/18 12:59:18 INFO mapreduce.Job: Job job_1557843570377_0042 completed successfully
19/05/18 12:59:19 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=561524
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=677
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=19
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters 
                Launched map tasks=4
                Data-local map tasks=4
                Total time spent by all maps in occupied slots (ms)=230913
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=230913
                Total vcore-milliseconds taken by all map tasks=230913
                Total megabyte-milliseconds taken by all map tasks=236454912
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Input split bytes=501
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=1893
                CPU time spent (ms)=6560
                Physical memory (bytes) snapshot=373071872
                Virtual memory (bytes) snapshot=8336805888
                Total committed heap usage (bytes)=90992640
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=0
19/05/18 12:59:19 INFO mapreduce.ExportJobBase: Transferred 677 bytes in 93.4927 seconds (7.2412 bytes/sec)
19/05/18 12:59:19 INFO mapreduce.ExportJobBase: Exported 3 records.

Log in to MySQL and verify

mysql> use sqoopdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-------------------+
| Tables_in_sqoopdb |
+-------------------+
| sqoopstudent      |
| student           |
+-------------------+
2 rows in set (0.00 sec)

mysql> select * from sqoopstudent;
+----------+------+----------+
| name     | age  | loction  |
+----------+------+----------+
| wangwu   |   26 | shanghai |
| xiaoming |   34 | nanjing  |
| zhangsan |   23 | beijing  |
+----------+------+----------+
3 rows in set (0.00 sec)

Exporting from Hive to MySQL

//Prepare the database and table in MySQL
mysql> use sqoopdb;
Database changed
mysql> create table sqoopstudent2(id int,name varchar(20));
Query OK, 0 rows affected (0.07 sec)
//Confirm the data's location in Hive
[ww@master data]$ hadoop fs -ls /user/hive/warehouse/lx.db/studenta
Found 1 items
-rwxr-xr-x   1 ww supergroup         90 2019-05-15 11:05 /user/hive/warehouse/lx.db/studenta/studenta.txt

//Run the export
[ww@master data]$ sqoop export --connect jdbc:mysql://master:3306/sqoopdb --username hadoop --password hadoop --table sqoopstudent2 --export-dir '/user/hive/warehouse/lx.db/studenta/studenta.txt' --fields-terminated-by '\t' --input-fields-terminated-by '\t'
19/05/18 15:14:50 INFO mapreduce.ExportJobBase: Transferred 926 bytes in 77.1693 seconds (11.9996 bytes/sec)
19/05/18 15:14:50 INFO mapreduce.ExportJobBase: Exported 8 records.
//Log in to MySQL and verify
mysql> use sqoopdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from sqoopstudent2;
+-------+-------+
| id    | name  |
+-------+-------+
| 10001 | shiny |
| 10020 | mark  |
| 13700 | angel |
| 10005 | ella  |
| 10009 | jack  |
| 10014 | eva   |
| 10018 | judy  |
| 10020 | cendy |
+-------+-------+
8 rows in set (0.00 sec)