Importing and Exporting Data with Sqoop 1.4.6

IT独白者

Article link: https://blog.csdn.net/sun_wangdong/article/details/75988323

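Sqoop is a tool for transferring data between relational databases and Hadoop storage (HDFS, Hive, HBase). It can import tables from databases such as MySQL or Oracle into Hive or HBase, and it can likewise export tables from Hive or HBase back into a relational database.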

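There are currently two main lines of Sqoop: sqoop-1.4.* and sqoop-1.99.*. sqoop-1.4.* is the Sqoop 1.x line, while sqoop-1.99.* is the 2.x line. The two are not compatible and cannot be used interchangeably. This article uses sqoop-1.4.6.

Installation comes first. The package can be downloaded from the official site, http://sqoop.apache.org. I unpacked it under /users/sunwangdong/hadoop so that it sits alongside the Hadoop installation. Next, go to sqoop/conf, where there is a file named sqoop-env-template.sh. Rename it to sqoop-env.sh and edit it: set HADOOP_COMMON_HOME to the Hadoop installation path, fill in the HBase and Hive paths, and fill in the ZooKeeper path as well if ZooKeeper is used.

Then add the Sqoop environment variables to /etc/profile: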

export SQOOP_HOME=/users/sunwangdong/hadoop/sqoop-1.4.6

export PATH=$PATH:$SQOOP_HOME/bin
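For the sqoop-env.sh step described earlier, the edited file might look roughly like the sketch below. The variable names are the ones found in the sqoop-env-template.sh shipped with Sqoop 1.4.x; the directory names under /users/sunwangdong/hadoop are assumptions, since the article does not list the exact install paths of Hadoop, HBase, Hive, or ZooKeeper.

```shell
# conf/sqoop-env.sh (sketch; the directory names are hypothetical placeholders)
INSTALL_ROOT=/users/sunwangdong/hadoop

# Where bin/hadoop and the MapReduce jars live
export HADOOP_COMMON_HOME="$INSTALL_ROOT/hadoop"
export HADOOP_MAPRED_HOME="$INSTALL_ROOT/hadoop"

# Where bin/hbase and bin/hive live
export HBASE_HOME="$INSTALL_ROOT/hbase"
export HIVE_HOME="$INSTALL_ROOT/hive"

# ZooKeeper configuration directory (only needed if ZooKeeper is used)
export ZOOCFGDIR="$INSTALL_ROOT/zookeeper/conf"
```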

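After editing the environment variables, run source /etc/profile so that they take effect immediately. Sqoop also needs to connect to the MySQL database, so the mysql-connector-java-bin.5.1.31.jar driver must be placed under sqoop/lib. Note that I initially used mysql-connector-java-bin.5.1.7.jar, but importing a MySQL table into HDFS with it failed with the following error: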

ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@2cbefcfd is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@2cbefcfd is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.

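This error indicates that the MySQL connector jar I had installed was too old. After downloading mysql-connector-java-bin.5.1.31.jar and using it instead, the import worked. The jar can be downloaded from: http://download.csdn.net/detail/heixiazuoluo10254222/7706781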
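A quick way to confirm which connector jar Sqoop will pick up is to list its lib directory. The following is a small sketch, assuming the SQOOP_HOME value used in this article as the fallback path; the JAR_STATUS variable is only introduced here for illustration.

```shell
# Check whether a MySQL connector jar is present in Sqoop's lib directory.
# Falls back to the SQOOP_HOME used in this article if the variable is unset.
SQOOP_LIB="${SQOOP_HOME:-/users/sunwangdong/hadoop/sqoop-1.4.6}/lib"

if ls "$SQOOP_LIB"/mysql-connector-java*.jar >/dev/null 2>&1; then
  JAR_STATUS=found
  # Show the jar name(s) so the version can be checked by eye
  ls "$SQOOP_LIB"/mysql-connector-java*.jar
else
  JAR_STATUS=missing
  echo "No MySQL connector jar in $SQOOP_LIB; copy one there before running sqoop import."
fi
```

If more than one connector jar shows up, it is probably safest to remove the older one so that only a single version is on Sqoop's classpath.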

Next, run sqoop to check that it works. Hadoop should of course be started first.

./sqoop version

sunwangdongMacBook-Pro:bin sunwangdong$ ./sqoop version
Warning: /users/sunwangdong/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/07/24 10:43:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015

This output shows that Sqoop is installed correctly.

1. Importing a MySQL table into HDFS

The command is:

sqoop import --connect jdbc:mysql://localhost:3306/hive --username hive --password hive --table TBLS --fields-terminated-by '\t'

This copies TBLS, one of the system tables that Hive stores in MySQL, into HDFS, with columns separated by '\t'. Note the direction: import copies data from the database into HDFS, and export copies data from HDFS into the database. Both terms describe where the data goes relative to Hadoop.
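As a variation on the import above, the output location can be chosen explicitly with --target-dir, and -P makes Sqoop prompt for the password instead of taking it on the command line. This is only a sketch: the TARGET_DIR path is a hypothetical example, and the guard simply skips the import on machines where sqoop is not installed.

```shell
# Hypothetical HDFS output directory; adjust to your cluster.
TARGET_DIR=/user/sunwangdong/imports/TBLS

if command -v sqoop >/dev/null 2>&1; then
  # -P prompts for the password interactively.
  sqoop import \
    --connect jdbc:mysql://localhost:3306/hive \
    --username hive -P \
    --table TBLS \
    --fields-terminated-by '\t' \
    --target-dir "$TARGET_DIR"
else
  echo "sqoop is not on PATH; skipping the import." >&2
fi
```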

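Listing the directory in HDFS confirms that the data has been copied. By default the files are written to /user/sunwangdong/TBLS, which can be listed with hdfs dfs -ls /user/sunwangdong/TBLS. The result is: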

hdfs dfs -ls /user/sunwangdong/TBLS

sunwangdongMacBook-Pro:lib sunwangdong$ hdfs dfs -ls /user/sunwangdong/TBLS
17/07/24 10:26:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 5 items
-rw-r--r--   1 sunwangdong supergroup          0 2017-07-24 10:25 /user/sunwangdong/TBLS/_SUCCESS
-rw-r--r--   1 sunwangdong supergroup         65 2017-07-24 10:25 /user/sunwangdong/TBLS/part-m-00000
-rw-r--r--   1 sunwangdong supergroup        133 2017-07-24 10:25 /user/sunwangdong/TBLS/part-m-00001
-rw-r--r--   1 sunwangdong supergroup         68 2017-07-24 10:25 /user/sunwangdong/TBLS/part-m-00002
-rw-r--r--   1 sunwangdong supergroup        141 2017-07-24 10:25 /user/sunwangdong/TBLS/part-m-00003

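There are four data files because Sqoop runs the import as a MapReduce job that, by default, splits the table across four map tasks, and each task writes its own part-m-* file. The number of map tasks can be changed with -m:

sqoop import --connect jdbc:mysql://localhost:3306/hive --username hive --password hive --table TBLS --fields-terminated-by '\t' -m 1 --append

With -m 1 the job uses a single map task, so the table is not split. --append tells Sqoop to add the new files to the existing target directory rather than refusing to write into it. Running this command gives: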

sunwangdongMacBook-Pro:lib sunwangdong$ hdfs dfs -ls /user/sunwangdong/TBLS
17/07/24 11:02:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 sunwangdong supergroup        407 2017-07-24 11:01 /user/sunwangdong/TBLS/part-m-00000
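When more than one map task is used, Sqoop divides the rows by a split column, which defaults to the table's primary key. The sketch below sets it explicitly with --split-by. TBL_ID is the primary key of the metastore TBLS table; the choice of two mappers is just an illustrative value, and the guard skips the command where sqoop is not installed.

```shell
SPLIT_COLUMN=TBL_ID   # column used to divide rows between map tasks
NUM_MAPPERS=2         # illustrative value; each mapper writes one part-m-* file

if command -v sqoop >/dev/null 2>&1; then
  sqoop import \
    --connect jdbc:mysql://localhost:3306/hive \
    --username hive --password hive \
    --table TBLS \
    --fields-terminated-by '\t' \
    --split-by "$SPLIT_COLUMN" \
    -m "$NUM_MAPPERS" \
    --append
else
  echo "sqoop is not on PATH; skipping the import." >&2
fi
```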

2. Importing a MySQL table into Hive

Again, this is done from the command line:

sqoop import --connect jdbc:mysql://localhost:3306/hive --username hive --password hive --table TBLS --fields-terminated-by '\t' -m 1 --hive-import

This imports the MySQL table into Hive. The --append option does not appear to be supported here, though. Hive now has a new table named tbls.

Its contents can be checked with select:

hive> show tables;
OK
bucket_t
bucket_tmp
ext_t
hive_users
partition_t
student
tbls
Time taken: 0.038 seconds, Fetched: 7 row(s)
hive> select * from tbls;
OK
1	1500191036	1	0	sunwangdong	0	1	student	MANAGED_TABLE	null	null
6	1500256048	1	0	sunwangdong	0	6	ext_t	EXTERNAL_TABLE	null	null
7	1500257303	1	0	sunwangdong	0	7	partition_t	MANAGED_TABLE	null	null
11	1500259132	1	0	sunwangdong	0	11	bucket_t	MANAGED_TABLE	null	null
16	1500260570	1	0	sunwangdong	0	16	bucket_tmp	MANAGED_TABLE	null	null
21	1500604052	1	0	sunwangdong	0	21	hive_users	EXTERNAL_TABLE	null	null
Time taken: 0.461 seconds, Fetched: 6 row(s)
The import succeeded.
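Since --append did not seem to work together with --hive-import, one option for re-running the import is --hive-overwrite, which replaces the data already in the Hive table. The sketch below also names the Hive table explicitly with --hive-table; the table name tbls matches the one shown above, and whether overwriting is appropriate depends on the use case.

```shell
HIVE_TABLE=tbls   # destination table in Hive, as listed by "show tables" above

if command -v sqoop >/dev/null 2>&1; then
  sqoop import \
    --connect jdbc:mysql://localhost:3306/hive \
    --username hive --password hive \
    --table TBLS \
    --fields-terminated-by '\t' \
    -m 1 \
    --hive-import \
    --hive-table "$HIVE_TABLE" \
    --hive-overwrite
else
  echo "sqoop is not on PATH; skipping the Hive import." >&2
fi
```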




3. Exporting a table from HDFS to MySQL

Likewise, just change import to export. The destination table must already exist in MySQL.

sqoop export --connect jdbc:mysql://localhost:3306/hive --username hive --password hive --table student --fields-terminated-by '\t' --export-dir '/stu'

Here student is the table that must have been created in MySQL beforehand, and --export-dir is the HDFS path of the data to be exported to MySQL.
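Creating the destination table beforehand could be done with the mysql command-line client, as sketched below. The column layout of student is an assumption made for illustration, since the article does not show the table's schema; it must match the columns of the files under the export directory.

```shell
# Hypothetical schema: adjust the columns to match the data in HDFS.
CREATE_SQL='CREATE TABLE IF NOT EXISTS student (id INT, name VARCHAR(64));'

if command -v mysql >/dev/null 2>&1; then
  # -p prompts for the password; the trailing "hive" is the database name.
  mysql -u hive -p hive -e "$CREATE_SQL"
else
  echo "mysql client is not on PATH; skipping table creation." >&2
fi
```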


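4. Exporting a Hive table to MySQL

First start Hive and create the table to be exported, loading its data with the load command. Then run:

sqoop export --connect jdbc:mysql://localhost:3306/hive --username hive --password hive --table student --fields-terminated-by '\t' --export-dir '/user/sunwangdong/hive/warehouse/stu'

Note that --export-dir here is the table's path in the Hive warehouse, so the table must already have been created in Hive. Hive also keeps its data in HDFS, so exporting from Hive to MySQL works in essentially the same way as exporting from HDFS.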
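The '\t' delimiter in the command above only matches if the Hive table was created with tab-separated fields. A Hive table created without an explicit delimiter separates fields with the \001 (Ctrl-A) character, and in that case the export might look like the following sketch. It uses --input-fields-terminated-by, the option that describes how the files being exported are delimited; the export directory is the one from this article.

```shell
EXPORT_DIR=/user/sunwangdong/hive/warehouse/stu
HIVE_DELIM='\001'   # Hive's default field delimiter, in Sqoop's octal notation

if command -v sqoop >/dev/null 2>&1; then
  sqoop export \
    --connect jdbc:mysql://localhost:3306/hive \
    --username hive --password hive \
    --table student \
    --input-fields-terminated-by "$HIVE_DELIM" \
    --export-dir "$EXPORT_DIR"
else
  echo "sqoop is not on PATH; skipping the export." >&2
fi
```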

That concludes this brief introduction to Sqoop.
