sqoop数据采集

什么是sqoop
sqoop的全称是SQL-To-Hadoop,它是一个关系型数据库和Hadoop之间数据交换的工具,它从sql导入到hadoop的过程叫做import,从hadoop导出sql的过程叫做export。import和export都是基于mapreduce的
安装Sqoop
sqoop的安装非常节点,只需要把它的包解压后,再配置一下环境变量就可以了
如果采集Oracle中的数据,需要把Oracle安装包中lib的jar放入sqoop的lib目录中

我的虚拟机中oracle的jdbc包ojdbc14.ja放入sqoop的lib中
操作sqoop
我们使用sqoop help命令查看一下sqoop的操作命令有哪些
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/…/…/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/…/…/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/…/…/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/…/…/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2020-06-30 22:46:08,300 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
usage: sqoop COMMAND [ARGS]
在开始的提示当中有几个警告,说是hbase不存在,Hbase的导入将来会失败,需要设置一下Hbase的目录
第二个警告是说HCatalog不存在,Hcatalog将来可能失败
其他导入可能会失败而已

Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information

以上的是sqoop中可以使用的命令,藏用的命令有
version Display version information
查看sqoop系统的版本
list-databases List available databases on a server
如果是Oracle:列出的数据库中所有的用户名称
如果是MySQL: 列出的所有的数据库名字
list-tables List available tables in a database
列出所有的额表
import Import a table from a database to HDFS
从数据库中导入一张表到Hdfs中
import-all-tables Import tables from a database to HDFS
从数据库中导入所有表到Hdfs上
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
在使用这些命令的时候,如果想查看这些命令还有那些参数可以使用help进行查询
比如 sqoop help list-databases
Common arguments:
–connect Specify JDBC connect
string
–connection-manager Specify connection manager
class name
–connection-param-file Specify connection
parameters file
–driver Manually specify JDBC
driver class to use
–hadoop-home Override
$HADOOP_MAPRED_HOME_ARG
–hadoop-mapred-home

Override
$HADOOP_MAPRED_HOME_ARG
–help Print usage instructions
-P Read password from console
–password Set authentication
password
–password-file Set authentication
password file path
–relaxed-isolation Use read-uncommitted
isolation for imports
–skip-dist-cache Skip copying jars to
distributed cache
–username Set authentication
username
–verbose Print more information
while working
示例(注意:操作Oracle,大写:用户名、表名):
sqoop list-databases --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl
–username SYSTEM --password password
在执行这条命令的时候出现了这样的报错

Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2020-06-30 23:28:40,880 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
2020-06-30 23:28:40,927 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2020-06-30 23:28:41,101 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
2020-06-30 23:28:41,131 INFO manager.SqlManager: Using default fetchSize of 1000
2020-06-30 23:28:41,415 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:695)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
[root@bigdata111 lib]# sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException

在我查询这个ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException错误大多数人说是数据的jar没有导入,但是我的导入了,最后去测试数据库原来是没有启动

使用lsnrctl status
只运行了一个进程
在我使用超级管理用户,修改系统注册的时候老是报错(ORA-01034: ORACLE not available)不知道为什么
在这里插入图片描述
最后在网上看到在用超级用户进入后执行一下startup就可以了,然后我oracle系统就正常启动了
参考的文章就是这个
https://community.oracle.com/message/15081041
然后我们再执行那个命令就正常了
在这里插入图片描述
然后我在测试列出oracle下某个用户的表
列出某个用户下的表
sqoop list-tables --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl
–username SCOTT --password tiger

eval
执行一条sql语句并展示它的结果

sqoop eval --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl \
--username SCOTT --password tiger \
--query "select * from emp where deptno=10"

codegen
根据表的结构生成java的相关文件文件

[root@bigdata111 ~]# sqoop codegen --connect jdbc:oracle:thin:@192.168.112.130:1rcl --username SCOTT --password tiger --table EMP521:orcl --username SCOTT --password tiger --table EMP
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2020-07-03 00:52:29,688 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
2020-07-03 00:52:29,754 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2020-07-03 00:52:29,967 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
2020-07-03 00:52:30,002 INFO manager.SqlManager: Using default fetchSize of 1000
2020-07-03 00:52:30,002 INFO tool.CodeGenTool: Beginning code generation
2020-07-03 00:52:37,114 INFO manager.OracleManager: Time zone has been set to GMT
2020-07-03 00:52:37,243 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM EMP t WHERE 1=0
2020-07-03 00:52:38,217 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /root/training/hadoop-3.1.2
Note: /tmp/sqoop-root/compile/4b473e773e258d71210d1b7d65e64f72/EMP.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-07-03 00:52:46,994 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/4b473e773e258d71210d1b7d65e64f72/EMP.jar
[root@bigdata111 ~]# ls /tmp/sqoop-root/compile/4b473e773e258d71210d1b7d65e64f72/
EMP.class  EMP.jar

我可以从打印的日志上可以看到生成到linux的tmp目录下,并且在执行这条命令的目录上生成了一个java文件

create-hive-table
根据关系型数据库表结构创建hive的表
sqoop create-hive-table --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl
–username SCOTT --password tiger --table EMP --hive-table emphive
我们开启hive命令行模式,查看一下刚才的表
hive
show tables;
describe emphive;
import Import a table from a database to HDFS
导入一张表到hdfs中

sqoop import --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl \
--username SCOTT --password tiger --table EMP --target-dir /sqoopemp

使用这张表我们已经导入到hdfs中
import-all-tables Import tables from a database to HDFS
采集某个用户下的所有表到hdfs中
sqoop import-all-tables --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl --username SCOTT --password tiger -m 1
这里-m 1 表示使用表中的第一列作为主键,因为oracle中有一张表中没有主键
导入到了hdfs的user/root的目录下
在这里插入图片描述
export Export an HDFS directory to a database table
从hdfs导出数据到关系数据库中,但是值得注意的是在导入之前,这样表必须提前存在于关系型数据库中
sqoop export --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl --username SCOTT --password tiger --table STUDENTS --export-dir /students

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

枣泥馅

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值