什么是sqoop
sqoop的全称是SQL-To-Hadoop,它是一个关系型数据库和Hadoop之间数据交换的工具,它从sql导入到hadoop的过程叫做import,从hadoop导出sql的过程叫做export。import和export都是基于mapreduce的
安装Sqoop
sqoop的安装非常节点,只需要把它的包解压后,再配置一下环境变量就可以了
如果采集Oracle中的数据,需要把Oracle安装包中lib的jar放入sqoop的lib目录中
我的虚拟机中oracle的jdbc包ojdbc14.ja放入sqoop的lib中
操作sqoop
我们使用sqoop help命令查看一下sqoop的操作命令有哪些
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/…/…/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/…/…/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/…/…/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/…/…/zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2020-06-30 22:46:08,300 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
usage: sqoop COMMAND [ARGS]
在开始的提示当中有几个警告,说是hbase不存在,Hbase的导入将来会失败,需要设置一下Hbase的目录
第二个警告是说HCatalog不存在,Hcatalog将来可能失败
其他导入可能会失败而已
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
以上的是sqoop中可以使用的命令,藏用的命令有
version Display version information
查看sqoop系统的版本
list-databases List available databases on a server
如果是Oracle:列出的数据库中所有的用户名称
如果是MySQL: 列出的所有的数据库名字
list-tables List available tables in a database
列出所有的额表
import Import a table from a database to HDFS
从数据库中导入一张表到Hdfs中
import-all-tables Import tables from a database to HDFS
从数据库中导入所有表到Hdfs上
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
在使用这些命令的时候,如果想查看这些命令还有那些参数可以使用help进行查询
比如 sqoop help list-databases
Common arguments:
–connect Specify JDBC connect
string
–connection-manager Specify connection manager
class name
–connection-param-file Specify connection
parameters file
–driver Manually specify JDBC
driver class to use
–hadoop-home Override
$HADOOP_MAPRED_HOME_ARG
–hadoop-mapred-home
$HADOOP_MAPRED_HOME_ARG
–help Print usage instructions
-P Read password from console
–password Set authentication
password
–password-file Set authentication
password file path
–relaxed-isolation Use read-uncommitted
isolation for imports
–skip-dist-cache Skip copying jars to
distributed cache
–username Set authentication
username
–verbose Print more information
while working
示例(注意:操作Oracle,大写:用户名、表名):
sqoop list-databases --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl
–username SYSTEM --password password
在执行这条命令的时候出现了这样的报错
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2020-06-30 23:28:40,880 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
2020-06-30 23:28:40,927 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2020-06-30 23:28:41,101 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
2020-06-30 23:28:41,131 INFO manager.SqlManager: Using default fetchSize of 1000
2020-06-30 23:28:41,415 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:695)
at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
[root@bigdata111 lib]# sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
在我查询这个ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException错误大多数人说是数据的jar没有导入,但是我的导入了,最后去测试数据库原来是没有启动
。
使用lsnrctl status
只运行了一个进程
在我使用超级管理用户,修改系统注册的时候老是报错(ORA-01034: ORACLE not available)不知道为什么
最后在网上看到在用超级用户进入后执行一下startup就可以了,然后我oracle系统就正常启动了
参考的文章就是这个
https://community.oracle.com/message/15081041
然后我们再执行那个命令就正常了
然后我在测试列出oracle下某个用户的表
列出某个用户下的表
sqoop list-tables --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl
–username SCOTT --password tiger
eval
执行一条sql语句并展示它的结果
sqoop eval --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl \
--username SCOTT --password tiger \
--query "select * from emp where deptno=10"
codegen
根据表的结构生成java的相关文件文件
[root@bigdata111 ~]# sqoop codegen --connect jdbc:oracle:thin:@192.168.112.130:1rcl --username SCOTT --password tiger --table EMP521:orcl --username SCOTT --password tiger --table EMP
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /root/training/sqoop-1.4.5.bin__hadoop-0.23/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2020-07-03 00:52:29,688 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
2020-07-03 00:52:29,754 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2020-07-03 00:52:29,967 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
2020-07-03 00:52:30,002 INFO manager.SqlManager: Using default fetchSize of 1000
2020-07-03 00:52:30,002 INFO tool.CodeGenTool: Beginning code generation
2020-07-03 00:52:37,114 INFO manager.OracleManager: Time zone has been set to GMT
2020-07-03 00:52:37,243 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM EMP t WHERE 1=0
2020-07-03 00:52:38,217 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /root/training/hadoop-3.1.2
Note: /tmp/sqoop-root/compile/4b473e773e258d71210d1b7d65e64f72/EMP.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-07-03 00:52:46,994 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/4b473e773e258d71210d1b7d65e64f72/EMP.jar
[root@bigdata111 ~]# ls /tmp/sqoop-root/compile/4b473e773e258d71210d1b7d65e64f72/
EMP.class EMP.jar
我可以从打印的日志上可以看到生成到linux的tmp目录下,并且在执行这条命令的目录上生成了一个java文件
create-hive-table
根据关系型数据库表结构创建hive的表
sqoop create-hive-table --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl
–username SCOTT --password tiger --table EMP --hive-table emphive
我们开启hive命令行模式,查看一下刚才的表
hive
show tables;
describe emphive;
import Import a table from a database to HDFS
导入一张表到hdfs中
sqoop import --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl \
--username SCOTT --password tiger --table EMP --target-dir /sqoopemp
使用这张表我们已经导入到hdfs中
import-all-tables Import tables from a database to HDFS
采集某个用户下的所有表到hdfs中
sqoop import-all-tables --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl --username SCOTT --password tiger -m 1
这里-m 1 表示使用表中的第一列作为主键,因为oracle中有一张表中没有主键
导入到了hdfs的user/root的目录下
export Export an HDFS directory to a database table
从hdfs导出数据到关系数据库中,但是值得注意的是在导入之前,这样表必须提前存在于关系型数据库中
sqoop export --connect jdbc:oracle:thin:@192.168.112.130:1521:orcl --username SCOTT --password tiger --table STUDENTS --export-dir /students