Troubleshooting Sqoop Errors

Now that Sqoop is installed, the basic operations are introduced below.

List the databases on a MySQL server:

sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root \
--password Oracle123

List the tables in a database:

sqoop list-tables \
--connect jdbc:mysql://localhost:3306/hive \
--username root \
--password Oracle123

Import a table into HDFS:

[hadoop@hd1 conf]$ sqoop import \
> --connect jdbc:mysql://localhost:3306/sqoop \
> --username root \
> --password Oracle123 \
> --table sqoop1 \
> --delete-target-dir \
> -m 1

Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
        at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)
        at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:742)
        at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:369)
        at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:355)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
        at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

 

It looks like a jar is missing. Add the json jar to Sqoop's lib directory and rerun the import.
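
A minimal sketch of the fix, assuming Sqoop lives under $SQOOP_HOME and the jar (commonly named java-json.jar) was downloaded beforehand; both the variable and the file name are assumptions:

# Assumption: $SQOOP_HOME points at the Sqoop installation and
# java-json.jar has already been downloaded to the current directory.
cp java-json.jar $SQOOP_HOME/lib/

The re-run then failed with a different error: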

40899430803_0003/libjars/parquet-hadoop-1.5.0-cdh5.7.0.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1595)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3287)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:677)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:213)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:485)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

The error message says the job cannot meet the minimum replication requirement (=1), so either the DataNodes are down or the NameNode cannot communicate with them properly.

There are generally two fixes:

1. Check the firewall settings between the DN and NN, verify that the network is reachable, and confirm that the DataNode process is actually running on each DN node.

2. If the NN has been reformatted, its cluster ID may no longer match the one recorded by the DNs; in that case the version information on both sides has to be brought back in sync (for example, by clearing the DN data directories before re-formatting the NN).

I shut down the firewall between the NN and the DNs and re-ran the job; a quick way to verify the cluster state first is sketched below.
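
A quick diagnostic sketch; the systemctl commands assume a CentOS 7-style system with firewalld, so adjust for your distribution:

# On each DataNode: confirm the DataNode process is alive.
jps | grep DataNode

# On the NameNode: confirm every DN has registered and reports capacity.
hdfs dfsadmin -report

# Disable the firewall (CentOS 7 / firewalld assumed).
systemctl stop firewalld
systemctl disable firewalld

The re-run then hit a new error: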

430803_0004_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. 
This token is expired. current time is 1540918072803 found 1540908325730
Note: System times on machines may be out of sync. Check system time and time zones.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
. Failing the application.
18/10/30 21:55:26 INFO mapreduce.Job: Counters: 0
18/10/30 21:55:26 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/10/30 21:55:26 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 211.1476 seconds (0 bytes/sec)
18/10/30 21:55:26 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/10/30 21:55:26 INFO mapreduce.ImportJobBase: Retrieved 0 records.
18/10/30 21:55:26 ERROR tool.ImportTool: Error during import: Import job failed!

The error message indicates that the system times on the machines are out of sync, which made the container token expire.

Set the server time on each node:
date -s "2018-10-30 18:22:10"  && hwclock --systohc
hwclock --show
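
Setting the clock by hand fixes it once, but drift will come back; a sketch of syncing against NTP instead (pool.ntp.org is an assumption, substitute an internal time source if the cluster has no internet access):

# Assumption: an NTP server is reachable from every node.
ntpdate pool.ntp.org && hwclock --systohc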

After the change, run the job again:

Error: java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
        at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.

This looked like a JDBC driver problem, but swapping in a different driver gave the same error, so I decided to switch to a different MySQL version (mysql-5.6.23).

  • Unpack the tarball.
  • Edit the /etc/my.cnf configuration file:

[root@hd1 mysql]# vi /etc/my.cnf

[mysqld]
basedir=/home/mysql/mysql-5.6.23
datadir=/home/mysql/mysql-5.6.23/data
socket=/tmp/mysql.sock
log_error=/home/mysql/mysql-5.6.23/mysql.err
user=mysql
wait_timeout=86400

[mysql]
socket=/tmp/mysql.sock

  • Initialize: scripts/mysql_install_db --user=mysql --basedir=/home/mysql/mysql-5.6.23 --datadir=/home/mysql/mysql-5.6.23/data
  • Start: ./mysql.server start
  • Log in:
[mysql@hd1 support-files]$ mysql 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.23 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
ERROR 1046 (3D000): No database selected
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

Running it again still failed, and I was at a loss. Finally, I replaced localhost with a concrete IP address:

--connect jdbc:mysql://192.168.83.11:3306/sqoop

and, to my surprise, the run succeeded. In hindsight this makes sense: the map tasks execute on the worker nodes, where localhost resolves to the worker itself rather than to the machine running MySQL.
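
Note that connecting by IP also requires the MySQL account to accept remote connections; a sketch of granting that (the % host pattern is deliberately broad, narrow it in production):

mysql> GRANT ALL PRIVILEGES ON sqoop.* TO 'root'@'%' IDENTIFIED BY 'Oracle123';
mysql> FLUSH PRIVILEGES;

The successful run's counters: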

18/10/31 13:22:16 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=137094
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=4
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5935
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=5935
                Total vcore-seconds taken by all map tasks=5935
                Total megabyte-seconds taken by all map tasks=6077440
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=139
                CPU time spent (ms)=990
                Physical memory (bytes) snapshot=107266048
                Virtual memory (bytes) snapshot=2714398720
                Total committed heap usage (bytes)=22544384
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=4
18/10/31 13:22:16 INFO mapreduce.ImportJobBase: Transferred 4 bytes in 24.3133 seconds (0.1645 bytes/sec)
18/10/31 13:22:16 INFO mapreduce.ImportJobBase: Retrieved 1 records.
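
To confirm the row actually landed, one can inspect the target directory in HDFS; the path below assumes Sqoop's default of /user/<user>/<table> for user hadoop and table sqoop1:

# Assumption: default target directory and a single mapper (-m 1).
hdfs dfs -ls /user/hadoop/sqoop1
hdfs dfs -cat /user/hadoop/sqoop1/part-m-00000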

But why did the localhost form fail in the first place?

I found the following post on the official wiki:

Connection Refused

You get a ConnectionRefused Exception when there is a machine at the address specified, but there is no program listening on the specific TCP port the client is using -and there is no firewall in the way silently dropping TCP connection requests. If you do not know what a TCP connection request is, please consult the specification.

Unless there is a configuration error at either end, a common cause for this is the Hadoop service isn't running.

This stack trace is very common when the cluster is being shut down -because at that point Hadoop services are being torn down across the cluster, which is visible to those services and applications which haven't been shut down themselves. Seeing this error message during cluster shutdown is not anything to worry about.

If the application or cluster is not working, and this message appears in the log, then it is more serious.

The exception text declares both the hostname and the port to which the connection failed. The port can be used to identify the service. For example, port 9000 is the HDFS port. Consult the Ambari port reference, and/or those of the supplier of your Hadoop management tools.

  1. Check that the hostname the client is using is correct. If it's in a Hadoop configuration option: examine it carefully, try doing a ping by hand.
  2. Check that the IP address the client is trying to talk to for the hostname is correct.
  3. Make sure the destination address in the exception isn't 0.0.0.0 -this means that you haven't actually configured the client with the real address for that service, and instead it is picking up the server-side property telling it to listen on every port for connections.
  4. If the error message says the remote service is on "127.0.0.1" or "localhost" that means the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, then your configuration is broken.
  5. Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
  6. Check that the port the client is trying to talk to matches the one the server is offering a service on. The netstat command is useful there (see the sketch after this list).
  7. On the server, try a telnet localhost <port> to see if the port is open there.
  8. On the client, try a telnet <server> <port> to see if the port is accessible remotely.
  9. Try connecting to the server/port from a different machine, to see if it is just the single client misbehaving.
  10. If your client and the server are in different subdomains, it may be that the configuration of the service is only publishing the basic hostname, rather than the Fully Qualified Domain Name. The client in the different subdomain can unintentionally attempt to bind to a host in the local subdomain -and fail.
  11. If you are using a Hadoop-based product from a third party, please use the support channels provided by the vendor.
  12. Please do not file bug reports related to your problem, as they will be closed as Invalid.
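
Applied to this case, the same checks against the MySQL port look roughly like this; the host and port come from the working --connect string above:

# On the MySQL server: is anything listening on 3306?
netstat -tlnp | grep 3306

# From a worker node: is the port reachable remotely?
telnet 192.168.83.11 3306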

Syncing a MySQL table definition into Hive:

sqoop create-hive-table \
--connect jdbc:mysql://hd1:3306/hive \
--table TBLS \
--username root \
--password Oracle123 \
--hive-table TBLS

18/10/31 17:16:02 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/10/31 17:16:02 ERROR tool.CreateHiveTableTool: Encountered IOException running create table job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
        at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
        at org.apache.sqoop.tool.CreateHiveTableTool.run(CreateHiveTableTool.java:58)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:259)
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
        ... 11 more

Fix: add HADOOP_CLASSPATH to the user's environment variables.

export HADOOP_CLASSPATH=$HIVE_HOME/lib/*
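
To make this persist across logins, append it to the shell profile; ~/.bash_profile and a pre-set HIVE_HOME are assumptions here:

# Assumption: HIVE_HOME is already exported for this user.
echo 'export HADOOP_CLASSPATH=$HIVE_HOME/lib/*' >> ~/.bash_profile
source ~/.bash_profile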

Reposted from: https://my.oschina.net/u/3862440/blog/2254866
