1. Spark/HBase integration cannot find the org.apache.htrace package: Caused by: java.lang.ClassNotFoundException: org.apache.htrace.Trace
java.lang.reflect.InvocationTargetException
Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/Trace
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.Trace
Cause: the CDH Spark classpath pulls in /opt/cloudera/parcels/CDH/jars/htrace-core-3.0.4.jar. In that version the package was still org.htrace; by 3.1.0 the project had been donated to Apache and the package renamed to org.apache.htrace.
Fix: on the job-submission machine, manually edit /etc/alternatives/spark-conf/classpath.txt and append /opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/htrace-core-3.1.0-incubating.jar at the end.
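The edit above can be scripted so it is safe to re-run; a minimal sketch (the file path and jar path are the ones from the fix above, passed in as arguments) that appends the jar entry only if it is not already listed:

```python
def ensure_classpath_entry(classpath_file, jar_path):
    """Append jar_path to classpath.txt unless it is already listed."""
    with open(classpath_file) as f:
        entries = f.read().splitlines()
    if jar_path not in entries:
        with open(classpath_file, "a") as f:
            f.write(jar_path + "\n")
```

Running it a second time is a no-op, so it will not duplicate the entry if the classpath was already patched.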
2. Writing data to HDFS through the Java API from a local client creates the file, but the file content is empty (size 0)
java.io.IOException: File /testing/file01.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
- The key is this part of the message: 1 datanode(s) running and 1 node(s) are excluded. It means the HDFS client cannot reach the DataNode on port 50010. Connecting to the NameNode only gets you the DataNode's status: in HDFS the NameNode manages the file namespace while the DataNodes store the blocks, so when a client contacts the NameNode it receives the target file path plus the addresses of the DataNodes that hold the data, and then communicates with those DataNodes directly. (You can check those DataNode URIs with netstat, because the client will try to reach the DataNodes at exactly the addresses the NameNode reported.)
Fixes:
- Open port 50010 in the firewall
- In the Java client code, add "dfs.client.use.datanode.hostname" = "true" to the Configuration
- Add the DataNode hostname to the hosts file on the client PC
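Before changing any configuration, it helps to confirm from the client machine whether the DataNode transfer port is reachable at all. A minimal sketch (the host is a placeholder; 50010 is the default DataNode transfer port from the error above):

```python
import socket

def datanode_reachable(host, port=50010, timeout=3.0):
    """Return True if a TCP connection to the DataNode transfer port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False for the address the NameNode reported, the firewall or hostname resolution on the client is the problem, and the three fixes above apply.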
III. CDH 5.7.0 installation issues
1. Unable to create the pidfile.
Fix: create a cloudera-scm-agent directory under the run directory of the installation, e.g. /opt/cloudera-manager/cm-5.4.3/run:
[root@node1 run]# pwd
/opt/cloudera-manager/cm-5.4.3/run
[root@node1 run]# ls
cloudera-scm-agent cloudera-scm-agent.pid cloudera-scm-server cloudera-scm-server.pid
[root@node1 run]#
2. MainThread agent ERROR Caught unexpected exception in main loop.
ValueError: too many values to unpack
[21/Dec/2019 15:58:12 +0000] 31816 MainThread agent ERROR Caught unexpected exception in main loop.
Traceback (most recent call last):
File "/opt/cloudera-manager/cm-5.7.0/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.0-py2.7.egg/cmf/agent.py", line 673, in start
self._init_after_first_heartbeat_response(heartbeat_response["data"])
File "/opt/cloudera-manager/cm-5.7.0/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.0-py2.7.egg/cmf/agent.py", line 803, in _init_after_first_heartbeat_response
self.client_configs.load()
File "/opt/cloudera-manager/cm-5.7.0/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.0-py2.7.egg/cmf/client_configs.py", line 682, in load
new_deployed.update(self._lookup_alternatives(fname))
File "/opt/cloudera-manager/cm-5.7.0/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.0-py2.7.egg/cmf/client_configs.py", line 432, in _lookup_alternatives
return self._parse_alternatives(alt_name, out)
File "/opt/cloudera-manager/cm-5.7.0/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.0-py2.7.egg/cmf/client_configs.py", line 444, in _parse_alternatives
path, _, _, priority_str = line.rstrip().split(" ")
ValueError: too many values to unpack
Fix: edit the file /opt/cloudera-manager/cm-5.7.0/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.0-py2.7.egg/cmf/client_configs.py:
Original:
442 for line in output.splitlines():
443     if line.startswith("/"):
444         path, _, _, priority_str = line.rstrip().split(" ")
445
446         # Ignore the alternative if it's not managed by CM.
447         if CM_MAGIC_PREFIX not in os.path.basename(path):
448             continue
449
450         try:
451             priority = int(priority_str)
452         except ValueError:
453             THROTTLED_LOG.info("Failed to parse %s: %s", name, line)
454
455         key = ClientConfigKey(name, path)
456         value = ClientConfigValue(priority, self._read_generation(path))
457         ret[key] = value
458
459 return ret
Changed to:
442 for line in output.splitlines():
443     if line.startswith("/"):
444         if len(line.rstrip().split(" ")) == 4:
445             path, _, _, priority_str = line.rstrip().split(" ")
446
447             # Ignore the alternative if it's not managed by CM.
448             if CM_MAGIC_PREFIX not in os.path.basename(path):
449                 continue
450
451             try:
452                 priority = int(priority_str)
453             except ValueError:
454                 THROTTLED_LOG.info("Failed to parse %s: %s", name, line)
455
456             key = ClientConfigKey(name, path)
457             value = ClientConfigValue(priority, self._read_generation(path))
458             ret[key] = value
459         else:
460             pass
461 return ret
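The effect of the guard can be shown in isolation. A standalone sketch (not the agent's actual code) of the same parsing step: lines of the expected four-field form '<path> - priority <N>' are parsed, and malformed alternatives lines are skipped instead of raising:

```python
def parse_alternative_line(line):
    """Parse one alternatives --display line of the form '<path> - priority <N>'.

    Returns (path, priority) for well-formed lines, or None for lines that
    do not split into exactly four fields -- the unguarded tuple unpack on
    such lines is what raised ValueError in the agent.
    """
    fields = line.rstrip().split(" ")
    if len(fields) != 4:
        return None
    path, _, _, priority_str = fields
    try:
        return path, int(priority_str)
    except ValueError:
        return None
```

For example, parse_alternative_line("/usr/bin/java - priority 100") yields ("/usr/bin/java", 100), while a line with extra tokens returns None instead of crashing the main loop.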