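When calling the WebHDFS API, the DataNode rejects the request (noted as a 500 error, although the response captured below is 400 Bad Request). The client receives: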
curl -i -T ./install_log.txt -X PUT "http://10.244.2.132:9864/webhdfs/v1/aicore-checker/tmpqmuplstl?op=CREATE&user.name=root&namenoderpcaddress=mycluster&createflag=&createparent=true&overwrite=false"
HTTP/1.1 100 Continue
HTTP/1.1 400 Bad Request
Content-Type: application/json; charset=utf-8
Content-Length: 166
Connection: close
{"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"java.net.UnknownHostException: mycluster"}}
Note: uploading files with hdfs dfs works normally, and Java calls over RPC work normally as well.
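To recognize this failure in a script, the JSON body of the response can be inspected. The following is a minimal sketch, assuming the body has the RemoteException shape shown above; the function name unresolved_nameservice is hypothetical and not part of any Hadoop client library.

```python
import json


def unresolved_nameservice(body):
    """Return the host name from a WebHDFS RemoteException that wraps
    java.net.UnknownHostException, or None if the body does not match."""
    try:
        message = json.loads(body)["RemoteException"]["message"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like a RemoteException response.
        return None
    if not isinstance(message, str):
        return None
    marker = "java.net.UnknownHostException:"
    if marker not in message:
        return None
    # Everything after the marker is the name that could not be resolved.
    return message.split(marker, 1)[1].strip()


body = (
    '{"RemoteException":{"exception":"IllegalArgumentException",'
    '"javaClassName":"java.lang.IllegalArgumentException",'
    '"message":"java.net.UnknownHostException: mycluster"}}'
)
print(unresolved_nameservice(body))
```

If the returned name matches the HA nameservice (mycluster here) rather than a real host, the DataNode configuration is the likely place to look.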
Solution
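Add the dfs.client.failover.proxy.provider.mycluster property to /etc/hadoop/hdfs-site.xml on the DataNode. Without it, the HDFS client inside the DataNode most likely treats the logical nameservice mycluster as an ordinary host name, which would explain the UnknownHostException above.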
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
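When the file is edited outside the container, the change can also be scripted. The following is a minimal sketch, assuming hdfs-site.xml has the usual configuration/property/name/value layout; the helper ensure_failover_provider is a hypothetical name, not a Hadoop tool.

```python
import xml.etree.ElementTree as ET

PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha."
    "ConfiguredFailoverProxyProvider"
)


def ensure_failover_provider(xml_text, nameservice):
    """Return hdfs-site.xml text that contains the failover proxy provider
    property for the given nameservice, adding it only if it is missing."""
    root = ET.fromstring(xml_text)
    key = "dfs.client.failover.proxy.provider." + nameservice
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == key:
            return xml_text  # already configured, leave the file untouched
    # Append a new <property> element under <configuration>.
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = key
    ET.SubElement(prop, "value").text = PROVIDER
    return ET.tostring(root, encoding="unicode")


sample = (
    "<configuration><property><name>dfs.nameservices</name>"
    "<value>mycluster</value></property></configuration>"
)
updated = ensure_failover_provider(sample, "mycluster")
print(updated)
```

Note that ElementTree does not preserve comments or the XML declaration when it re-serializes the file, so for a heavily commented hdfs-site.xml a manual edit may be preferable.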
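If the image is already fixed and cannot be rebuilt, copy /etc/hadoop out of the original image, modify it outside the container, and mount that directory back into the container: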
containers:
  - volumeMounts:
      - name: hadoop-etc
        mountPath: /etc/hadoop
volumes:
  - name: hadoop-etc
    hostPath:
      path: /home/aicore-1.7-update/datanode-etc