Fixing the java.net.UnknownHostException: nameservice1 error when running Spark. Here, nameservice1 is the logical nameservice name configured for HDFS high availability (HA); when a Spark job references an HDFS path through this name, it fails with java.net.UnknownHostException: nameservice1 because the client cannot resolve the logical name to an actual NameNode.
Analysis: in CDH 5.4 the client configuration file /etc/hadoop/conf/hdfs-site.xml records the HDFS nameservice and the addresses of its NameNodes.
The relevant part of hdfs-site.xml is:
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.nameservice1</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>bdc40.hexun.com:2181,bdc41.hexun.com:2181,bdc46.hexun.com:2181,bdc53.hexun.com:2181,bdc54.hexun.com:2181</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode50,namenode85</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode50</name>
  <value>bdc20.hexun.com:8020</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.nameservice1.namenode50</name>
  <value>bdc20.hexun.com:8022</value>
</property>
<property>
  <name>dfs.namenode.http-address.nameservice1.namenode50</name>
  <value>bdc20.hexun.com:50070</value>
</property>
<property>
  <name>dfs.namenode.https-address.nameservice1.namenode50</name>
  <value>bdc20.hexun.com:50470</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode85</name>
  <value>bdc220.hexun.com:8020</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.nameservice1.namenode85</name>
  <value>bdc220.hexun.com:8022</value>
</property>
<property>
  <name>dfs.namenode.http-address.nameservice1.namenode85</name>
  <value>bdc220.hexun.com:50070</value>
</property>
<property>
  <name>dfs.namenode.https-address.nameservice1.namenode85</name>
  <value>bdc220.hexun.com:50470</value>
</property>
ZooKeeper is used to elect which of the two NameNodes is currently active; the client then fails over between them through the configured proxy provider.
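To make the mapping concrete, the sketch below (an illustration, not Hadoop's actual client code) parses a minimal hdfs-site.xml fragment built from the values above and expands the logical name nameservice1 into its concrete NameNode RPC addresses. This is essentially the lookup that fails with UnknownHostException when the client cannot find hdfs-site.xml and therefore treats nameservice1 as a plain hostname.

```python
import xml.etree.ElementTree as ET

# Minimal hdfs-site.xml fragment with the HA properties shown above
# (two NameNodes behind the logical name "nameservice1").
HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>nameservice1</value></property>
  <property><name>dfs.ha.namenodes.nameservice1</name><value>namenode50,namenode85</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.namenode50</name><value>bdc20.hexun.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.namenode85</name><value>bdc220.hexun.com:8020</value></property>
</configuration>"""

def resolve_nameservice(xml_text):
    """Expand the HA logical name into its concrete NameNode RPC addresses,
    mimicking the lookup an HDFS client performs against hdfs-site.xml."""
    props = {p.findtext("name"): p.findtext("value")
             for p in ET.fromstring(xml_text).iter("property")}
    ns = props["dfs.nameservices"]
    namenodes = props["dfs.ha.namenodes.%s" % ns].split(",")
    return {nn: props["dfs.namenode.rpc-address.%s.%s" % (ns, nn)]
            for nn in namenodes}
```

With the configuration present, resolve_nameservice(HDFS_SITE) yields the two real host:port pairs; without it, "nameservice1" never maps to a host, which is exactly the reported exception.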
Solution: copy hdfs-site.xml into the $SPARK_HOME/conf directory. Once Spark picks up this file, its jobs can resolve nameservice1 and access HDFS paths normally.
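The copy step can be sketched as a small helper; the directory paths are assumptions about your layout (CDH typically uses /etc/hadoop/conf for client configs and $SPARK_HOME/conf for Spark):

```python
import os
import shutil

def copy_hadoop_conf(hadoop_conf_dir, spark_conf_dir):
    """Copy hdfs-site.xml from the Hadoop client config directory into
    Spark's conf directory so Spark jobs can resolve the HA logical
    name (nameservice1). Paths are environment-specific assumptions."""
    src = os.path.join(hadoop_conf_dir, "hdfs-site.xml")
    dst = os.path.join(spark_conf_dir, "hdfs-site.xml")
    shutil.copyfile(src, dst)
    return dst
```

On a CDH node this amounts to copy_hadoop_conf("/etc/hadoop/conf", os.path.join(os.environ["SPARK_HOME"], "conf")), or equivalently a plain cp of the file.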