Our company recently moved to the cloud, and the entire big-data environment was migrated along with it. While migrating one of our projects, jobs started failing at runtime with:
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
The project ran fine in the on-premises production environment; it only broke in the cloud. Investigation showed that the cloud environment had two Spark builds installed side by side: /spark-2.4.8-bin-without-hadoop and /spark-2.4.8-bin-without-hadoop-scala-2.12.
[hadoop@namenode-1:/data/dps-hadoop/soft/spark-2.4.8-bin-without-hadoop/conf] $ netstat -anp |grep 7077
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 10.10.1.19:7077 :::* LISTEN 20786/java
tcp6 0 0 10.10.1.19:7077 10.10.1.22:45244 ESTABLISHED 20786/java
tcp6 0 0 10.10.1.19:7077 10.10.1.20:58022 ESTABLISHED 20786/java
tcp6 0 0 10.10.1.19:7077 10.10.1.21:40242 ESTABLISHED 20786/java
The netstat output shows that the Spark master port 7077 was being listened on by pid 20786.
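The same port-to-pid lookup can be scripted. A minimal sketch (the helper name `pid_of_port` is mine, not a system tool) that parses `netstat -anp` output, assuming the Linux field layout shown above:

```shell
# Resolve which pid is listening on a given TCP port from `netstat -anp` output.
# Hypothetical helper: pipe netstat into it, e.g. `netstat -anp | pid_of_port 7077`.
pid_of_port() {
  port=$1
  # Field 4 is the local address, field 6 the state, field 7 is "pid/program".
  awk -v p=":$port" '$4 ~ p"$" && $6 == "LISTEN" { split($7, a, "/"); print a[1] }'
}
```

Note that netstat only fills in the pid/program column for processes you own; run it as root (or the process owner) to see everything.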
[hadoop@namenode-1:/data/dps-hadoop/soft/spark-2.4.8-bin-without-hadoop/conf] $ netstat -anp |grep 20786
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 :::8080 :::* LISTEN 20786/java
tcp6 0 0 10.10.1.19:7077 :::* LISTEN 20786/java
tcp6 0 0 10.10.1.19:7077 10.10.1.22:45244 ESTABLISHED 20786/java
tcp6 0 0 10.10.1.19:7077 10.10.1.20:58022 ESTABLISHED 20786/java
tcp6 0 0 10.10.1.19:7077 10.10.1.21:40242 ESTABLISHED 20786/java
Following pid 20786 with ps showed exactly which process it was:
[hadoop@namenode-1:/data/dps-hadoop/soft/spark-2.4.8-bin-without-hadoop/conf] $ ps -ef|grep 20786
hadoop 1379 22597 0 17:20 pts/0 00:00:00 grep --color=auto 20786
hadoop 20786 1 0 Jun10 ? 00:07:00 /data/dps-hadoop/soft/jdk1.8.0_291/bin/java -cp /data/dps-hadoop/soft/spark-2.4.8-bin-without-hadoop-scala-2.12/conf/:/data/dps-hadoop/soft/spark-2.4.8-bin-without-hadoop-scala-2.12/jars/*:/data/dps-hadoop/hadoop-2.10.1/etc/hadoop/:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/common/lib/*:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/common/*:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/hdfs/:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/hdfs/lib/*:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/hdfs/*:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/yarn/:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/yarn/lib/*:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/yarn/*:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/mapreduce/lib/*:/data/dps-hadoop/hadoop-2.10.1/share/hadoop/mapreduce/*:/data/dps-hadoop/hadoop-2.10.1/contrib/capacity-scheduler/*.jar -Xmx1g org.apache.spark.deploy.master.Master --host namenode-1 --port 7077 --webui-port 8080
The command line revealed that this Master was launched from the spark-2.4.8-bin-without-hadoop-scala-2.12 build, not the version we expected; the Scala 2.12 build should not have been running at all.
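With two builds installed side by side, it helps to pull the install directory straight out of each Master's command line. A sketch (the helper name is mine) that extracts the distinct Spark 2.4.8 build directories from `ps` output:

```shell
# Print the distinct Spark 2.4.8 build directories referenced in process
# command lines. Hypothetical helper: `ps -ef | grep '[M]aster' | spark_build_of`.
spark_build_of() {
  grep -o '/spark-2\.4\.8-bin-without-hadoop[^/:]*' | sort -u
}
```

Run against the ps output above, it would print /spark-2.4.8-bin-without-hadoop-scala-2.12 for pid 20786, confirming that the wrong build is the one serving port 7077.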
[hadoop@namenode-1:/data/dps-hadoop/soft/spark-2.4.8-bin-without-hadoop/conf] $ jps
1536 RunJar
20786 Master
836 SparkSubmit
32469 Master
17462 HMaster
12729 ResourceManager
1546 Jps
12236 NameNode
12493 SecondaryNameNode
jps confirmed there were two Master processes (pids 20786 and 32469). The conclusion: two standalone Masters running on the driver host, launched from mismatched Spark builds, were what caused the job to fail.
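A quick way to catch this class of problem on any host is to count the Master entries in `jps` output; more than one standalone Master per host is almost always a misconfiguration. A sketch (the helper name is mine):

```shell
# Count standalone Spark Master processes in `jps` output.
# Hypothetical helper: `jps | count_masters` should print 1 on a healthy master host.
count_masters() {
  grep -c ' Master$'
}
```

The trailing anchor matters: it counts `Master` but not `HMaster`, so the HBase master on the same host is not mistaken for a duplicate.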