1.SparkContext: Error initializing SparkContext
18/10/29 15:55:39 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
This is a client-side runtime error. Cause: the hostname was changed, or the machine was not restarted after the change, so the 'sparkDriver' service cannot bind to an address that matches the current hostname.
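To verify, a minimal sketch assuming the machine's new hostname is node01 (the hostname and IP below are illustrative): the hostname must resolve to an address the driver can actually bind to.

hostname                 # current hostname, e.g. node01
cat /etc/hosts           # should contain a matching entry, e.g.:
192.168.56.101 node01

If the entry is missing, add it (or set SPARK_LOCAL_IP in spark-env.sh to a reachable address) and restart the machine before launching the job again.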
2.Spark on YARN
Problem: the error occurs when running a Spark job on the YARN cluster; a typical submit command is sketched below, and the job then fails with the errors that follow.
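For reference, a minimal spark-submit sketch of the kind of job that triggers this; the application class and jar path are illustrative, not taken from the original post:

spark-submit \
  --master yarn \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.x.x.jar 100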
ERROR client.TransportClient: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException
Caused by: java.nio.channels.ClosedChannelException
Exception in thread "main" java.lang.IllegalStateException: Spark context stopped while waiting for backend
ERROR util.Utils: Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult
Caused by: java.io.IOException: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException
Caused by: java.nio.channels.ClosedChannelException
This problem is caused by insufficient memory: the NodeManager kills containers that exceed YARN's physical/virtual memory checks, which is what closes the RPC channel. Disable those checks by adding the following to yarn-site.xml:
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
Note: restart the cluster for the change to take effect!!!
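A minimal sketch of the restart, assuming the standard Hadoop sbin scripts and that the modified yarn-site.xml has been copied to every node first:

$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh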
3.Spark cluster fails to start
Problem description: JAVA_HOME is already configured in /etc/profile, and /etc/profile has been sourced. Yet when the Spark cluster is started, every worker node reports the following error:
JAVA_HOME is not set
The fix is to add the JAVA_HOME setting to spark-env.sh: the master starts the workers over SSH in a non-interactive shell, which does not read /etc/profile, so the variable has to be set where the Spark scripts can see it.
export JAVA_HOME=/opt/java/jdk1.8.0_151
Then restart the Spark cluster.
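A minimal sketch of the restart, assuming the standard Spark sbin scripts run from the master node:

$SPARK_HOME/sbin/stop-all.sh
$SPARK_HOME/sbin/start-all.sh

The workers should now come up without the "JAVA_HOME is not set" message.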
4.Both masters in a Spark HA cluster are in standby state
Problem description: the cluster configuration files were copied from other servers, but the hostnames of my local virtual machines differ from the hostnames used on those servers. After starting the Spark HA cluster, both masters stayed in standby state. The master startup log shows the following errors:
INFO zookeeper.ZooKeeper: Initiating client connection, connectString=node02:2181,node03:2181,node04:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@3a7078dc
ERROR imps.CuratorFrameworkImpl: Background exception was not retry-able or retry gave up
java.net.UnknownHostException: node02: Name or service not known
ERROR curator.ConnectionState: Connection timed out for connection string (node02:2181,node03:2181,node04:2181) and timeout (15000) / elapsed (15326)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
ERROR curator.ConnectionState: Connection timed out for connection string (node02:2181,node03:2181,node04:2181) and timeout (15000) / elapsed (35375)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
ERROR netty.Inbox: Ignoring error
java.net.UnknownHostException: node02: Name or service not known
Because the hostnames are different and the SPARK_DAEMON_JAVA_OPTS setting in spark-env.sh was never updated, the Spark HA cluster is pointing at a ZooKeeper ensemble that does not exist in this environment. Changing the ZooKeeper URL in SPARK_DAEMON_JAVA_OPTS to the addresses of my own ZooKeeper cluster fixes the problem.
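A minimal sketch of the corrected spark-env.sh entry, assuming your own ZooKeeper ensemble runs on hosts zk01, zk02 and zk03 (the hostnames and the /spark-ha znode directory are illustrative; the three spark.deploy.* properties are the standard ones for ZooKeeper-based standby masters):

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk01:2181,zk02:2181,zk03:2181 -Dspark.deploy.zookeeper.dir=/spark-ha"

After changing the URL on both master nodes, restart the masters; one should come up as ALIVE and the other remain STANDBY.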