[Problem Solved] Submitting a job locally to a Spark cluster fails: Initial job has not accepted any resources

Original post, 2018-04-17 20:01:29


The error message is as follows:

18/04/17 18:18:14 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
18/04/17 18:18:29 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources


Copying the same Python file onto a cluster machine and submitting it to Spark from there works fine. I then tried running one of Spark's bundled examples from my local machine, and the problem was still there.

Although this is only a WARN, the job never actually runs and stays in the RUNNING state in Spark's web UI. The commands I ran on my local machine and on the cluster were, respectively:

bin\spark-submit --master spark://192.168.3.207:7077 examples\src\main\python\pi.py
./spark-submit --master spark://192.168.3.207:7077 ../examples/src/main/python/pi.py
Both run an example that ships with Spark.
The fixes I found online boil down to two suggestions. Neither worked for me, but I'll record them here anyway:

1) Increase driver and executor memory:

bin\spark-submit --driver-memory 2000M --executor-memory 2000M --master spark://192.168.3.207:7077 examples\src\main\python\pi.py

2) Adjust the firewall to allow Spark traffic, or temporarily disable it.


The master and slave logs showed no errors either. I then opened the master's web UI at http://192.168.3.207:8080/ and clicked into the job I had just submitted:


Clicking the stderr link of one of the workers showed the following:

18/04/17 18:55:54 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 23412@he-200
18/04/17 18:55:54 INFO SignalUtils: Registered signal handler for TERM
18/04/17 18:55:54 INFO SignalUtils: Registered signal handler for HUP
18/04/17 18:55:54 INFO SignalUtils: Registered signal handler for INT
18/04/17 18:55:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/17 18:55:55 INFO SecurityManager: Changing view acls to: he,shaowei.liu
18/04/17 18:55:55 INFO SecurityManager: Changing modify acls to: he,shaowei.liu
18/04/17 18:55:55 INFO SecurityManager: Changing view acls groups to: 
18/04/17 18:55:55 INFO SecurityManager: Changing modify acls groups to: 
18/04/17 18:55:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(he, shaowei.liu); groups with view permissions: Set(); users  with modify permissions: Set(he, shaowei.liu); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply from 192.168.56.1:51378 in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
...
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 192.168.56.1:51378 in 120 seconds
... 8 more

18/04/17 18:57:55 ERROR RpcOutboxMessage: Ask timeout before connecting successfully


The log shows a connection to 192.168.56.1:51378 timing out. But where does that IP come from? Running ipconfig on my machine gave the answer: 192.168.56.1 is the VirtualBox host-only network IP created by Docker on my machine. When submitting the job to the cluster, Spark apparently failed to pick my machine's correct IP address, so the cluster nodes kept timing out while trying to accept the task. The fix is simple: disable that network adapter.
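As a rough illustration of why the wrong address can get picked up (this is a simplification, not Spark's actual resolution logic): the driver typically advertises an address derived from the local hostname, and on a machine with several adapters (physical NIC, VirtualBox host-only, VPN) the resolved IP may belong to any of them. A minimal Python sketch to see what your machine resolves to:

```python
import socket

# List every IPv4 address the local hostname resolves to. On a machine
# with a VirtualBox host-only adapter, 192.168.56.1 may show up here,
# which is exactly the address the workers were timing out against.
hostname = socket.gethostname()
_, _, addresses = socket.gethostbyname_ex(hostname)
print(hostname, addresses)
```

If the list contains more than one private address, whichever one Spark happens to pick is the one the workers will try to call back to.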
Trying again, the job finished quickly:
bin\spark-submit --master spark://192.168.3.207:7077 examples\src\main\python\pi.py
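Disabling the adapter worked for me, but if you can't disable it, an alternative I have not tested here is to tell Spark explicitly which address the driver should advertise, via the spark.driver.host property (or the SPARK_LOCAL_IP environment variable). Replace 192.168.0.138 below with your machine's actual LAN IP:

```shell
# Untested alternative: pin the driver's advertised address instead of
# disabling the VirtualBox adapter.
bin\spark-submit --conf spark.driver.host=192.168.0.138 --master spark://192.168.3.207:7077 examples\src\main\python\pi.py
```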

Looking at the web UI logs again: a cluster node connects back to my machine, fetches my job file pi.py into a temporary directory /tmp/spark-xxx/ on the node, and copies it into $SPARK_HOME/work/ before actually executing it. I'll dig into the exact flow when I have time. Here is the log:

18/04/17 19:13:11 INFO TransportClientFactory: Successfully created connection to /192.168.0.138:51843 after 3 ms (0 ms spent in bootstraps)
18/04/17 19:13:11 INFO DiskBlockManager: Created local directory at /tmp/spark-67d75b11-65e7-4bc7-89b5-c07fb159470f/executor-b8ce41a3-7c6e-49f6-95ef-7ed6cdef8e53/blockmgr-030eb78d-e46b-4feb-b7b7-108f9e61ec85
18/04/17 19:13:11 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/04/17 19:13:12 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@192.168.0.138:51843
18/04/17 19:13:12 INFO WorkerWatcher: Connecting to worker spark://Worker@192.168.3.102:34041
18/04/17 19:13:12 INFO TransportClientFactory: Successfully created connection to /192.168.3.102:34041 after 0 ms (0 ms spent in bootstraps)
18/04/17 19:13:12 INFO WorkerWatcher: Successfully connected to spark://Worker@192.168.3.102:34041
18/04/17 19:13:12 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, 192.168.3.102, 44683, None)
18/04/17 19:13:12 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 192.168.3.102, 44683, None)
18/04/17 19:13:12 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, 192.168.3.102, 44683, None)
18/04/17 19:13:14 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/04/17 19:13:14 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/04/17 19:13:14 INFO Executor: Fetching spark://192.168.0.138:51843/files/pi.py with timestamp 1523963609005
18/04/17 19:13:14 INFO TransportClientFactory: Successfully created connection to /192.168.0.138:51843 after 1 ms (0 ms spent in bootstraps)
18/04/17 19:13:14 INFO Utils: Fetching spark://192.168.0.138:51843/files/pi.py to /tmp/spark-67d75b11-65e7-4bc7-89b5-c07fb159470f/executor-b8ce41a3-7c6e-49f6-95ef-7ed6cdef8e53/spark-98745f3b-2f70-47b2-8c56-c5b9f6eac496/fetchFileTemp2255624304256249008.tmp
18/04/17 19:13:14 INFO Utils: Copying /tmp/spark-67d75b11-65e7-4bc7-89b5-c07fb159470f/executor-b8ce41a3-7c6e-49f6-95ef-7ed6cdef8e53/spark-98745f3b-2f70-47b2-8c56-c5b9f6eac496/-11088979641523963609005_cache to /home/ubutnu/spark_2_2_1/work/app-20180417191311-0005/1/./pi.py
……
18/04/17 19:13:14 INFO TransportClientFactory: Successfully created connection to /192.168.0.138:51866 after 5 ms (0 ms spent in bootstraps)
18/04/17 19:13:14 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1803 bytes result sent to driver
……
18/04/17 19:13:16 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
18/04/17 19:13:16 INFO MemoryStore: MemoryStore cleared
18/04/17 19:13:16 INFO ShutdownHookManager: Shutdown hook called
18/04/17 19:13:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-67d75b11-65e7-4bc7-89b5-c07fb159470f/executor-b8ce41a3-7c6e-49f6-95ef-7ed6cdef8e53/spark-98745f3b-2f70-47b2-8c56-c5b9f6eac496
