On a freshly set up Spark standalone cluster, launching spark-shell failed with the following error:
$spark-shell --master spark://b1:7077
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/15 16:28:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/15 16:28:31 ERROR cluster.StandaloneSchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
19/11/15 16:28:31 ERROR netty.Inbox: Ignoring error
org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: FAILED
at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:459)
at org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.dead(StandaloneSchedulerBackend.scala:139)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint.markDead(StandaloneAppClient.scala:254)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(StandaloneAppClient.scala:168)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
19/11/15 16:28:31 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
..... (long stack trace omitted) .....
Checking the master log: nothing obviously wrong jumps out.
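The standalone master writes its log under $SPARK_HOME/logs by default; assuming the /soft/spark install path used here, a command along these lines shows the tail of it (the exact file name contains the user and hostname):
$tail -n 50 /soft/spark/logs/spark-*-org.apache.spark.deploy.master.Master-*.out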
19/11/15 16:28:29 INFO master.Master: Launching executor app-20191115162829-0000/9 on worker worker-20191115162814-192.168.1.32-46404
19/11/15 16:28:29 INFO master.Master: Removing executor app-20191115162829-0000/7 because it is FAILED
19/11/15 16:28:29 INFO master.Master: Launching executor app-20191115162829-0000/10 on worker worker-20191115162814-192.168.1.35-43891
19/11/15 16:28:29 INFO master.Master: Removing executor app-20191115162829-0000/8 because it is FAILED
19/11/15 16:28:29 INFO master.Master: Launching executor app-20191115162829-0000/11 on worker worker-20191115162815-192.168.1.34-44277
19/11/15 16:28:29 INFO master.Master: Removing executor app-20191115162829-0000/0 because it is FAILED
19/11/15 16:28:29 INFO master.Master: Launching executor app-20191115162829-0000/12 on worker worker-20191115162818-192.168.1.33-37953
19/11/15 16:28:29 INFO master.Master: Removing executor app-20191115162829-0000/9 because it is FAILED
19/11/15 16:28:29 ERROR master.Master: Application Spark shell with ID app-20191115162829-0000 failed 10 times; removing it
19/11/15 16:28:30 INFO master.Master: Removing app app-20191115162829-0000
Checking the worker log: the worker fails to create directories under spark/work/.
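The worker log sits in the same logs directory, but on each worker machine; again assuming /soft/spark as the install path:
$tail -n 50 /soft/spark/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out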
19/11/15 16:28:29 INFO Worker: Asked to launch executor app-20191115162829-0000/3 for Spark shell
19/11/15 16:28:29 ERROR Worker: Failed to launch executor app-20191115162829-0000/3 for Spark shell.
java.io.IOException: Failed to create directory /soft/spark/work/app-20191115162829-0000/3
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:450)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
19/11/15 16:28:29 INFO Worker: Asked to launch executor app-20191115162829-0000/4 for Spark shell
19/11/15 16:28:29 ERROR Worker: Failed to launch executor app-20191115162829-0000/4 for Spark shell.
java.io.IOException: Failed to create directory /soft/spark/work/app-20191115162829-0000/4
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:450)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Check the permissions on the work directory:
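For reference, a command like the following, run from /soft/spark, produces the listing below:
$ls -ld work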
drwxr-xr-x 2 root root 6 Nov 15 11:17 work
The directory is owned by root, but Spark is being run as a non-root user.
There are two possible fixes. The first is to change the ownership of this directory to the user that runs Spark; I did not try this one.
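If you do take that route, it would look roughly like this on every worker node (superahua is only a guess based on the home directory configured below; substitute the real Spark user and group):
$sudo chown -R superahua:superahua /soft/spark/work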
The second is to edit spark-env.sh and distribute it to every node, pointing the worker directory at a location my user can write to, as follows (distribution and restart commands are sketched after the config):
... (earlier part of the file omitted) ...
# Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
export JAVA_HOME=/soft/jdk
# Address of the master node
export SPARK_MASTER_IP=b1
# Port the master listens on for communication with the workers
export SPARK_MASTER_PORT=7077
# Number of cores each worker process may allocate
export SPARK_WORKER_CORES=2
# Amount of memory each worker process may allocate
export SPARK_WORKER_MEMORY=1G
# Working directory for the workers (scratch space and application logs)
export SPARK_WORKER_DIR=/home/superahua/spark/work
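To push the change to the other nodes and restart the standalone daemons, something like the following from the master should work (b2, b3 and b4 are placeholders for the worker hostnames; sbin/stop-all.sh and sbin/start-all.sh ship with Spark and use conf/slaves to reach the workers over SSH):
$for h in b2 b3 b4; do scp /soft/spark/conf/spark-env.sh $h:/soft/spark/conf/; done
$/soft/spark/sbin/stop-all.sh
$/soft/spark/sbin/start-all.sh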
After restarting the cluster, spark-shell started successfully.