[spark-src-core] 3.3 run spark in standalone(cluster) mode

  Similar to the previous article, this one focuses on cluster mode.

1. Issue the command

./bin/spark-submit  --class org.apache.spark.examples.JavaWordCount --deploy-mode cluster --master spark://gzsw-02:6066 lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://host02:/user/hadoop/input.txt

   note: 1) the deploy-mode must be specified as 'cluster'.

   2) the 'master' param is the REST URL, i.e.,

REST URL: spark://gzsw-02:6066 (cluster mode)

   which is shown on the Spark master UI page, since Spark uses rest.RestSubmissionClient to submit jobs.
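
   For reference, hand-rolling the same submission against the REST endpoint looks roughly like this. This is a sketch of the CreateSubmissionRequest JSON that rest.RestSubmissionClient builds internally; the field values are copied from the command above, and the file: path assumes the example jar sits at that location on the cluster nodes:

curl -X POST http://gzsw-02:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "action": "CreateSubmissionRequest",
  "appArgs": ["hdfs://host02:/user/hadoop/input.txt"],
  "appResource": "file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar",
  "clientSparkVersion": "1.4.1",
  "environmentVariables": {"SPARK_ENV_LOADED": "1"},
  "mainClass": "org.apache.spark.examples.JavaWordCount",
  "sparkProperties": {
    "spark.app.name": "JavaWordCount",
    "spark.master": "spark://gzsw-02:6066",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "file:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar"
  }
}'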

   

2. Run logs on the user side (brief, as this is cluster mode)

Spark Command: /usr/local/jdk/jdk1.6.0_31/bin/java -cp /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/usr/local/hadoop/hadoop-2.5.2/etc/hadoop/ -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master spark://gzsw-02:6066 --deploy-mode cluster --class org.apache.spark.examples.JavaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://hd02:/user/hadoop/input.txt
========================================
-executed cmd retruned by Main.java:/usr/local/jdk/jdk1.6.0_31/bin/java -cp /home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/conf/:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/spark/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/usr/local/hadoop/hadoop-2.5.2/etc/hadoop/ -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit --master spark://gzsw-02:6066 --deploy-mode cluster --class org.apache.spark.examples.JavaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://host02:/user/hadoop/input.txt
Running Spark using the REST application submission protocol.
16/09/19 11:26:06 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://gzsw-02:6066.
16/09/19 11:26:07 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160919112607-0001. Polling submission state...
16/09/19 11:26:07 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160919112607-0001 in spark://gzsw-02:6066.
16/09/19 11:26:07 INFO rest.RestSubmissionClient: State of driver driver-20160919112607-0001 is now RUNNING.
16/09/19 11:26:07 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160914175456-192.168.100.14-36693 at 192.168.100.14:36693.
16/09/19 11:26:07 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160919112607-0001",
  "serverSparkVersion" : "1.4.1",
  "submissionId" : "driver-20160919112607-0001",
  "success" : true
}
16/09/19 11:26:07 INFO util.Utils: Shutdown hook called

    So we know the driver is running on worker 192.168.100.14:36693 (not on the local host).
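
   Since the submission went over the REST protocol, you can also poll the driver yourself. A sketch using the status endpoint of the same REST API (this is the request rest.RestSubmissionClient issues when "Polling submission state" above); the submission id comes from the logs:

curl http://gzsw-02:6066/v1/submissions/status/driver-20160919112607-0001

   The response is a small SubmissionStatusResponse JSON carrying driverState plus the workerId/workerHostPort the driver landed on.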

 

3. FAQ

1) In cluster mode, the driver info is shown on the Spark master UI page (but not in client mode).

 

  (app-0000 and app-0001 were both run in cluster mode, so the corresponding drivers are shown in the 'Completed Drivers' block.)

 

2) The application detail UI can't be opened, i.e., when you click an app that ran in cluster mode, errors similar to the following are reported:

Application history not found (app-20160919151936-0000)
No event logs found for application JavaWordCount in file:/home/hadoop/spark/spark-eventlog/. Did you specify the correct logging directory?

   This message appears because, in cluster mode, the driver runs on another worker rather than on the master's local host; the event logs are therefore written on that worker's local filesystem, and a request to the master finds nothing about this app.

  Workaround: use HDFS instead of the local fs, i.e.

spark.eventLog.dir=hdfs://host02:8020/user/hadoop/spark-eventlog
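
   For completeness, a minimal sketch of the related entries in conf/spark-defaults.conf (the HDFS path matches the workaround above; Spark does not create the event-log directory itself, so create it first):

spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://host02:8020/user/hadoop/spark-eventlog

hdfs dfs -mkdir -p /user/hadoop/spark-eventlog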

 

3) Applications disappear after restarting Spark

  Even though you set 'spark.eventLog.dir' to a distributed filesystem as mentioned above, you will still see nothing after restarting Spark. That is, the Spark master keeps app info in memory while it's alive, but loses it on restart. The start-history-server.sh script solves this problem [1]; see the sketch below.
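
  A minimal sketch of wiring up the history server against the same event-log directory (spark.history.fs.logDirectory and sbin/start-history-server.sh are the stock mechanism; the history UI defaults to port 18080):

# conf/spark-defaults.conf
spark.history.fs.logDirectory    hdfs://host02:8020/user/hadoop/spark-eventlog

# on the master host:
./sbin/start-history-server.sh

  Completed apps then survive a master restart and are browsable at http://gzsw-02:18080.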

 

ref:

[1] Spark History Server configuration and usage

[spark-src-core] 3.2.run spark in standalone(client) mode
