There are two deploy modes: client and cluster.
A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. the master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process, which acts as a client to the cluster. The input and output of the application are attached to the console, so this mode is especially suitable for applications that involve the REPL (e.g. the Spark shell).
Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for Mesos clusters; currently only YARN supports cluster mode for Python applications.
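As a sketch, the two modes are selected with the --deploy-mode flag of spark-submit; the master URLs and the application file my_app.py below are placeholders, not values from this post:

```shell
# Client mode: the driver runs inside this spark-submit process,
# on the machine where the command is typed (e.g. a gateway node).
spark-submit \
  --master spark://master:7077 \
  --deploy-mode client \
  my_app.py

# Cluster mode: the driver is launched on one of the worker machines
# inside the cluster. Not supported on Mesos; for Python applications,
# only YARN supports cluster mode.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  my_app.py
```

These commands require a running Spark cluster, so they are shown only as a usage sketch.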
When I look at the help for spark-submit (spark-submit --help), I get:
--deploy-mode: Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (default: client)
Clearly, if we submit the program on the cluster's master node, the driver program runs on the master, using client mode.
But what if the program is submitted from some other node in the cluster?
This comes down to the question of where the driver program runs.
So what is the driver program?
The driver program is the process that runs the main() function of the application and creates the SparkContext.
In client mode, the driver program runs on whichever node you submit from.
References:
[1] http://spark.apache.org/docs/latest/submitting-applications.html (Accessed: 2016-06-02)
[2] http://spark.apache.org/docs/latest/cluster-overview.html (Accessed: 2016-06-02)