How Spark submits jobs on a Mesos cluster

Our team recently deployed Mesos, and while testing it I ran into a few problems, which gave me a chance to dig into how Spark jobs are actually submitted. Here is what I found.
 
At the moment we submit jobs in two ways: the command-line mode and Java API submission (currently used by 魔盒). Based on what I have found so far, no matter which mode is used, the submission ultimately goes through the same API.
 
Let's first look at how the command-line mode works.
 
Normally we call spark-submit to submit a job. The spark-submit script boils down to:
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
Ignoring what exec does for now, spark-submit simply invokes the spark-class script and passes all of its arguments through.
 
Now look at the spark-class script:
build_command() {
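  # org.apache.spark.launcher.Main prints the final java command as NUL-separated tokens;
  # printf then appends Main's exit code as one extra NUL-terminated token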
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
  printf "%d\0" $?
}

CMD=()
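# Read the NUL-separated tokens produced by build_command into the CMD array;
# the last element is the launcher's exit code appended above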
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command "$@")

COUNT=${#CMD[@]}
LAST=$((COUNT - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}
# if [ $LAUNCHER_EXIT_CODE != 0 ]; then
#  exit $LAUNCHER_EXIT_CODE
# fi

CMD=("${CMD[@]:0:$LAST}")
exec "${CMD[@]}"
 
From the build_command function we can see that the first program these scripts actually run is the org.apache.spark.launcher.Main class, with all the original arguments passed in, including the org.apache.spark.deploy.SparkSubmit class name that the spark-submit script prepended. The while loop below it reads the command that Main prints (as NUL-separated tokens) back into the CMD array, and the script finally execs that command.
 
Now open the Main class. What happens is that Main builds the final java command with org.apache.spark.deploy.SparkSubmit as the main class and prints it for spark-class to exec, so SparkSubmit's main method ends up being invoked with the corresponding arguments (I pieced this together from material online and from tracing the code in my own tests).
 
override def main(args: Array[String]): Unit = {
  val appArgs = new SparkSubmitArguments(args)
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}
 
This main method matches on the parsed action and, for a normal submission, calls submit with appArgs (KILL and REQUEST_STATUS correspond to spark-submit's --kill and --status options).
 
Next, let's look into the submit method.
 
private def submit(args: SparkSubmitArguments): Unit = {
  val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
          }
        })
      } catch { /* ... */ }
    } else {
      runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
    }
  }

  // ... (rest elided; submit eventually invokes doRunMain())
}

 

 
 
From this code we can see that submit mainly does two things: it first wraps the incoming arguments and the runtime environment via prepareSubmitEnvironment, and then hands the wrapped parameters to runMain for execution.
 
While wrapping the parameters, prepareSubmitEnvironment derives childMainClass from the environment and the arguments that were passed in. When submitting to a Mesos cluster, this value is hard-coded (in client mode it would simply be the user's own main class). The relevant code:
 
if (isMesosCluster) {
  childMainClass = "org.apache.spark.deploy.rest.RestSubmissionClient"
}
 
Next, let's look at the runMain method.
private def runMain(
    childArgs: Seq[String],
    childClasspath: Seq[String],
    sysProps: Map[String, String],
    childMainClass: String,
    verbose: Boolean): Unit = {

  try {
    mainClass = Utils.classForName(childMainClass)
  } catch { /* ... */ }

  val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)

  try {
    mainMethod.invoke(null, childArgs.toArray)
  } catch {
    case t: Throwable =>
      findCause(t) match {
        case SparkUserAppException(exitCode) =>
          System.exit(exitCode)

        case t: Throwable =>
          throw t
      }
  }
}

 

As the code shows, runMain loads childMainClass by reflection and invokes its main method, so in Mesos cluster mode it ends up calling RestSubmissionClient.main, which in turn calls run.
The run method is where the most important part happens.
 
def run(
    appResource: String,
    mainClass: String,
    appArgs: Array[String],
    conf: SparkConf,
    env: Map[String, String] = Map()): SubmitRestProtocolResponse = {
  val master = conf.getOption("spark.master").getOrElse {
    throw new IllegalArgumentException("'spark.master' must be set.")
  }
  val sparkProperties = conf.getAll.toMap
  val client = new RestSubmissionClient(master)
  val submitRequest = client.constructSubmitRequest(
    appResource, mainClass, appArgs, sparkProperties, env)
  val createSubmissionResponse = client.createSubmission(submitRequest)
  createSubmissionResponse
}

 

 
run first creates a REST client, then wraps the required request information into a submit request, and finally calls createSubmission. Let's look at the code of createSubmission next.
 
def createSubmission(request: CreateSubmissionRequest): SubmitRestProtocolResponse = {
  var handled: Boolean = false
  var response: SubmitRestProtocolResponse = null
  for (m <- masters if !handled) {
    validateMaster(m)
    val url = getSubmitUrl(m)

    try {
      response = postJson(url, request.toJson)
      response match {
        case s: CreateSubmissionResponse =>
          if (s.success) {
            reportSubmissionStatus(s)
            handleRestResponse(s)
            handled = true
          }
        case unexpected =>
          handleUnexpectedRestResponse(unexpected)
      }
    } catch {
      case e: SubmitRestConnectionException =>
        if (handleConnectionException(m)) {
          throw new SubmitRestConnectionException("Unable to connect to server", e)
        }
    }
  }
  response
}

 

 
Once this method has finished, our Spark job has been submitted. It makes it very clear that the submission process is really just a REST request: getSubmitUrl builds a URL of the form http://<master-host:port>/v1/submissions/create, the request is POSTed to it as JSON, and the returned information is wrapped into a CreateSubmissionResponse object.
 
private[spark] class CreateSubmissionResponse extends SubmitRestProtocolResponse {
  var submissionId: String = null
  protected override def doValidate(): Unit = {
    super.doValidate()
    assertFieldIsSet(success, "success")
  }
}

 

 
 
Looking at the CreateSubmissionResponse class, we find a submissionId field; testing shows that this is exactly the job ID we normally use.
 
So far we still don't know the execution state of the job. Reading further through the RestSubmissionClient class, there is also a requestSubmissionStatus method, shown below:
def requestSubmissionStatus(
    submissionId: String,
    quiet: Boolean = false): SubmitRestProtocolResponse = {
  logInfo(s"Submitting a request for the status of submission $submissionId in $master.")

  var handled: Boolean = false
  var response: SubmitRestProtocolResponse = null
  for (m <- masters if !handled) {
    validateMaster(m)
    val url = getStatusUrl(m, submissionId)
    try {
      response = get(url)
      response match {
        case s: SubmissionStatusResponse if s.success =>
          if (!quiet) {
            handleRestResponse(s)
          }
          handled = true
        case unexpected =>
          handleUnexpectedRestResponse(unexpected)
      }
    } catch { /* ... */ }
  }
  response
}

 

 
From the comments and the code, we can see that it returns a SubmissionStatusResponse object for a given submissionId. Let's look at that class:
 
private[spark] class SubmissionStatusResponse extends SubmitRestProtocolResponse {
  var submissionId: String = null
  var driverState: String = null
  var workerId: String = null
  var workerHostPort: String = null

  protected override def doValidate(): Unit = {
    super.doValidate()
    assertFieldIsSet(submissionId, "submissionId")
    assertFieldIsSet(success, "success")
  }
}

 

 
Sure enough, there is a driverState field. At this point we can briefly summarize the spark-submit submission process.
 
When the spark-submit script is executed, it calls the spark-class script.
The spark-class script then runs the org.apache.spark.launcher.Main class.
org.apache.spark.launcher.Main builds the command that runs the org.apache.spark.deploy.SparkSubmit class.
org.apache.spark.deploy.SparkSubmit then invokes the org.apache.spark.deploy.rest.RestSubmissionClient class.
 
org.apache.spark.deploy.rest.RestSubmissionClient creates the job through a REST request, i.e. via the API.
Note that this only creates the job: once it has been created, the command-line submission is considered finished, and whether the job then runs successfully is entirely up to Spark.
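Incidentally, this is also all the API submission mode mentioned at the beginning amounts to: calling the same REST client from code. Below is a minimal sketch, assuming Spark 2.x is on the classpath; RestSubmissionClient is private[spark], so this illustrative example is placed under an org.apache.spark package, and the host, jar and class names are simply taken from the curl example that follows.

package org.apache.spark.deploy.rest

import org.apache.spark.SparkConf

// Illustrative only: submits the same job as the curl example below, via RestSubmissionClient.run
object ApiSubmitExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(false)
      .set("spark.master", "mesos://192.168.23.7:7077")
      .set("spark.submit.deployMode", "cluster")
      .set("spark.app.name", "sparkjob5")
      .set("spark.jars", "hdfs://hadoopha/datacenter/jar/spark_test.jar")

    // run() builds a CreateSubmissionRequest and POSTs it to /v1/submissions/create,
    // exactly as the code walked through above does
    val response = RestSubmissionClient.run(
      appResource = "hdfs://hadoopha/datacenter/jar/spark_test.jar",
      mainClass = "com.yunzongnet.datacenter.spark.main.SparkForTest",
      appArgs = Array("20180315"),
      conf = conf,
      env = Map("SPARK_SCALA_VERSION" -> "2.10"))

    println(response.toJson) // contains submissionId on success
  }
}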
 
After all this detour, it all comes down to sending one REST request. So next, let's send such a request ourselves from the command line and look at the result.
 
curl -XPOST 'http://192.168.23.7:7077/v1/submissions/create' -d '{
"action" : "CreateSubmissionRequest",
"appArgs" : [ "20180315" ],
"appResource" : "hdfs://hadoopha/datacenter/jar/spark_test.jar",
"clientSparkVersion" : "2.2.0",
"environmentVariables" : {
"SPARK_SCALA_VERSION" : "2.10"
},
"mainClass" : "com.yunzongnet.datacenter.spark.main.SparkForTest",
"sparkProperties" : {
"spark.sql.ui.retainedExecutions" : "2000",
"spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : "/usr/local/lib/libmesos.so",
"spark.dynamicAllocation.sustainedSchedulerBacklogTimeout" : "5",
"spark.history.fs.logDirectory" : "hdfs://hadoopha/spark/eventlog",
"spark.eventLog.enabled" : "true",
"spark.streaming.ui.retainedBatches" : "2000",
"spark.shuffle.service.enabled" : "true",
"spark.jars" : "hdfs://hadoopha/datacenter/jar/spark_test.jar",
"spark.mesos.executor.docker.volumes" : "/spark_local_dir:/spark_local_dir:rw",
"spark.driver.supervise" : "false",
"spark.app.name" : "sparkjob5",
"spark.cores.max" : "6",
"spark.dynamicAllocation.schedulerBacklogTimeout" : "1",
"spark.mesos.principal" : "admin",
"spark.worker.ui.retainedDrivers" : "2000",
"spark.driver.memory" : "4G",
"spark.files.fetchTimeout" : "900s",
"spark.mesos.uris" : "/etc/docker.tar.gz",
"spark.mesos.secret" : "admin",
"spark.deploy.retainedDrivers" : "2000",
"spark.mesos.role" : "root",
"spark.files" : "file:///usr/local/hadoop-2.6.0/etc/hadoop/hdfs-site.xml,file:///usr/local/hadoop-2.6.0/etc/hadoop/core-site.xml",
"spark.mesos.executor.docker.image" : "registry.seagle.me:443/spark-2-base:v1",
"spark.submit.deployMode" : "cluster",
"spark.master" : "mesos://192.168.23.7:7077",
"spark.executor.memory" : "12G",
"spark.driver.extraClassPath" : "/usr/local/alluxio/core/client/target/alluxio-core-client-1.1.0-SNAPSHOT-jar-with-dependencies.jar,/usr/local/spark/jars/*",
"spark.local.dir" : "/spark_local_dir",
"spark.eventLog.dir" : "hdfs://hadoopha/spark/eventlog",
"spark.dynamicAllocation.enabled" : "true",
"spark.executor.cores" : "2",
"spark.deploy.retainedApplications" : "2000",
"spark.worker.ui.retainedExecutors" : "2000",
"spark.dynamicAllocation.executorIdleTimeout" : "60",
"spark.mesos.executor.home" : "/usr/local/spark"
}
}'
 
The REST request returns immediately:
{
"action" : "CreateSubmissionResponse",
"serverSparkVersion" : "2.0.0",
"submissionId" : "driver-20170425164456-271697",
"success" : true
}
 
Next, send a GET request: curl -XGET 'http://192.168.23.7:7077/v1/submissions/status/driver-20170425164456-271697'
 
{
"action" : "SubmissionStatusResponse",
"driverState" : "RUNNING",
"message" : "task_id {\n value: \"driver-20170425164456-271697\"\n}\nstate: TASK_RUNNING\n"
}
We can see that the job is currently in the RUNNING state.
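Rather than re-running curl by hand, the same status endpoint can also be polled from code. A minimal sketch, assuming the dispatcher address and the submissionId from the responses above, and using a crude regex just to pull out driverState:

import scala.io.Source

// Illustrative polling loop against the status endpoint shown above
object PollDriverStatus {
  def main(args: Array[String]): Unit = {
    val url = "http://192.168.23.7:7077/v1/submissions/status/driver-20170425164456-271697"
    val statePattern = "\"driverState\" : \"(\\w+)\"".r
    var state = "RUNNING"
    while (state == "SUBMITTED" || state == "RUNNING") {
      val src = Source.fromURL(url)
      val body = try src.mkString finally src.close()
      state = statePattern.findFirstMatchIn(body).map(_.group(1)).getOrElse("UNKNOWN")
      println(s"driverState = $state")
      Thread.sleep(5000) // poll every 5 seconds
    }
  }
}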
 
After waiting a while and repeatedly sending the same GET request, it eventually returns:
{
"action" : "SubmissionStatusResponse",
"driverState" : " FINISHED",
"message" : "task_id {\n value: \"driver-20170425164456-271697\"\n}\nstate: TASK_FAILED\nmessage: \"Container exited with status 1\"\nslave_id {\n value: \"0600da48-750d-48e2-ba79-b78936224c83-S2\"\n}\ntimestamp: 1.493109963886255E9\nexecutor_id {\n value: \"driver-20170425164456-271697\"\n}\nsource: SOURCE_EXECUTOR\n11: \"4\\222O\\215\\201\\334L^\\232\\303\\313:j&\\004\\'\"\n13: \"\\n\\017*\\r\\022\\v192.168.23.1\"\n",
"serverSparkVersion" : "2.0.0",
"submissionId" : "driver-20170425164456-271697",
"success" : true
}
 
We can see that the job has finished running, and that it actually failed (the container exited with status 1).
 