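Submitting an Apache Spark Job with the REST API
When working with Apache Spark, you sometimes need to trigger a Spark job on demand from outside the cluster. There are two ways to submit an Apache Spark job to a cluster:
- spark-submit from within the Spark cluster
To submit a Spark job from within the Spark cluster, we use spark-submit. Below is a sample shell script that submits a Spark job. Most of the arguments are self-explanatory; the two trailing paths, the input and output directories, are passed to the application's main class as arguments.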
<span style="color:#212529"><span style="color:#212529"><code><span style="color:#93a1a1">#!/bin/bash</span>
<span style="color:#22b3eb">$SPARK_HOME</span>/bin/spark-submit <span style="color:#cb4b16">\</span>
<span style="color:#22b3eb">--class</span> com.nitendragautam.sparkbatchapp.main.Boot <span style="color:#cb4b16">\</span>
<span style="color:#22b3eb">--master</span> spark://192.168.133.128:7077 <span style="color:#cb4b16">\</span>
<span style="color:#22b3eb">--deploy-mode</span> cluster <span style="color:#cb4b16">\</span>
<span style="color:#22b3eb">--supervise</span> <span style="color:#cb4b16">\</span>
<span style="color:#22b3eb">--executor-memory</span> 4G <span style="color:#cb4b16">\</span>
<span style="color:#22b3eb">--driver-memory</span> 4G <span style="color:#cb4b16">\</span>
<span style="color:#22b3eb">--total-executor-cores</span> 2 <span style="color:#cb4b16">\</span>
/home/hduser/sparkbatchapp.jar <span style="color:#cb4b16">\</span>
/home/hduser/NDSBatchApp/input <span style="color:#cb4b16">\</span>
/home/hduser/NDSBatchApp/output/
</code></span></span>
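When the same job has to run against different directories, the two trailing paths can become parameters. The following is a minimal sketch, assuming the jar, main class, and master URL from the script above. The function name run_spark_batch_job and the DRY_RUN switch are hypothetical; with DRY_RUN=1 the function only prints the command instead of running it, which is a convenient way to check the arguments first.

```shell
#!/bin/bash
# Hypothetical wrapper around the spark-submit call shown above.
# Usage: run_spark_batch_job <input-dir> <output-dir>
# Set DRY_RUN=1 to print the command instead of executing it.
run_spark_batch_job() {
  # ${N:?msg} aborts the (non-interactive) script if an argument is missing.
  local input_dir="${1:?missing input directory}"
  local output_dir="${2:?missing output directory}"

  # Build the command as an array so each argument stays a single word.
  local cmd=(
    "$SPARK_HOME/bin/spark-submit"
    --class com.nitendragautam.sparkbatchapp.main.Boot
    --master spark://192.168.133.128:7077
    --deploy-mode cluster
    --supervise
    --executor-memory 4G
    --driver-memory 4G
    --total-executor-cores 2
    /home/hduser/sparkbatchapp.jar
    "$input_dir"
    "$output_dir"
  )

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Dry run: print the command line joined by spaces.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 run_spark_batch_job /home/hduser/NDSBatchApp/input /home/hduser/NDSBatchApp/output/ prints the same command as the script above without submitting anything.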
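- REST API from outside the Spark cluster
In this post, I will explain how to trigger a Spark job through the REST API. Make sure the Spark cluster is running before you submit the Spark job. The REST requests below are sent to port 6066 on the master, whereas the spark:// master URL uses port 7077.
Figure: Apache Spark Master
Triggering a Spark Batch Job with a Shell Script
Create a shell script named submit_spark_job.sh with the following content.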
<span style="color:#212529"><span style="color:#212529"><code><span style="color:#93a1a1">#!/bin/bash</span>
curl <span style="color:#22b3eb">-X</span> POST http://192.168.133.128:6066/v1/submissions/create <span style="color:#22b3eb">--header</span> <span style="color:#2aa198">"Content-Type:application/json;charset=UTF-8"</span> <span style="color:#22b3eb">--data</span> <span style="color:#2aa198">'{
"appResource": "/home/hduser/sparkbatchapp.jar",
"sparkProperties": {
"spark.executor.memory": "4g",
"spark.master": "spark://192.168.133.128:7077",
"spark.driver.memory": "4g",
"spark.driver.cores": "2",
"spark.eventLog.enabled": "false",
"spark.app.name": "Spark REST API201804291717022",
"spark.submit.deployMode": "cluster",
"spark.jars": "/home/hduser/sparkbatchapp.jar",
"spark.driver.supervise": "true"
},
"clientSparkVersion": "2.0.1",
"mainClass": "com.nitendragautam.sparkbatchapp.main.Boot",
"environmentVariables": {
"SPARK_ENV_LOADED": "1"
},
"action": "CreateSubmissionRequest",
"appArgs": [
"/home/hduser/NDSBatchApp/input",
"/home/hduser/NDSBatchApp/output/"
]
}'</span>
</code></span></span>
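The script above hard-codes every value. As a sketch, assuming the same jar, main class, and master address, the request body can be generated by a function so that the input and output directories become parameters. The names build_create_request, submit_spark_job, SPARK_MASTER_HOST, and SPARK_REST_URL, as well as the application name SparkBatchApp, are hypothetical. The paths are inserted without JSON escaping, so they must not contain double quotes or backslashes.

```shell
#!/bin/bash
# Hypothetical helpers that generalize submit_spark_job.sh.
# Host, ports, jar path, and main class are the values from the example above.
SPARK_MASTER_HOST="${SPARK_MASTER_HOST:-192.168.133.128}"
SPARK_REST_URL="http://${SPARK_MASTER_HOST}:6066/v1/submissions"

# Print a CreateSubmissionRequest body for the given input/output dirs.
# NOTE: the paths are not JSON-escaped.
build_create_request() {
  local input_dir="$1" output_dir="$2"
  cat <<EOF
{
  "action": "CreateSubmissionRequest",
  "appResource": "/home/hduser/sparkbatchapp.jar",
  "mainClass": "com.nitendragautam.sparkbatchapp.main.Boot",
  "clientSparkVersion": "2.0.1",
  "appArgs": ["${input_dir}", "${output_dir}"],
  "environmentVariables": { "SPARK_ENV_LOADED": "1" },
  "sparkProperties": {
    "spark.master": "spark://${SPARK_MASTER_HOST}:7077",
    "spark.submit.deployMode": "cluster",
    "spark.app.name": "SparkBatchApp",
    "spark.jars": "/home/hduser/sparkbatchapp.jar",
    "spark.executor.memory": "4g",
    "spark.driver.memory": "4g",
    "spark.driver.cores": "2",
    "spark.driver.supervise": "true",
    "spark.eventLog.enabled": "false"
  }
}
EOF
}

# POST the generated body to the create endpoint; curl reads it from stdin (@-).
submit_spark_job() {
  build_create_request "$1" "$2" |
    curl -s -X POST "${SPARK_REST_URL}/create" \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data-binary @-
}
```

For example, submit_spark_job /home/hduser/NDSBatchApp/input /home/hduser/NDSBatchApp/output/ sends the same request as submit_spark_job.sh, apart from the application name.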
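Once the Spark job has been submitted successfully, you will see output like the following. The submissionId field identifies the driver and is needed to check the job's status.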
<span style="color:#212529"><span style="color:#212529"><code>
nitendragautam@Nemo: sh submit_spark_job.sh
<span style="color:#859900">{</span>
<span style="color:#2aa198">"action"</span> : <span style="color:#2aa198">"CreateSubmissionResponse"</span>,
<span style="color:#2aa198">"message"</span> : <span style="color:#2aa198">"Driver successfully submitted as driver-20180429125849-0001"</span>,
<span style="color:#2aa198">"serverSparkVersion"</span> : <span style="color:#2aa198">"2.0.1"</span>,
<span style="color:#2aa198">"submissionId"</span> : <span style="color:#2aa198">"driver-20180429125849-0001"</span>,
<span style="color:#2aa198">"success"</span> : <span style="color:#b58900">true</span>
<span style="color:#859900">}</span>
</code></span></span>
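To chain the submission with a status check, the submissionId can be pulled out of this response automatically. A minimal sketch, assuming no JSON tool is installed: it uses sed and grep and expects each field on its own line, as in the output above. The helper names extract_submission_id and response_succeeded are hypothetical. If jq is available, jq -r .submissionId does the same job more robustly.

```shell
#!/bin/bash
# Hypothetical helpers for a CreateSubmissionResponse read on stdin.
# Assumes one "key" : value pair per line, as printed by the REST server above.

# Print the value of the "submissionId" field.
extract_submission_id() {
  sed -n 's/.*"submissionId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Exit with status 0 if the response contains "success" : true.
response_succeeded() {
  grep -q '"success"[[:space:]]*:[[:space:]]*true'
}
```

For example, submission_id="$(sh submit_spark_job.sh | extract_submission_id)" stores driver-20180429125849-0001 for the run shown above.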
Checking the Status of a Spark Job with the REST API
To check the status of a Spark job, pass its submission ID to the status endpoint, as in the following command.
<span style="color:#212529"><span style="color:#212529"><code> curl http://192.168.133.128:6066/v1/submissions/status/driver-20180429125849-0001
<span style="color:#859900">{</span>
<span style="color:#2aa198">"action"</span> : <span style="color:#2aa198">"SubmissionStatusResponse"</span>,
<span style="color:#2aa198">"driverState"</span> : <span style="color:#2aa198">"FINISHED"</span>,
<span style="color:#2aa198">"serverSparkVersion"</span> : <span style="color:#2aa198">"2.0.1"</span>,
<span style="color:#2aa198">"submissionId"</span> : <span style="color:#2aa198">"driver-20180429125849-0001"</span>,
<span style="color:#2aa198">"success"</span> : <span style="color:#b58900">true</span>,
<span style="color:#2aa198">"workerHostPort"</span> : <span style="color:#2aa198">"192.168.133.128:38451"</span>,
<span style="color:#2aa198">"workerId"</span> : <span style="color:#2aa198">"worker-20180429124356-192.168.133.128-38451"</span>
<span style="color:#859900">}</span>
</code></span></span>
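Rather than checking the status by hand, a script can poll this endpoint until the driver stops. The sketch below assumes the same master address; the helper names and SPARK_REST_URL are hypothetical, and treating FINISHED, FAILED, KILLED, and ERROR as final states is an assumption based on the driver states of Spark's standalone master. The loop has no timeout, so it keeps polling if the master cannot be reached.

```shell
#!/bin/bash
# Hypothetical polling helpers built on the status endpoint shown above.
SPARK_REST_URL="${SPARK_REST_URL:-http://192.168.133.128:6066/v1/submissions}"

# Print the driverState field of a SubmissionStatusResponse read on stdin.
extract_driver_state() {
  sed -n 's/.*"driverState"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Return 0 if the driver state is treated as final (assumed list).
is_terminal_state() {
  case "$1" in
    FINISHED|FAILED|KILLED|ERROR) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll every N seconds (default 10) until the driver reaches a final state.
# Progress goes to stderr; the final state goes to stdout.
# Returns 0 only if the final state is FINISHED.
wait_for_driver() {
  local submission_id="$1" interval="${2:-10}" state
  while true; do
    state="$(curl -s "${SPARK_REST_URL}/status/${submission_id}" | extract_driver_state)"
    echo "driverState: ${state:-UNKNOWN}" >&2
    if is_terminal_state "$state"; then
      echo "$state"
      [[ "$state" == "FINISHED" ]]
      return
    fi
    sleep "$interval"
  done
}
```

For example, wait_for_driver driver-20180429125849-0001 5 checks the job above every 5 seconds and returns once its driverState is final.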