Submitting an application
curl -X POST http://spark-cluster-ip:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "args1", "args2", "..." ],
  "appResource" : "file:/myfilepath/spark-job-1.0.jar",
  "clientSparkVersion" : "2.1.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.mycompany.MyJob",
  "sparkProperties" : {
    "spark.jars" : "file:/myfilepath/spark-job-1.0.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyJob",
    "spark.eventLog.enabled" : "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-cluster-ip:6066"
  }
}'
Notes:
spark-cluster-ip: the address of the Spark master. The default REST service port is 6066; if it is already taken, ports 6067, 6068, ... are tried in turn.
"action" : "CreateSubmissionRequest": declares that this request submits an application.
"appArgs" : [ "args1", "args2", "..." ]: the arguments your application jar needs, such as a Kafka topic or the model to use; each argument is a separate array element.
- Note 1: if the application takes no arguments, write "appArgs" : []. The field must not be omitted, otherwise the entry after appResource gets parsed as appArgs and causes unpredictable errors.
"appResource" : "file:/myfilepath/spark-job-1.0.jar": the path to the application jar.
"environmentVariables" : {"SPARK_ENV_LOADED" : "1"}: whether to load the Spark environment variables.
- Note 2: this field is mandatory; omitting it causes a NullPointerException.
"clientSparkVersion" : "2.1.0": the Spark version.
"mainClass" : "com.mycompany.MyJob": the application's main class.
"sparkProperties" : {...}: the Spark configuration properties.
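The request body above can also be assembled programmatically, which makes it easy to enforce the two notes (appArgs always present, environmentVariables never omitted). A minimal sketch using only the standard library; the helper name is ours, and the jar path, main class, and master address are the placeholders from the example:

```python
import json

def build_submission_request(jar_path, main_class, master_rest_url,
                             app_args=None, spark_version="2.1.0"):
    """Build the JSON body for a CreateSubmissionRequest.

    Per note 1, appArgs is always emitted (as [] when there are no
    arguments); per note 2, environmentVariables is always present.
    """
    return {
        "action": "CreateSubmissionRequest",
        "appArgs": app_args if app_args is not None else [],
        "appResource": jar_path,
        "clientSparkVersion": spark_version,
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "mainClass": main_class,
        "sparkProperties": {
            "spark.jars": jar_path,
            "spark.driver.supervise": "false",
            # Derive the app name from the main class (a convenience
            # choice of this sketch, not required by the API).
            "spark.app.name": main_class.rsplit(".", 1)[-1],
            "spark.eventLog.enabled": "true",
            "spark.submit.deployMode": "cluster",
            "spark.master": master_rest_url,
        },
    }

body = json.dumps(build_submission_request(
    "file:/myfilepath/spark-job-1.0.jar",
    "com.mycompany.MyJob",
    "spark://spark-cluster-ip:6066"))
```

The resulting string is what you would pass as the curl --data value.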
Response:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20151008145126-0000",
  "serverSparkVersion" : "2.1.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}
Every field is self-explanatory. The two most important are submissionId and success: the former is used to query and to kill the running Spark application, and the latter tells you whether the submission succeeded.
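Extracting those two fields is a one-liner with the standard json module. A sketch using the sample response above (parse_submission_response is an illustrative helper, not part of Spark):

```python
import json

def parse_submission_response(raw):
    """Pull submissionId and success out of a CreateSubmissionResponse."""
    resp = json.loads(raw)
    if resp.get("action") != "CreateSubmissionResponse":
        raise ValueError("unexpected action: %r" % resp.get("action"))
    return resp.get("submissionId"), bool(resp.get("success"))

sample = '''{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20151008145126-0000",
  "serverSparkVersion" : "2.1.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}'''

sid, ok = parse_submission_response(sample)
# sid is "driver-20151008145126-0000", ok is True
```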
Querying the status of a submitted application
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000
"driver-20151008145126-0000" is the submissionId returned by the submission request above.
Response:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "2.1.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true,
  "workerHostPort" : "192.168.3.153:46894",
  "workerId" : "worker-20151007093409-192.168.3.153-46894"
}
The possible values of driverState are: ERROR (the submission failed due to an error; the error message is shown), SUBMITTED (submitted but not yet running), RUNNING (currently running), FAILED (execution failed; an exception is thrown), and FINISHED (completed successfully).
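Since SUBMITTED and RUNNING are transient, a status-polling loop only needs to know which of the states above are final. A small sketch (the helper name and the set constant are ours, not part of the API):

```python
# Final driver states from the list above: once a driver reaches one
# of these, its state will not change again, so polling can stop.
TERMINAL_STATES = {"ERROR", "FAILED", "FINISHED"}

def is_terminal(driver_state):
    """Return True when the driver has reached a final state."""
    return driver_state in TERMINAL_STATES

# A caller would GET .../v1/submissions/status/<submissionId>
# periodically and stop once is_terminal(resp["driverState"]) is True.
```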
Killing a submitted application
curl -X POST http://spark-cluster-ip:6066/v1/submissions/kill/driver-20151008145126-0000
Response:
{
  "action" : "KillSubmissionResponse",
  "message" : "Kill request for driver-20151008145126-0000 submitted",
  "serverSparkVersion" : "2.1.0",
  "submissionId" : "driver-20151008145126-0000",
  "success" : true
}
Possible problems
java.nio.charset.MalformedInputException: Input length = 1
Cause: the character encoding used by the client sending the request differs from the encoding on the Spark master machine.
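One way to sidestep the mismatch is to serialize the body to pure-ASCII JSON, encode it as UTF-8 bytes explicitly, and declare the charset in the Content-Type header, rather than relying on the client machine's default encoding. A sketch using only the standard library (the URL is the placeholder from above, and the payload is abbreviated):

```python
import json
import urllib.request

payload = {"action": "CreateSubmissionRequest"}  # abbreviated body

# ensure_ascii=True (the default) escapes any non-ASCII characters,
# so the byte stream is identical regardless of either side's locale;
# .encode("utf-8") makes the body's encoding explicit.
data = json.dumps(payload, ensure_ascii=True).encode("utf-8")

req = urllib.request.Request(
    "http://spark-cluster-ip:6066/v1/submissions/create",
    data=data,
    headers={"Content-Type": "application/json;charset=UTF-8"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; not executed here.
```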