Uploading job JAR packages, Python files, and other resources
The analysis cluster uses the HttpFS service to let users upload and manage job resources (JAR packages, Python files, and so on) on the server side.
1. Obtain the HttpFS service address from the analysis cluster console, for example:
HttpFS: http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000
2. Usage recommendations:
These resources can currently be managed through the RESTful API or the command line; see the reference documentation for details. The examples below use the RESTful API.
To keep resources organized, we recommend using the HttpFS user name resource and uploading everything under the root directory /resourcesdir/, beneath which subdirectories can be created.
3. Upload a local JAR package or Python file to the Spark server
Create the directory /resourcesdir:
curl -i -X PUT "http://ap-xxx.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir?op=MKDIRS&user.name=resource"
Upload a JAR: upload the local ./examples/jars/examples_2.11-2.3.2.jar to /resourcesdir/examples_2.11-2.3.2.jar on HttpFS
curl -i -X PUT -T ./examples/jars/examples_2.11-2.3.2.jar "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/examples_2.11-2.3.2.jar?op=CREATE&data=true&user.name=resource" -H "Content-Type: application/octet-stream"
Upload a Python file: upload the local ./examples/src/main/python/pi.py to /resourcesdir/pi.py on HttpFS
curl -i -X PUT -T ./examples/src/main/python/pi.py "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/pi.py?op=CREATE&data=true&user.name=resource" -H "Content-Type: application/octet-stream"
List files: list the files under /resourcesdir/ on HttpFS
curl -i "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/?op=LISTSTATUS&user.name=resource"
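The same WebHDFS calls can be scripted instead of issued by hand. The following is a minimal Python sketch using only the standard library; the HttpFS host below is a placeholder copied from the examples above, so substitute the address from your own console.

```python
# Minimal WebHDFS (HttpFS) helpers mirroring the curl commands above.
# The host and user name are taken from this guide's examples; adjust as needed.
import urllib.request

HTTPFS = "http://ap-xxx.rds.aliyuncs.com:14000"  # placeholder HttpFS address
USER = "resource"  # recommended HttpFS user name

def webhdfs_url(path, op, **params):
    """Build a WebHDFS v1 URL: .../webhdfs/v1/<path>?op=<OP>&user.name=...&..."""
    query = "&".join([f"op={op}", f"user.name={USER}"] +
                     [f"{k}={v}" for k, v in params.items()])
    return f"{HTTPFS}/webhdfs/v1{path}?{query}"

def mkdirs(path):
    """Create a directory, like: curl -i -X PUT ...?op=MKDIRS&user.name=resource"""
    req = urllib.request.Request(webhdfs_url(path, "MKDIRS"), method="PUT")
    return urllib.request.urlopen(req)

def upload(local_path, remote_path):
    """Upload a local file, like: curl -i -X PUT -T <file> ...?op=CREATE&data=true"""
    with open(local_path, "rb") as f:
        req = urllib.request.Request(
            webhdfs_url(remote_path, "CREATE", data="true"),
            data=f.read(), method="PUT",
            headers={"Content-Type": "application/octet-stream"})
    return urllib.request.urlopen(req)
```

For example, `mkdirs("/resourcesdir")` followed by `upload("./examples/src/main/python/pi.py", "/resourcesdir/pi.py")` reproduces the directory-creation and upload steps above.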
Submitting jobs through the job management service (LivyServer)
The Spark service uses Apache Livy (LivyServer) as its job management service, which supports submitting JAR jobs (including streaming) and Python jobs.
1. Obtain the LivyServer service address from the analysis cluster console, for example:
LivyServer: http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998
2. Submit a job
Create a JSON file livy_pi.json that describes the job to submit to LivyServer:
{
    "file": "/resourcesdir/spark-examples_2.11-2.3.2.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "driverMemory": "1g",
    "executorMemory": "1g",
    "conf": {
        "spark.executor.instances": "1",
        "spark.executor.cores": "1"
    }
}
Submit the JAR job
Command:
curl -H "Content-Type: application/json" -X POST -d @livy_pi.json http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches | python -m json.tool
Example:
[root@master]# curl -H "Content-Type: application/json" -X POST -d @livy_pi.json http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches | python -m json.tool
{
    "appId": null,
    "appInfo": {
        "driverLogUrl": null,
        "sparkUiUrl": null
    },
    "id": 1,
    "log": [
        "stdout: ",
        "\nstderr: ",
        "\nYARN Diagnostics: "
    ],
    "state": "starting"
}
Submit a Python job
Command:
curl -X POST --data '{"file": "/resourcesdir/pi.py"}' -H "Content-Type: application/json" http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches
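The two submissions above can also be driven from Python. This is a minimal sketch using only the standard library; the LivyServer host is the placeholder address from this guide, and no request is sent until submit_batch() is actually called against a reachable server.

```python
# Sketch of submitting a Livy batch job, equivalent to the curl commands above.
import json
import urllib.request

LIVY = "http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998"  # placeholder

def jar_batch(jar, class_name):
    """Payload for a JAR job, matching the livy_pi.json example above."""
    return {
        "file": jar,
        "className": class_name,
        "driverMemory": "1g",
        "executorMemory": "1g",
        "conf": {
            "spark.executor.instances": "1",
            "spark.executor.cores": "1",
        },
    }

def submit_batch(payload):
    """POST the payload to /batches; Livy replies with the batch id and state."""
    req = urllib.request.Request(
        f"{LIVY}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# A Python job only needs the "file" field, as in the curl example above:
python_payload = {"file": "/resourcesdir/pi.py"}
```

Calling `submit_batch(jar_batch(...))` or `submit_batch(python_payload)` returns the same JSON shown in the sample response, including the batch "id" used for status queries below.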
3. Query the job status
The status can be checked through the LivyServer API or the Spark UI.
Command:
curl http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches/1/state | python -m json.tool
Example:
[root@master t-apsara-spark-2.2.2]# curl http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches/1/state | python -m json.tool
{
    "id": 1,
    "state": "success"
}
4. References