Spark submit (2.3) on a Kubernetes cluster from Python

So now that k8s is integrated directly with Spark in 2.3, my spark-submit from the console executes correctly on a Kubernetes master without any Spark master pods running; Spark handles all the k8s details:

spark-submit \
  --deploy-mode cluster \
  --class com.app.myApp \
  --master k8s://https://myCluster.com \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.app.name=myApp \
  --conf spark.executor.instances=10 \
  --conf spark.kubernetes.container.image=myImage \
  local:///myJar.jar

What I am trying to do is a spark-submit via AWS Lambda to my k8s cluster. Previously, I used the command via the Spark master REST API directly (without Kubernetes):

import json
import requests

# `parameters` is the JSON submission payload for the standalone master's REST API
request = requests.Request(
    'POST',
    "http://:6066/v1/submissions/create",
    data=json.dumps(parameters))
prepared = request.prepare()
session = requests.Session()
response = session.send(prepared)
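Here `parameters` is the standalone master's CreateSubmissionRequest payload; a minimal sketch of what it might look like (the jar path, main class, and Spark version are placeholders echoing the question, not values from the original post):

# Hypothetical example payload; adjust the resource, class, and properties
# to your own application.
parameters = {
    "action": "CreateSubmissionRequest",
    "appResource": "file:/myJar.jar",
    "mainClass": "com.app.myApp",
    "appArgs": [],
    "clientSparkVersion": "2.3.0",
    "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    "sparkProperties": {
        "spark.app.name": "myApp",
        "spark.submit.deployMode": "cluster",
        "spark.driver.supervise": "false",
        "spark.jars": "file:/myJar.jar",
    },
}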

And it worked. Now I want to integrate Kubernetes and do it similarly: submit an API request to my Kubernetes cluster from Python and have Spark handle all the k8s details, ideally something like:

request = requests.Request(
    'POST',
    "k8s://https://myK8scluster.com:443",
    data=json.dumps(parameters))

Is this possible with the Spark 2.3/Kubernetes integration?

Solution

I'm afraid that is impossible with Spark 2.3 if you are using the native Kubernetes support.

Based on the description in the deployment instructions, the submission process consists of several steps:

  • Spark creates a Spark driver running within a Kubernetes pod.
  • The driver creates executors which are also running within Kubernetes pods, connects to them, and executes application code.
  • When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.

So, in fact, you have nowhere to submit a job until you start the submission process, which launches the first Spark pod (the driver) for you. And once the application completes, everything is terminated.

Because running a fat container on AWS Lambda is not the best solution, and because there is no way to run arbitrary commands in the Lambda container itself (it is possible, but only as a hack; there is a blueprint about executing Bash inside an AWS Lambda), the simplest way is to write a small custom service that runs on a machine outside AWS Lambda and provides a REST interface between your application and the spark-submit utility, as sketched below. I don't see any other way to do this without pain.
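A minimal sketch of such a service, assuming Flask and a host that has spark-submit on its PATH plus network access to the Kubernetes API; the endpoint name, port, and payload field names are illustrative, not from the original answer:

import subprocess
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/submit", methods=["POST"])
def submit():
    # Build a spark-submit command from the JSON payload sent by the client
    # (e.g. the AWS Lambda function). Field names here are hypothetical.
    body = request.get_json()
    cmd = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", body["master"],                # e.g. k8s://https://myCluster.com
        "--class", body["main_class"],             # e.g. com.app.myApp
        "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName=spark",
        "--conf", "spark.app.name=" + body["app_name"],
        "--conf", "spark.executor.instances=" + str(body.get("executors", 1)),
        "--conf", "spark.kubernetes.container.image=" + body["image"],
        body["app_resource"],                      # e.g. local:///myJar.jar
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return jsonify({"returncode": result.returncode,
                    "stdout": result.stdout,
                    "stderr": result.stderr})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Your Lambda function then only needs a plain HTTP call, much like the old standalone REST flow ("my-submit-service" is a placeholder hostname for wherever the service runs):

import requests

payload = {
    "master": "k8s://https://myCluster.com",
    "main_class": "com.app.myApp",
    "app_name": "myApp",
    "executors": 10,
    "image": "myImage",
    "app_resource": "local:///myJar.jar",
}
response = requests.post("http://my-submit-service:8080/submit", json=payload)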
