Livy usage example: using Airflow to trigger job submission via a POST /batches request to Livy and track the job

I want to use Airflow to orchestrate jobs that include running some Pig scripts, shell scripts, and Spark jobs.

For the Spark jobs in particular, I want to use Apache Livy, but I'm not sure whether that is a good idea or whether I should just run spark-submit directly.

What is the best way to track a Spark job from Airflow once it has been submitted?

Solution

My assumption is that you have an application JAR containing Java / Scala code that you want to submit to a remote Spark cluster. Livy is arguably the best option for remote spark-submit when evaluated against the other possibilities (a minimal submission sketch follows the list):

Specifying the remote master IP: requires modifying global configuration / environment variables

Using SSHOperator: the SSH connection might break mid-job

Using EmrAddStepsOperator: dependent on EMR
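
To make the Livy option concrete, here is a minimal sketch of a batch submission via Livy's REST API (POST /batches). The Livy URL, JAR path, class name, and arguments below are hypothetical placeholders; adjust them for your cluster:

```python
import requests

LIVY_URL = "http://livy-server:8998"  # hypothetical Livy endpoint

# POST /batches makes Livy run spark-submit on the cluster side, so the
# Airflow worker needs neither Spark binaries nor master IP configuration.
payload = {
    "file": "hdfs:///jobs/my-spark-app.jar",  # application JAR (hypothetical path)
    "className": "com.example.MySparkJob",    # hypothetical main class
    "args": ["--date", "2020-01-01"],
}

resp = requests.post(f"{LIVY_URL}/batches", json=payload)
resp.raise_for_status()
batch = resp.json()  # contains, among other fields, the batch "id" and "state"
print("Submitted batch id:", batch["id"], "state:", batch["state"])
```

The returned batch id is what you poll later to track the job.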

Regarding tracking

Livy only reports state (e.g. running, success, dead) and not progress (% completion of stages)

If you're OK with that, you can just poll the Livy server via its REST API and keep printing the logs to the console; those will appear in the task logs in the Airflow web UI (View Logs). A polling sketch follows.
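
Here is a minimal polling sketch, e.g. the body of a PythonOperator callable. The Livy URL is a hypothetical placeholder, and the batch id is assumed to come from the earlier POST /batches call:

```python
import time

import requests

LIVY_URL = "http://livy-server:8998"  # hypothetical Livy endpoint
batch_id = 42  # id returned by the earlier POST /batches call (hypothetical)

# Poll GET /batches/{id}/state until the batch reaches a terminal state.
# Everything printed here ends up in the Airflow task log (View Logs).
while True:
    state = requests.get(f"{LIVY_URL}/batches/{batch_id}/state").json()["state"]
    print(f"Batch {batch_id} state: {state}")
    if state in ("success", "dead", "killed"):
        break
    time.sleep(30)

# GET /batches/{id}/log returns the driver log lines collected so far.
log = requests.get(f"{LIVY_URL}/batches/{batch_id}/log", params={"size": 100}).json()
for line in log.get("log", []):
    print(line)

# Fail the Airflow task if the Spark job did not succeed.
if state != "success":
    raise RuntimeError(f"Livy batch {batch_id} finished in state {state}")
```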

Other considerations

Livy doesn't support reusing a SparkSession across POST /batches requests; every batch starts its own Spark application.

If that's imperative, you'll have to write your application code in PySpark and use POST /sessions requests instead (a rough sketch follows).
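
For completeness, a rough sketch of the session-based flow: POST /sessions creates a long-lived interactive session whose SparkSession is shared by every statement submitted via POST /sessions/{id}/statements. Again, the Livy URL and the code snippet are hypothetical placeholders:

```python
import time

import requests

LIVY_URL = "http://livy-server:8998"  # hypothetical Livy endpoint

# Create an interactive PySpark session; its SparkSession is reused
# across all statements submitted to this session.
session = requests.post(f"{LIVY_URL}/sessions", json={"kind": "pyspark"}).json()
session_id = session["id"]

# Wait until the session is idle, i.e. ready to accept statements.
# (A production version should also bail out on the "dead" state.)
while requests.get(f"{LIVY_URL}/sessions/{session_id}/state").json()["state"] != "idle":
    time.sleep(5)

# Submit a PySpark snippet; later statements share the same SparkSession.
stmt = requests.post(
    f"{LIVY_URL}/sessions/{session_id}/statements",
    json={"code": "spark.range(100).count()"},
).json()
print("Statement id:", stmt["id"], "state:", stmt["state"])
```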
