Big Data Learning (5): Submitting Spark Batch Jobs with Livy (repost)

Livy is an open-source REST interface for interacting with Spark. It supports submitting both code snippets and complete programs for execution.

 

Livy wraps spark-submit and supports remote execution.

Starting the server

Run the following command to start the Livy server:

./bin/livy-server

This post assumes Spark runs in YARN mode, so all file paths default to locations in HDFS. In local development mode you can use local files directly (note that livy.conf must then be configured with livy.file.local-dir-whitelist = directory to allow files under that directory to be added to a session).
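For local development, the whitelist setting mentioned above might look like this in conf/livy.conf (the directory path here is an example, not from the original post):

```
# conf/livy.conf
# Allow files under this local directory to be referenced by sessions/batches
livy.file.local-dir-whitelist = /home/user/spark-jobs
```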

Submitting a JAR

First, list the currently running sessions:

 
curl localhost:8998/sessions | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    34    0    34    0     0   2314      0 --:--:-- --:--:-- --:--:--  2428
{
    "from": 0,
    "sessions": [],
    "total": 0
}
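The same check can be done programmatically by parsing the JSON response. A minimal sketch using Python's standard library (the helper name is ours; the sample body is the response shown above):

```python
import json

def active_session_ids(response_text):
    """Extract the ids of active sessions from a GET /sessions response body."""
    body = json.loads(response_text)
    return [s["id"] for s in body.get("sessions", [])]

# Sample response like the one above: no sessions are running yet.
sample = '{"from": 0, "sessions": [], "total": 0}'
print(active_session_ids(sample))  # -> []
```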

Next, submit the JAR. Assume the JAR being submitted resides in HDFS, at the path /usr/lib/spark/lib/spark-examples.jar:

 
curl -X POST --data '{"file": "/user/romain/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi"}' -H "Content-Type: application/json" localhost:8998/batches
{"id":0,"state":"running","log":[]}

The response includes the ID of the submission, here 0. We can check the job's status with the following command:

 
curl localhost:8998/batches/0 | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   902    0   902    0     0  91120      0 --:--:-- --:--:-- --:--:--   97k
{
    "id": 0,
    "log": [
        "15/10/20 16:32:21 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.1.30:4040",
        "15/10/20 16:32:21 INFO scheduler.DAGScheduler: Stopping DAGScheduler",
        "15/10/20 16:32:21 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!",
        "15/10/20 16:32:21 INFO storage.MemoryStore: MemoryStore cleared",
        "15/10/20 16:32:21 INFO storage.BlockManager: BlockManager stopped",
        "15/10/20 16:32:21 INFO storage.BlockManagerMaster: BlockManagerMaster stopped",
        "15/10/20 16:32:21 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!",
        "15/10/20 16:32:21 INFO spark.SparkContext: Successfully stopped SparkContext",
        "15/10/20 16:32:21 INFO util.ShutdownHookManager: Shutdown hook called",
        "15/10/20 16:32:21 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-6e362908-465a-4c67-baa1-3dcf2d91449c"
    ],
    "state": "success"
}
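In a script, the status check above is usually wrapped in a polling loop that waits for a terminal state. A sketch (the fetch_state callable stands in for an HTTP GET of /batches/&lt;id&gt; reading the "state" field; it is injected here so the loop itself stays testable, and the set of terminal states is our assumption):

```python
import time

TERMINAL_STATES = {"success", "dead", "error", "killed"}

def wait_for_batch(fetch_state, poll_seconds=1.0, max_polls=600):
    """Poll a Livy batch until it reaches a terminal state.

    fetch_state: callable returning the batch's current 'state' string,
    e.g. obtained by GETting localhost:8998/batches/<id>.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("batch did not reach a terminal state in time")
```

For example, a stub that walks through "starting" and "running" before "success" makes wait_for_batch return "success".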

In addition, the log can be retrieved through the following API:

 
curl localhost:8998/batches/0/log | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5378    0  5378    0     0   570k      0 --:--:-- --:--:-- --:--:--  583k
{
    "from": 0,
    "id": 3,
    "log": [
        "SLF4J: Class path contains multiple SLF4J bindings.",
        "SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]",
        "SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]",
        "SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.",
        "SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]",
        "15/10/21 01:37:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable",
        "15/10/21 01:37:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032",
        "15/10/21 01:37:27 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers",
        "15/10/21 01:37:27 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)",
        "15/10/21 01:37:27 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead",
        "15/10/21 01:37:27 INFO yarn.Client: Setting up container launch context for our AM",
        "15/10/21 01:37:27 INFO yarn.Client: Setting up the launch environment for our AM container",
        "15/10/21 01:37:27 INFO yarn.Client: Preparing resources for our AM container",
        ....
        ....
        "15/10/21 01:37:40 INFO yarn.Client: Application report for application_1444917524249_0004 (state: RUNNING)",
        "15/10/21 01:37:41 INFO yarn.Client: Application report for application_1444917524249_0004 (state: RUNNING)",
        "15/10/21 01:37:42 INFO yarn.Client: Application report for application_1444917524249_0004 (state: FINISHED)",
        "15/10/21 01:37:42 INFO yarn.Client: ",
        "\t client token: N/A",
        "\t diagnostics: N/A",
        "\t ApplicationMaster host: 192.168.1.30",
        "\t ApplicationMaster RPC port: 0",
        "\t queue: root.romain",
        "\t start time: 1445416649481",
        "\t final status: SUCCEEDED",
        "\t tracking URL: http://unreal:8088/proxy/application_1444917524249_0004/A",
        "\t user: romain",
        "15/10/21 01:37:42 INFO util.ShutdownHookManager: Shutdown hook called",
        "15/10/21 01:37:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-26cdc4d9-071e-4420-a2f9-308a61af592c"
    ],
    "total": 67
}

Command-line arguments can also be passed along; here, for example, the Pi computation runs with 100 iterations:

 
curl -X POST --data '{"file": "/usr/lib/spark/lib/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi", "args": ["100"]}' -H "Content-Type: application/json" localhost:8998/batches
{"id":1,"state":"running","log":[]}
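The submissions so far differ only in their JSON body. A small helper that builds the POST /batches payload (a sketch; the field names file, className, and args are taken from the requests above, the helper itself is ours):

```python
import json

def batch_payload(file, class_name=None, args=None):
    """Build the JSON body for POST /batches."""
    body = {"file": file}
    if class_name is not None:
        body["className"] = class_name
    if args is not None:
        body["args"] = args
    return json.dumps(body)

# The SparkPi submission above, with 100 iterations:
payload = batch_payload("/usr/lib/spark/lib/spark-examples.jar",
                        class_name="org.apache.spark.examples.SparkPi",
                        args=["100"])
```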

To kill the job, call the following API:

 
curl -X DELETE localhost:8998/batches/1
{"msg":"deleted"}

Calling the same API again does nothing, because the batch has already been deleted:

 
curl -X DELETE localhost:8998/batches/1
session not found
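Client code should therefore treat a repeated DELETE as a no-op rather than an error. A sketch with the HTTP call injected (send_delete is a placeholder for the actual DELETE request, an assumption for illustration):

```python
def delete_batch(batch_id, send_delete):
    """Delete a Livy batch, tolerating the already-deleted case.

    send_delete: callable(batch_id) -> (status_code, body_text),
    e.g. issuing DELETE localhost:8998/batches/<id>.
    Returns True if the batch was deleted now, False if it no longer existed.
    """
    status, body = send_delete(batch_id)
    if status == 404 or "session not found" in body.lower():
        return False
    return True
```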

Submitting a Python job

Submitting a Python job is similar to submitting a JAR:

 
curl -X POST --data '{"file": "/user/romain/pi.py"}' -H "Content-Type: application/json" localhost:8998/batches
{"id":2,"state":"starting","log":[]}

Check the job status:

 
curl localhost:8998/batches/2 | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   616    0   616    0     0  77552      0 --:--:-- --:--:-- --:--:-- 88000
{
    "id": 2,
    "log": [
        "\t ApplicationMaster host: 192.168.1.30",
        "\t ApplicationMaster RPC port: 0",
        "\t queue: root.romain",
        "\t start time: 1445417899564",
        "\t final status: UNDEFINED",
        "\t tracking URL: http://unreal:8088/proxy/application_1444917524249_0006/",
        "\t user: romain",
        "15/10/21 01:58:26 INFO yarn.Client: Application report for application_1444917524249_0006 (state: RUNNING)",
        "15/10/21 01:58:27 INFO yarn.Client: Application report for application_1444917524249_0006 (state: RUNNING)",
        "15/10/21 01:58:28 INFO yarn.Client: Application report for application_1444917524249_0006 (state: RUNNING)"
    ],
    "state": "running"
}

Get the log:

curl localhost:8998/batches/2/log | python -m json.tool
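The log responses seen earlier carry "from" and "total" fields, i.e. the endpoint returns a window of lines rather than the whole log at once. A sketch of paging through it, with the HTTP GET injected as fetch_page (an illustrative assumption, standing in for GETting the log endpoint with an offset and size):

```python
def all_log_lines(fetch_page, page_size=100):
    """Collect every log line from a paginated Livy log endpoint.

    fetch_page: callable(offset, size) -> dict with 'log' (list of lines)
    and 'total' (total number of lines) keys, like the responses above.
    """
    lines, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        lines.extend(page["log"])
        offset += len(page["log"])
        if offset >= page["total"] or not page["log"]:
            break
    return lines
```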

 
