1. Why use Livy
- Have long running SparkContexts that can be used for multiple Spark jobs, by multiple clients
- Share cached RDDs or DataFrames across multiple jobs and clients
- Multiple SparkContexts can be managed simultaneously, and they run on the cluster (YARN/Mesos) instead of the Livy Server for good fault tolerance and concurrency
- Jobs can be submitted as precompiled jars, snippets of code, or via the Java/Scala client API
- Ensure security via secure authenticated communication
- Apache License, 100% open source
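The "snippets of code" point works through Livy's interactive sessions: POST /sessions starts a long-running SparkContext, and POST /sessions/{id}/statements runs code on it. A minimal sketch of the request shapes (the host/port and the Scala snippet are assumptions; running the requests needs a live Livy server):

```python
import json

LIVY = "http://localhost:8998"  # assumed Livy server address

def create_session_request():
    """Request for POST /sessions: starts a long-running SparkContext."""
    return "%s/sessions" % LIVY, {"kind": "spark"}

def submit_snippet_request(session_id, code):
    """Request for POST /sessions/{id}/statements: runs a code snippet
    on the session's already-running SparkContext."""
    return "%s/sessions/%d/statements" % (LIVY, session_id), {"code": code}

# Example: run a snippet on session 0 (hypothetical session id)
url, payload = submit_snippet_request(0, "sc.parallelize(1 to 10).sum()")
print(url)
print(json.dumps(payload))
```

Because the SparkContext stays up between statements, cached RDDs/DataFrames from one snippet are visible to the next, which is how sharing across jobs and clients works in practice.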
2. Livy run modes (local and YARN)
Then we upload the Spark example jar to HDFS and point to it. If you are using Livy in local mode rather than YARN mode, just keep the local path. (In cluster mode, Livy reads files from HDFS, so the dependency jars should be uploaded to HDFS first.)
It is strongly recommended to configure Spark to submit applications in YARN cluster mode. That makes sure that user sessions have their resources properly accounted for in the YARN cluster, and that the host running the Livy server doesn't become overloaded when multiple user sessions are running. (In short: with many concurrent sessions, deploy on YARN to reduce the load on the Livy server itself.)
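A sketch of the corresponding livy.conf settings (these two property names come from the default livy.conf template shipped with Livy; check your own template for the exact defaults):

```properties
# Send Spark applications to YARN instead of running them locally
livy.spark.master = yarn
# Run the driver inside the YARN cluster, not on the Livy server host
livy.spark.deploy-mode = cluster
```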
3. The REST API
1. Submit a Spark job
curl -X POST --data '{"file": "/opt/jars/testLivy.jar", "className": "com.testLivy.TestLivyJob"}' -H "Content-Type: application/json" localhost:8998/batches
2. Check the status (possible states: not_started, starting, idle, running, busy, shutting_down, error, dead, success)
curl localhost:8998/batches/3
Result:
{"id": 3, "state": "dead"}
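A batch that is stuck in a non-terminal state can also be stopped: the Livy REST API exposes DELETE /batches/{id} to kill a batch job. A minimal sketch (host and port are assumptions; the actual request needs a live server):

```python
LIVY = "http://localhost:8998"  # assumed Livy server address

def kill_url(batch_id):
    """URL for DELETE /batches/{id}, which kills the batch job."""
    return "%s/batches/%d" % (LIVY, batch_id)

# e.g. against a live server:  requests.delete(kill_url(3))
print(kill_url(3))
```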
4. Modifying Livy configuration
(1) livy.server.port: the port the server listens on. The default is 8998; it can be changed via this option in livy.conf.
(2) livy.yarn.jar: this config has been replaced by separate configs listing specific archives for different Livy features. Refer to the default livy.conf file shipped with Livy for instructions.
# use HiveContext by default in the REPL
livy.repl.enableHiveContext = true
# enable user impersonation (proxy users)
livy.impersonation.enabled = true
# idle session timeout
livy.server.session.timeout = 1h
A full JSON body for a batch submission (note: the jars listed here must be readable by Livy, i.e. on HDFS when running in YARN cluster mode):
{"name": "test",
 "args": ["2016-10-10 22:00:00"],
 "proxyUser": "shilong",
 "className": "com.test.livyJob",
 "file": "/opt/jars/etl-livy.jar",
 "jars": ["/opt/jars/jar/ficus_2.10-1.0.1.jar", "/opt/jars/jar/mysql-connector-java-5.1.39.jar"],
 "conf": {"driverMemory": "1g", "driverCores": 1, "executorCores": 2, "executorMemory": "3g", "numExecutors": 2}
}
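Putting the pieces together, a submission like the one above can be driven from code: POST the body to /batches, then poll GET /batches/{id}/state until a terminal state is reached. A sketch of the request shapes (host/port are assumptions; paths and class names are the example values from this article; the network calls themselves need a live server):

```python
import json

LIVY = "http://localhost:8998"  # assumed Livy server address

batch = {
    "name": "test",
    "proxyUser": "shilong",
    "className": "com.test.livyJob",
    "file": "/opt/jars/etl-livy.jar",
    "args": ["2016-10-10 22:00:00"],
    "conf": {"driverMemory": "1g", "numExecutors": 2},
}

def submit_request():
    """Request for POST /batches with the JSON body above."""
    return "%s/batches" % LIVY, json.dumps(batch)

def state_url(batch_id):
    """URL for GET /batches/{id}/state, which returns {"id": ..., "state": ...}."""
    return "%s/batches/%d/state" % (LIVY, batch_id)

# States after which polling can stop (see the state list in section 3)
TERMINAL = {"success", "dead", "error"}

url, body = submit_request()
print(url)
print(body)
```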
Keyword parameters accepted by Livy batch submissions
(16 known properties: "executorCores", "className", "conf", "driverMemory", "name", "driverCores", "pyFiles", "archives", "queue", "executorMemory", "files", "jars", "proxyUser", "numExecutors", "file" [truncated])