Spark as a Service: A First Test of JobServer

caoli98033

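spark-jobserver provides a RESTful interface for submitting and managing Apache Spark jobs, jar files, and job contexts (SparkContext). The project is hosted on GitHub (https://github.com/ooyala/spark-jobserver) and is at version 0.4 at the time of writing.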

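Features

"Spark as a Service": a simple REST interface for managing jobs and contexts
Sub-second, low-latency jobs through long-running job contexts
A running job can be stopped by killing its context
Jar upload is a separate step, which speeds up job startup
Asynchronous and synchronous job APIs; the synchronous API works very well for low-latency jobs
Works with Standalone Spark and Mesos
Job and jar information is persisted through a pluggable DAO interface
RDDs can be named and cached, then retrieved by name, which makes it easier to share and reuse RDDs across jobs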

Installing and starting JobServer

JobServer is built and launched with sbt, so sbt has to be installed first.

```
rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.6.rpm
yum install git
# Clone the project
$ git clone https://github.com/ooyala/spark-jobserver.git
# From the project root, start sbt
$ sbt
......
[info] Set current project to spark-jobserver-master (in build file:/D:/Projects/spark-jobserver-master/)
>
# Start JobServer locally (developer mode)
> re-start --- -Xmx4g
......
# At this point spark-core, jetty, liftweb and other related modules are downloaded.
job-server Starting spark.jobserver.JobServer.main()
[success] Total time: 545 s, completed 2014-10-21 19:19:48
```

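Then open http://localhost:8090 to see the Web UI.

The UI is very rudimentary, so it is easier to check status from the command line.

Testing job execution

Here we test directly with the tests package that ships with job-server.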

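Once the build finishes, upload the packaged jar file through the REST interface.
The job-related REST API is:
GET /jobs lists all jobs
POST /jobs submits a new job
GET /jobs/<jobId> returns the result and status of a single job
GET /jobs/<jobId>/config returns the configuration of a single job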
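
The endpoints listed above can be wrapped in a few small helpers. What follows is a minimal Python sketch, not part of spark-jobserver: the function names and the `BASE_URL` constant are hypothetical, and it assumes the server is listening on localhost:8090 as in this post. It only builds URLs and request objects; nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the job server from this post, started locally on port 8090.
BASE_URL = "http://localhost:8090"


def job_url(job_id):
    """URL for GET /jobs/<jobId> (result and status of one job)."""
    return f"{BASE_URL}/jobs/{job_id}"


def job_config_url(job_id):
    """URL for GET /jobs/<jobId>/config."""
    return f"{BASE_URL}/jobs/{job_id}/config"


def submit_url(app_name, class_path, **extra):
    """URL for POST /jobs; extra keyword arguments become query parameters."""
    params = {"appName": app_name, "classPath": class_path, **extra}
    return f"{BASE_URL}/jobs?{urlencode(params)}"


def submit_request(app_name, class_path, config_text, **extra):
    """Build (but do not send) the POST request that submits a job."""
    url = submit_url(app_name, class_path, **extra)
    return Request(url, data=config_text.encode("utf-8"), method="POST")
```

For example, `submit_url("test", "spark.jobserver.WordCountExample", sync="true")` produces the same URL as the synchronous curl call used in this post.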

 
```
$ curl --data-binary @job-server-tests/target/job-server-tests-0.4.0.jar localhost:8090/jars/test
OK
# List the uploaded jars
$ curl localhost:8090/jars/
{
  "test": "2014-10-22T15:15:04.826+08:00"
}
# Submit a job
# The appName is test and the class is spark.jobserver.WordCountExample
$ curl -d "input.string = hello job server" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
{
  "status": "STARTED",
  "result": {
    "jobId": "34ce0666-0148-46f7-8bcf-a7a19b5608b2",
    "context": "eba36388-spark.jobserver.WordCountExample"
  }
}
# Look up the result and the configuration by job ID
$ curl localhost:8090/jobs/34ce0666-0148-46f7-8bcf-a7a19b5608b2
{
  "status": "OK",
  "result": {
    "job": 1,
    "hello": 1,
    "server": 1
  }
}
$ curl localhost:8090/jobs/34ce0666-0148-46f7-8bcf-a7a19b5608b2/config
{
  "input": {
    "string": "hello job server"
  }
}
# Submit a synchronous job; the terminal blocks until the job has finished.
# Note that sync=true must stay inside the quotes, otherwise the shell treats & as "run in background".
$ curl -d "input.string = hello job server" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&sync=true'
{
  "status": "OK",
  "result": {
    "job": 1,
    "hello": 1,
    "server": 1
  }
}
```
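
With the asynchronous API, the client has to take the jobId from the first response and query it again later. The following Python sketch shows one way to do that under stated assumptions: the helper names are hypothetical, the server is assumed to be at localhost:8090, and a job is treated as done only when its status is "OK", because that is the only completed status this post shows. Other statuses simply cause another attempt until the retry budget runs out.

```python
import json
import time
from urllib.request import urlopen


def extract_job_id(submit_response_text):
    """Pull the jobId out of the JSON returned by an asynchronous POST /jobs."""
    return json.loads(submit_response_text)["result"]["jobId"]


def http_fetch(job_id, base_url="http://localhost:8090"):
    """GET /jobs/<jobId> and return the parsed JSON (assumes a local server)."""
    with urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)


def poll_job(job_id, fetch=http_fetch, attempts=10, delay=1.0):
    """Call fetch(job_id) until the status is "OK" or the attempts run out."""
    for _ in range(attempts):
        response = fetch(job_id)
        if response.get("status") == "OK":
            return response["result"]
        time.sleep(delay)
    raise TimeoutError(f"job {job_id} did not reach status OK")
```

Passing `fetch` as a parameter keeps the polling logic independent of HTTP, so it can be exercised without a running server.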

Pre-starting a context

The context-related REST API is:
GET /contexts lists all pre-created contexts
POST /contexts creates a new context
DELETE /contexts/<name> deletes the context and stops every job running in it
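
As an illustration of the context endpoints, here is a small Python sketch that builds the create and delete requests. The helper names and `BASE_URL` are hypothetical, and the parameter names `num-cpu-cores` and `mem-per-node` are taken from the curl call in this post, not checked against other versions of the server. The requests are only constructed here; sending them would require `urllib.request.urlopen`.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: the job server from this post, started locally on port 8090.
BASE_URL = "http://localhost:8090"


def create_context_request(name, **settings):
    """Build a POST request that creates a context with the given settings."""
    # Python keyword names use underscores; the REST parameters use hyphens.
    params = {key.replace("_", "-"): value for key, value in settings.items()}
    url = f"{BASE_URL}/contexts/{quote(name)}"
    if params:
        url += "?" + urlencode(params)
    # An empty body mirrors `curl -d ""`.
    return Request(url, data=b"", method="POST")


def delete_context_request(name):
    """Build a DELETE request that removes a context and stops its jobs."""
    return Request(f"{BASE_URL}/contexts/{quote(name)}", method="DELETE")
```

Calling `create_context_request("test-context", num_cpu_cores=4, mem_per_node="512m")` yields the same URL as the curl command used to create test-context.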

 
```
$ curl -d "" 'localhost:8090/contexts/test-context?num-cpu-cores=4&mem-per-node=512m'
OK
# List the existing contexts
$ curl localhost:8090/contexts
["test-context", "feceedc3-spark.jobserver.WordCountExample"]
# Next, run a job in this context
$ curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true'
{
  "status": "OK",
  "result": {
    "a": 2,
    "b": 2,
    "c": 1,
    "see": 1
  }
}
```
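
The result maps returned above are consistent with a plain word count over the value of `input.string`. The Python sketch below reproduces that computation locally so the expected output of a test submission can be checked by hand. It is an illustration of what the job appears to compute, not the Scala source of WordCountExample, and the function name is hypothetical.

```python
from collections import Counter


def word_count(input_string):
    """Count whitespace-separated words, keeping first-seen order."""
    return dict(Counter(input_string.split()))


print(word_count("a b c a b see"))
# prints {'a': 2, 'b': 2, 'c': 1, 'see': 1}
```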

Configuration file

Open the configuration file and you will see that master is set to local[4]; change it to the address of your own cluster.

```
vim spark-jobserver/config/local.conf.template
master = "local[4]"
```

The same file also holds the settings for how and where data objects are stored.

The default context settings are shown below; they can be overridden by parameters passed through the REST interface once the server has been started again from sbt.

```
# universal context configuration.  These settings can be overridden, see README.md
context-settings {
  num-cpu-cores = 2           # Number of cores to allocate.  Required.
  memory-per-node = 512m      # Executor memory per node, -Xmx style eg 512m, #1G, etc.
  # in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
  # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"
  # uris of jars to be loaded into the classpath for this context
  # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
}
```
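
The idea of defaults that a single context can override can be pictured as a dictionary merge. The Python sketch below is only a mental model under that assumption: the names are hypothetical, the default values are copied from the context-settings block above, and how the server itself merges the two sources is not shown in this post.

```python
# Defaults copied from the context-settings block in the config file.
DEFAULT_CONTEXT_SETTINGS = {"num-cpu-cores": 2, "memory-per-node": "512m"}


def effective_settings(overrides):
    """Return the defaults with any per-context overrides applied on top."""
    # Later keys win, so values in `overrides` replace the defaults.
    return {**DEFAULT_CONTEXT_SETTINGS, **overrides}


print(effective_settings({"num-cpu-cores": 4}))
# prints {'num-cpu-cores': 4, 'memory-per-node': '512m'}
```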

That covers basic usage; deploying JobServer and using it in a project will be covered in a later post.
