Building a spark-submit service with Spring Boot

AtongWood

Article link: https://blog.csdn.net/laksdbaksjfgba/article/details/86023892

Background

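I had long been curious how a web backend launches a Spark application. After searching the API docs I found that org.apache.spark.launcher.SparkLauncher can do it. I wanted to try it hands-on and do it properly, so I built a simple web project, which also gave me a chance to get familiar with the Spring Boot framework. This post records the whole process. I am a beginner, so if anything here is wrong or roundabout, please point it out and help me improve.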

Preparation

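1. Set up a Hadoop cluster; mine is a distributed cluster of two hosts
2. Install Spark, verify that spark-submit runs, then configure the HistoryServer
3. Install MySQL and create a Spark application info table with the mainClass and jarPath fields (plus a note field for a description)
4. Get familiar with the basics of the Spring Boot framework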

Main flow

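I planned three main HTML pages:

1. List the Spark applications that have already been developed (their info is loaded into the database in advance)
2. Set the run parameters and submit (parameters include the main class, jar path, driver memory, executor memory, and so on)
3. Show the application's execution result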

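Screenshots

1. Query the Spark applications; click an application to open the submit page

2. Set the run parameters

3. Submit the application; it is now running

4. After the run finishes, the page redirects to the execution result. Clicking the Tracking URL opens YARN's application management page, where the Spark application's job information can also be viewed.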

Main code

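Create a Spring Boot project with the dependencies DevTools + web + thymeleaf + mysql + mybatis.
The DevTools module gives the Spring Boot application hot reloading: after a code change the application restarts automatically instead of being restarted by hand, which speeds up development. It can be left out at first and added when needed.
The Spark application info table has only three fields: mainClass is the class containing the application's main method, jarPath is where the jar is stored, and note is a description of the application.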

Entity classes
Only two entity classes are used here: AppInfo, the Spark application info, and SparkAppPara, the Spark application run parameters.

public class AppInfo {
	String mainClass;// main class of the application
	String jarPath;// location of the application jar, either local or on HDFS
	String note;// description of the application
	// getters and setters omitted
}

public class SparkAppPara {
	String mainClass;
	String jarPath;
	String master;// YARN or standalone
	String deployMode;// cluster or client
	String driverMemory;// driver memory in GB (the service appends "g")
	String executorMemory;// executor memory in GB (the service appends "g")
	String executorInstances;// number of executors
	String executorCores;// cores per executor
	String defaultParallelism;// value of spark.default.parallelism
	// getters and setters omitted
}
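
To make the role of these fields concrete, here is a minimal sketch of how the numeric form values might be turned into Spark configuration entries. The configuration keys are the ones the submit service later in this post sets. The class name SparkConfSketch, the digits-only validation, and the choice to skip blank fields are assumptions made for illustration and are not part of the original project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of the original project. */
final class SparkConfSketch {
    private SparkConfSketch() {}

    /** Builds Spark conf entries from the form values; blank values are skipped (an assumption). */
    static Map<String, String> toConf(String driverMemoryGb, String executorMemoryGb,
                                      String executorInstances, String executorCores,
                                      String defaultParallelism) {
        Map<String, String> conf = new LinkedHashMap<>();
        put(conf, "spark.driver.memory", driverMemoryGb, "g");
        put(conf, "spark.executor.memory", executorMemoryGb, "g");
        put(conf, "spark.executor.instances", executorInstances, "");
        put(conf, "spark.executor.cores", executorCores, "");
        put(conf, "spark.default.parallelism", defaultParallelism, "");
        return conf;
    }

    /** Adds key -> value + suffix when value is a positive whole number; rejects anything else. */
    private static void put(Map<String, String> conf, String key, String value, String suffix) {
        if (value == null || value.trim().isEmpty()) {
            return; // leave the key unset so Spark falls back to its default
        }
        String trimmed = value.trim();
        if (!trimmed.matches("[1-9]\\d*")) {
            throw new IllegalArgumentException(key + " must be a positive integer, got: " + value);
        }
        conf.put(key, trimmed + suffix);
    }

    public static void main(String[] args) {
        // Blank parallelism is simply left out of the resulting map
        System.out.println(toConf("2", "4", "3", "2", ""));
    }
}
```

Validating before the values reach spark-submit means a typo such as "2g" in the memory box is rejected in the web layer instead of becoming "2gg" on the command line.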

Controller
(1) Open the application info page

@RequestMapping("/appInfo")
public String appInfo(){
	return "appInfo";
}

(2) Query the Spark application info

@RequestMapping("/getAllAppInfo")
@ResponseBody
public String getAllAppInfo(){
    return sparkAppInfoService.getAllAppInfo();
}

(3) Click an application to jump to the submit page

@RequestMapping("/submitApp")
public ModelAndView submitApp(String mainClass,String jarPath){
	ModelAndView mav = new ModelAndView();
	mav.setViewName("submitApp");
	mav.addObject("mainClass",mainClass);
	mav.addObject("jarPath",jarPath);
	return mav;
}

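After the jump I want mainClass and jarPath to be filled in automatically, so I pass the two parameters through the backend to the new page. Because the page is not a JSP, EL expressions cannot be used to read model values. Instead, Thymeleaf's th:xxx="${…}" attribute syntax reads the model data at render time.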

<div class="icon">
	<label class="cd-label" for="mainClass">mainClass</label>
	<input class="mainClass" type="text" name="mainClass" id="mainClass" th:value="${mainClass}">
</div> 

<div class="icon">
	<label class="cd-label" for="jarPath">jarPath</label>
	<input class="jarPath" type="text" name="jarPath" id="jarPath" th:value="${jarPath}">
</div>

(4) Submit the job

@RequestMapping(value = "/submit")
@ResponseBody
public String submit(@RequestBody SparkAppPara sparkAppPara) throws IOException, InterruptedException {
   return submitService.submitApp(sparkAppPara);
}

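(5) Jump to the result page after execution finishes
Here I want to take the result JSON and display it on the result page. My approach is to redirect with the JSON as a parameter once the Ajax request succeeds. I am sure there is a better way; I offer this only as a starting point.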

success: function(data)
{
    window.location.href=host+'/result?resultJson='+ encodeURIComponent(data);
}

Characters such as braces and brackets are not allowed unescaped in a URL, so the value must be encoded with encodeURIComponent before the request is made.

@RequestMapping("/result")
public ModelAndView toResult(String resultJson){
   ModelAndView mav = new ModelAndView();
   mav.setViewName("result");
   mav.addObject("resultJson",resultJson);
   return mav;
}
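
The encode/decode round trip can be illustrated on the Java side as well. The sketch below uses java.net.URLEncoder and URLDecoder to show what happens to a JSON string on its way through a query parameter. QueryParamCodec is a hypothetical helper name and the sample JSON is invented; in the actual project the browser does the encoding and the parameter reaches toResult already decoded. Note that URLEncoder performs form encoding, which differs from encodeURIComponent in details such as writing a space as + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical helper, not part of the original project. */
final class QueryParamCodec {
    private QueryParamCodec() {}

    /** Percent-encodes a value so it can be placed in a query string. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Reverses encode(); also accepts %20 for a space, as produced by encodeURIComponent. */
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"id\":\"application_1\",\"name\":\"demo app\"}";
        String encoded = encode(json);
        System.out.println(encoded);                       // braces, quotes and colons are escaped
        System.out.println(decode(encoded).equals(json));  // the round trip restores the JSON
    }
}
```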

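Reading resultJson in the result page's JS code:
In step (3), Thymeleaf renders the model value directly into an HTML tag. On the result page, however, I need to get resultJson first and process it before rendering. In JS code, resultJson can be obtained as follows.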

<script th:inline="javascript">
    var resultJson = JSON.parse([[${resultJson}]]);

    $("#trackingUrl").attr("href",yarnAppUrl+resultJson.id);
    $("#applicationId").html(resultJson.id);
    $("#applicationName").html(resultJson.name);
    // less important code omitted
</script>

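Note that this JS code only works when embedded inline in the HTML page. It has no effect in an external JS file, because by default only the HTML template itself is processed by Thymeleaf.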

Service and Mapper
(1) Service and Mapper for fetching the Spark application info

@Service
public class SparkAppInfoService {
    @Autowired
    private AppInfoMapper appInfo;

    public String getAllAppInfo(){
        List<AppInfo> list = appInfo.getAllAppInfo();
        return JSONObject.toJSONString(list);
    }
}

@Component
public interface AppInfoMapper {
    @Select("SELECT * FROM appinfo")
    @Results({
            @Result(property = "mainClass",  column = "mainclass"),
            @Result(property = "jarPath", column = "jarpath"),
            @Result(property = "note", column = "note")
    })
    List<AppInfo> getAllAppInfo();
}

(2) Service for submitting the Spark application
There is more than one API for submitting a Spark application; I used org.apache.spark.launcher.SparkLauncher.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_2.12</artifactId>
    <version>2.4.0</version>
</dependency>

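import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;
import org.springframework.stereotype.Service;

@Service
public class SparkSubmitService {

    public String submitApp(SparkAppPara sparkAppPara) throws IOException, InterruptedException {
        Map<String, String> env = new HashMap<>();
        // these two environment variables must be set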
        env.put("HADOOP_CONF_DIR", "/usr/local/hadoop/etc/hadoop");
        env.put("JAVA_HOME", "/usr/lib/jdk/jdk1.8.0_191/");
        CountDownLatch countDownLatch = new CountDownLatch(1);
        SparkAppHandle handle = new SparkLauncher(env)
           .setSparkHome("/usr/local/spark/")
            .setAppResource(sparkAppPara.getJarPath())
            .setMainClass(sparkAppPara.getMainClass())
            .setMaster(sparkAppPara.getMaster())
            .setDeployMode(sparkAppPara.getDeployMode())
            .setConf("spark.driver.memory", sparkAppPara.getDriverMemory()+"g")
            .setConf("spark.executor.memory", sparkAppPara.getExecutorMemory()+"g")
            .setConf("spark.executor.instances", sparkAppPara.getExecutorInstances())
            .setConf("spark.executor.cores", sparkAppPara.getExecutorCores())
            .setConf("spark.default.parallelism", sparkAppPara.getDefaultParallelism())
            .setVerbose(true).startApplication(new SparkAppHandle.Listener() {
               @Override
                public void stateChanged(SparkAppHandle sparkAppHandle) {
                    if (sparkAppHandle.getState().isFinal()) {
                        countDownLatch.countDown();
                    }
                    System.out.println("state:" + sparkAppHandle.getState().toString());
                }

                @Override
                public void infoChanged(SparkAppHandle sparkAppHandle) {
                    System.out.println("Info:" + sparkAppHandle.getState().toString());
                }
            });
        System.out.println("The task is executing, please wait ....");
        // block this thread until the job reaches a final state
        countDownLatch.await();
        System.out.println("The task is finished!");
        // fetch the execution result through Spark's built-in monitoring REST API
        String restUrl = "http://master:18080/api/v1/applications/"+handle.getAppId();
        String resultJson = RestUtil.httpGet(restUrl,null);

        return resultJson;
    }
}
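
One thing to be aware of in the service above is that countDownLatch.await() blocks the request thread for as long as the Spark job runs. A minimal sketch of a bounded wait, using only the JDK, is shown below. StateListener is a stand-in for SparkAppHandle.Listener so that the example runs without Spark on the classpath; BoundedWait and awaitFinalState are hypothetical names, and the timeout values are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch, not part of the original project. */
final class BoundedWait {
    private BoundedWait() {}

    /** Stand-in for SparkAppHandle.Listener#stateChanged (this is not the Spark API). */
    interface StateListener {
        void stateChanged(String state, boolean isFinal);
    }

    /**
     * Starts a job through the given launcher and waits until a final state is
     * reported or the timeout elapses. Returns true only if a final state arrived in time.
     */
    static boolean awaitFinalState(Consumer<StateListener> launcher, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        launcher.accept((state, isFinal) -> {
            if (isFinal) {
                done.countDown();
            }
        });
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated job that reports RUNNING and then FINISHED from another thread
        boolean finished = awaitFinalState(listener -> new Thread(() -> {
            listener.stateChanged("RUNNING", false);
            listener.stateChanged("FINISHED", true);
        }).start(), 2000);
        System.out.println("finished in time: " + finished);
    }
}
```

With a real SparkAppHandle the same idea would apply to the latch in submitApp: a false return could be reported to the page as "still running" instead of holding the HTTP request open indefinitely.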

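HTTP request utility
This utility sends a REST request to fetch the JSON information about the Spark application's execution result (I believe this requires that the History Server is configured and running).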

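import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class RestUtil {
    public static String httpGet(String urlStr, List<String> urlParam) throws IOException, InterruptedException {
        // create the URL resource (urlParam is not used yet)
        URL url = new URL(urlStr);
        HttpURLConnection connet = null;
        int i = 0;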
        while(connet==null || connet.getResponseCode() != 200 ){
            connet = (HttpURLConnection) url.openConnection();
            connet.setRequestMethod("GET");
            connet.setRequestProperty("Charset", "UTF-8");
            connet.setRequestProperty("Content-Type", "application/json");
            connet.setConnectTimeout(15000);// connect timeout in milliseconds
            connet.setReadTimeout(15000);// read timeout in milliseconds
            i++;
            if (i==50)break;
            Thread.sleep(500);
        }
        // read the response body into a String
        BufferedReader brd = new BufferedReader(new InputStreamReader(connet.getInputStream(),"UTF-8"));
        StringBuilder  sb  = new StringBuilder();
        String line;
        while((line = brd.readLine()) != null){
            sb.append(line);
        }
        brd.close();
        connet.disconnect();
        return sb.toString();
    }
}
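
The retry loop in httpGet mixes polling with connection handling. The polling part on its own can be sketched as below: try an action a limited number of times with a fixed pause and stop at the first success. RetryPoll and poll are hypothetical names and the simulated endpoint is invented for the example; only the attempt count (50) and delay (500 ms) mirror the values used in RestUtil.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch, not part of the original project. */
final class RetryPoll {
    private RetryPoll() {}

    /**
     * Calls attempt up to maxAttempts times, sleeping delayMillis between tries,
     * and returns the first non-empty result (or empty if every attempt failed).
     */
    static <T> Optional<T> poll(Supplier<Optional<T>> attempt, int maxAttempts, long delayMillis) {
        for (int i = 1; i <= maxAttempts; i++) {
            Optional<T> result = attempt.get();
            if (result.isPresent()) {
                return result;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop retrying if interrupted
                    break;
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated endpoint that only answers on the third request
        Optional<String> body = poll(
                () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("{\"id\":\"app-1\"}"),
                50, 500);
        System.out.println(body.orElse("gave up") + " after " + calls[0] + " attempts");
    }
}
```

Returning an empty Optional when all attempts fail makes the "gave up" case explicit to the caller, whereas the loop in httpGet falls through to reading the response of the last connection.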

Third-party resources

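The project uses the following third-party templates and plugins. If any of this infringes your rights, please contact me and I will remove it.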

Application query page: https://www.lanrenzhijia.com/others/6564.html
Job submit page: https://www.lanrenzhijia.com/jquery/3981.html
Ajax loading animation: http://www.jq22.com/jquery-info15050

References

https://blog.csdn.net/sparkexpert/article/details/51045762
https://blog.csdn.net/u011244682/article/details/79170134
http://spark.apache.org/docs/latest/monitoring.html