The problem
1. In Spark Streaming, when the message fetch rate outpaces the consumption rate, a backlog builds up in the queue; shutting down the Spark app at that point loses the queued data.
2. spark.streaming.stopGracefullyOnShutdown works in local mode, but it still does not help on a YARN cluster, even though yarn kill sends SIGTERM (kill -15).
1. Initial idea
Have the application watch a shared variable itself and, when that variable signals shutdown, call JavaStreamingContext#stop(true, true) for a graceful stop.
Start and stop the Spark app with a purpose-built script instead of yarn kill.
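The core of the idea can be sketched with an in-memory flag before bringing HDFS into the picture. Below, an AtomicBoolean stands in for the shared variable and a CountDownLatch stands in for the graceful-stop call (in the real app that callback would be jssc.stop(true, true)); the class and method names are illustrative, not part of any Spark API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class StopFlagDemo {

    /** Returns true once the monitor thread has observed the cleared flag. */
    static boolean runDemo() throws InterruptedException {
        AtomicBoolean keepRunning = new AtomicBoolean(true); // the "shared variable"
        CountDownLatch stopCalled = new CountDownLatch(1);

        Thread monitor = new Thread(() -> {
            while (keepRunning.get()) {           // poll the shared flag
                try {
                    Thread.sleep(50L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            stopCalled.countDown();               // stands in for jssc.stop(true, true)
        }, "stop-flag-monitor");
        monitor.setDaemon(true);                  // must not keep the JVM alive
        monitor.start();

        keepRunning.set(false);                   // the stop script would clear the flag
        return stopCalled.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("graceful stop signalled: " + runDemo());
    }
}
```

The monitor must be a daemon thread: once the streaming context has stopped, a non-daemon polling loop would keep the JVM from exiting.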
2. Choosing the shared variable
a. Redis
b. ZooKeeper
c. HDFS
3. Implementation
HDFS is used here: the app polls for a flag file under a known path, and deleting that file triggers a graceful stop. The call site (jssc being the JavaStreamingContext instance):

monitor(groupId, path, () -> jssc.stop(true, true)).start();
// Requires java.io.BufferedReader / java.io.InputStreamReader and an SLF4J logger.
private static Thread monitor(String groupId, String dirPath, Runnable task) {
    Thread t = new Thread(() -> {
        // Build the flag-file path: <dirPath>/<groupId>
        String path = dirPath.endsWith("/") ? dirPath + groupId : dirPath + "/" + groupId;
        while (true) {
            String cmd = "hdfs dfs -ls " + path;
            try {
                Thread.sleep(5000L);
                Process process = Runtime.getRuntime().exec(cmd);
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(process.getInputStream(), "utf-8"))) {
                    String line = br.readLine();
                    // Flag file is gone: trigger the graceful stop, then exit the loop.
                    if (line == null || !line.contains(path)) {
                        logger.warn("flag file not found ({}) | stopping Spark App from monitor thread!", line);
                        task.run();
                        return;
                    }
                }
            } catch (InterruptedException e) {
                logger.error("monitor thread has been interrupted | {}", e.getMessage());
                Thread.currentThread().interrupt();
                return;
            } catch (Exception e) {
                logger.error("failed to check flag file on hdfs | {}", cmd);
            }
        }
    }, groupId + "_monitor_thread");
    t.setDaemon(true);
    return t;
}
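Shelling out to hdfs dfs -ls works, but spawning a process every five seconds is heavyweight and fragile: it assumes the hdfs CLI is on the PATH and parses its human-readable output. A lighter option is to check for the flag file through a filesystem API. Below is the same polling pattern against the local filesystem with java.nio so it is self-contained; in a real deployment the Files.exists check would be replaced by org.apache.hadoop.fs.FileSystem#exists on the HDFS path (that substitution is an assumption here, not shown):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

public class FlagFileMonitor {

    /** Polls for the flag file; runs the stop task once when it disappears. */
    static Thread monitor(Path flagFile, long pollMillis, Runnable stopTask) {
        Thread t = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // On HDFS this check would be fs.exists(new org.apache.hadoop.fs.Path(...)).
                if (!Files.exists(flagFile)) {
                    stopTask.run();   // e.g. jssc.stop(true, true)
                    return;           // stop exactly once, then exit the monitor
                }
            }
        }, "flag-file-monitor");
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) throws Exception {
        Path flag = Files.createTempFile("groupId-", ".flag");
        AtomicBoolean stopped = new AtomicBoolean(false);

        Thread m = monitor(flag, 50L, () -> stopped.set(true));
        m.start();

        Files.delete(flag);   // the stop script would do: hdfs dfs -rm <path>
        m.join(5_000L);
        System.out.println("stopped=" + stopped.get());
    }
}
```

With either variant, the stop script reduces to deleting the flag file and waiting for the YARN application to finish on its own.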