LauncherBackend and LauncherServer

Communication flow between SparkLauncher, LauncherServer, and LauncherBackend


1. SparkLauncher

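SparkLauncher is a class for submitting Spark jobs from code.
Under the hood it still submits through the spark-submit script, which it invokes via a ProcessBuilder configured with the relevant arguments and environment variables.
Its main methods are:

  • launch: submits a job; the caller is responsible for handling the child process's output
  • createBuilder: called by both launch and startApplication; builds the ProcessBuilder that runs the spark-submit script
  • startApplication: submits a job and invokes the user-supplied listeners whenever the job's state changes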

1.1 createBuilder

Creates a ProcessBuilder and sets its environment variables.

        private def createBuilder: ProcessBuilder = {
            var cmd = new ArrayList[String]
            val script = if (isWindows) "spark-submit.cmd" else "spark-submit"  // pick the spark-submit script for this platform
            cmd.add(join(File.separator, builder.getSparkHome, "bin", script))   // absolute path to the script
            cmd.addAll(builder.buildSparkSubmitArgs)                             // append the spark-submit arguments
            // Since the child process is a batch script, let's quote things so that special characters are preserved, 
            // otherwise the batch interpreter will mess up the arguments. Batch scripts are weird.
            if (isWindows) {
                val winCmd = new ArrayList[String]
                import scala.collection.JavaConversions._
                for (arg <- cmd) {
                    winCmd.add(quoteForBatchScript(arg))
                }
                cmd = winCmd
            }
            
            // Create the ProcessBuilder, then copy the user-supplied child environment
            // (builder.childEnv) into it so that spark-submit and the launched application inherit it.
            val pb = new ProcessBuilder(cmd.toArray(new Array[String](cmd.size)))

            import scala.collection.JavaConversions._
            for (e <- builder.childEnv.entrySet) {
                pb.environment.put(e.getKey, e.getValue)
            }
            if (workingDir != null) pb.directory(workingDir)
            // Only one of redirectError and redirectError(...) can be specified.
            // Similarly, if redirectToLog is specified, no other redirections should be specified.
            checkState(!redirectErrorStream || errorStream == null, "Cannot specify both redirectError() and redirectError(...) ")
            checkState(!redirectToLog || (!redirectErrorStream && errorStream == null && outputStream == null), "Cannot used redirectToLog() in conjunction with other redirection methods.")
            if (redirectErrorStream || redirectToLog) pb.redirectErrorStream(true)
            if (errorStream != null) pb.redirectError(errorStream)
            if (outputStream != null) pb.redirectOutput(outputStream)
            return pb
        }
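
To see the shape of createBuilder without the Spark classes, here is a minimal JDK-only sketch. SubmitCommandSketch and its parameters are hypothetical names used only for illustration, and the Windows batch-quoting step is deliberately left out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: mirrors the overall shape of createBuilder.
final class SubmitCommandSketch {
    static ProcessBuilder build(String sparkHome, boolean windows,
                                List<String> submitArgs, Map<String, String> childEnv) {
        List<String> cmd = new ArrayList<>();
        // Pick the script for the platform and resolve it under <sparkHome>/bin.
        String script = windows ? "spark-submit.cmd" : "spark-submit";
        cmd.add(String.join(File.separator, sparkHome, "bin", script));
        cmd.addAll(submitArgs);

        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Extra variables are layered on top of the environment inherited from this JVM.
        pb.environment().putAll(childEnv);
        // Merge stderr into stdout so a single reader sees all child output.
        pb.redirectErrorStream(true);
        return pb;
    }
}
```

Calling build("/opt/spark", false, args, env) only prepares the command; nothing is executed until start() is called on the returned ProcessBuilder.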

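1.2 launch

Starts a child process that launches the Spark application. startApplication(SparkAppHandle.Listener...) is the preferred way to start an application, however, because it accepts listeners that can monitor how the application is running.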


        /**
         * Launches a sub-process that will start the configured Spark application.
         * <p>
         * The {@link #startApplication(SparkAppHandle.Listener...)} method is preferred when launching
         * Spark, since it provides better control of the child application.
         *
         * @return A process handle for the Spark app.
         */
        @throws[IOException]
        def launch = {
            // Create the child process through the ProcessBuilder
            val childProc = createBuilder.start
            if (redirectToLog) {
                val loggerName = builder.getEffectiveConfig.get(CHILD_PROCESS_LOGGER_NAME)
                new OutputRedirector(childProc.getInputStream, loggerName, REDIRECTOR_FACTORY)
            }
            childProc
        }
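
Because launch leaves the child's output to the caller, the caller usually has to keep reading it, since a child can stall once its output pipe fills up. The following is a minimal sketch of one way to do that with a daemon thread; OutputDrainSketch is a hypothetical name and is not Spark's OutputRedirector.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper: forwards each line of a stream to a sink on a daemon thread.
final class OutputDrainSketch {
    static Thread drain(InputStream in, Consumer<String> sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.accept(line);
                }
            } catch (IOException e) {
                // The stream was closed, typically because the child exited.
            }
        }, "child-output-drain");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Waits up to the given time for the drain thread to finish; returns true if it did.
    static boolean await(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }
}
```

With a real child process, the stream passed in would be the one returned by Process.getInputStream().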

1.3 startApplication

    /**
     * Starts a Spark application.
     * <p>
     * This method returns a handle that provides information about the running application and can
     * be used to do basic interaction with it.
     * <p>
     * The returned handle assumes that the application will instantiate a single SparkContext
     * during its lifetime. Once that context reports a final state (one that indicates the
     * SparkContext has stopped), the handle will not perform new state transitions, so anything
     * that happens after that cannot be monitored. If the underlying application is launched as
     * a child process, {@link SparkAppHandle#kill()} can still be used to kill the child process.
     * <p>
     * Currently, all applications are launched as child processes. The child's stdout and stderr
     * are merged and written to a logger (see <code>java.util.logging</code>) only if redirection
     * has not otherwise been configured on this <code>SparkLauncher</code>. The logger's name can be
     * defined by setting {@link #CHILD_PROCESS_LOGGER_NAME} in the app's configuration. If that
     * option is not set, the code will try to derive a name from the application's name or main
     * class / script file. If those cannot be determined, an internal, unique name will be used.
     * In all cases, the logger name will start with "org.apache.spark.launcher.app", to fit more
     * easily into the configuration of commonly-used logging systems.
     *
     * @since 1.6.0
     * @param listeners Listeners to add to the handle before the app is launched.
     * @return A handle for the launched application.
     */
    public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners) throws IOException {

        // Ask LauncherServer for a ChildProcAppHandle, which is used to monitor the launched app
        ChildProcAppHandle handle = LauncherServer.newAppHandle();
        // Register the listeners that observe the app's state
        for (SparkAppHandle.Listener l : listeners) {
            handle.addListener(l);
        }

        String loggerName = builder.getEffectiveConfig().get(CHILD_PROCESS_LOGGER_NAME);
        ProcessBuilder pb = createBuilder();

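        // Resolve the name of the logger that receives spark-submit's output. If the user has not
        // set one, it is derived below from the app name, the main class, or the app resource.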
        // Only setup stderr + stdout to logger redirection if user has not otherwise configured output redirection.
        if (loggerName == null) {
            String appName = builder.getEffectiveConfig().get(CHILD_PROCESS_LOGGER_NAME);
            if (appName == null) {
                if (builder.appName != null) {
                    appName = builder.appName;
                } else if (builder.mainClass != null) {
                    int dot = builder.mainClass.lastIndexOf(".");
                    if (dot >= 0 && dot < builder.mainClass.length() - 1) {
                        appName = builder.mainClass.substring(dot + 1, builder.mainClass.length());
                    } else {
                        appName = builder.mainClass;
                    }
                } else if (builder.appResource != null) {
                    appName = new File(builder.appResource).getName();
                } else {
                    appName = String.valueOf(COUNTER.incrementAndGet());
                }
            }
            String loggerPrefix = getClass().getPackage().getName();        // package name of the launcher
            loggerName = String.format("%s.app.%s", loggerPrefix, appName);   // logger name = package name + ".app." + app name
            pb.redirectErrorStream(true);
        }

        // Publish the LauncherServer port through the child's environment; LauncherBackend reads
        // this variable to find the server (see the LauncherBackend section).
        pb.environment().put(LauncherProtocol.ENV_LAUNCHER_PORT,
                String.valueOf(LauncherServer.getServerInstance().getPort()));

        // Pass the handle's secret the same way, so the incoming connection can be matched to this handle.
        pb.environment().put(LauncherProtocol.ENV_LAUNCHER_SECRET, handle.getSecret());
        // Start spark-submit and route its output to the logger named above.
        try {
            handle.setChildProc(pb.start(), loggerName);
        } catch (IOException ioe) {
            handle.kill();
            throw ioe;
        }

        return handle;
    }
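
The fallback chain for the logger name is easier to follow as a pure function. This is a minimal sketch under the assumption that the inputs are passed in explicitly; LoggerNameSketch and its method names are hypothetical, and the counter fallback is represented by a plain number.

```java
import java.io.File;

// Hypothetical helper: reproduces the name-derivation order described in the listing above.
final class LoggerNameSketch {
    static String deriveAppName(String appName, String mainClass, String appResource, long fallbackId) {
        if (appName != null) {
            return appName;
        }
        if (mainClass != null) {
            // Use the simple class name when the main class is package-qualified.
            int dot = mainClass.lastIndexOf('.');
            return (dot >= 0 && dot < mainClass.length() - 1)
                    ? mainClass.substring(dot + 1)
                    : mainClass;
        }
        if (appResource != null) {
            // Fall back to the file name of the jar or script.
            return new File(appResource).getName();
        }
        // Last resort: a unique number supplied by the caller.
        return String.valueOf(fallbackId);
    }

    static String loggerName(String prefix, String appName) {
        return String.format("%s.app.%s", prefix, appName);
    }
}
```

For example, a main class of com.example.WordCount with the prefix org.apache.spark.launcher yields the logger name org.apache.spark.launcher.app.WordCount.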

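2. LauncherServer

LauncherServer is a service that receives Spark application state changes sent by LauncherBackend.
When it receives a state-change message, it invokes, according to the message type, the listeners the user registered through ChildProcAppHandle.
Its core methods are:
    newAppHandle                registers a new ChildProcAppHandle
    acceptConnections           the server's main loop; listens for connections and hands off requests
    ServerConnection.handle     the object and method that actually process each request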

2.1 newAppHandle

    /**
     * Lazily creates the singleton LauncherServer and returns a new ChildProcAppHandle on which the user can register listeners.
     * Creates a handle for an app to be launched.
     * This method will start a server if one hasn't been started yet.
     * The server is shared for multiple handles, and once all handles are disposed of,
     * the server is shut down.
     */
    static synchronized ChildProcAppHandle newAppHandle() throws IOException {
        // Create the LauncherServer if it does not exist yet
        LauncherServer server = serverInstance != null ? serverInstance : new LauncherServer();
        server.ref();
        serverInstance = server;

        // Generate a secret that uniquely identifies this client
        String secret = server.createSecret();
        while (server.pending.containsKey(secret)) {
            secret = server.createSecret();
        }

        // Return the ChildProcAppHandle bound to that secret
        return server.newAppHandle(secret);
    }
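
The listing does not show how createSecret produces its value. The sketch below is one plausible approach, assuming a random hex string checked against the pending map; SecretRegistrySketch, the 32-byte length, and the use of a plain String in place of a handle are all assumptions for illustration.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: hands out unique secrets and lets each one be claimed exactly once.
final class SecretRegistrySketch {
    private static final SecureRandom RND = new SecureRandom();
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    // Encodes numBytes random bytes as lowercase hex (two characters per byte).
    static String createSecret(int numBytes) {
        byte[] raw = new byte[numBytes];
        RND.nextBytes(raw);
        StringBuilder sb = new StringBuilder(numBytes * 2);
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Registers a handle (represented by its name here) under a secret not already pending.
    synchronized String register(String handleName) {
        String secret = createSecret(32);
        while (pending.containsKey(secret)) {
            secret = createSecret(32);
        }
        pending.put(secret, handleName);
        return secret;
    }

    // Removes and returns the handle for a secret, or null if the secret is unknown or already used.
    String claim(String secret) {
        return pending.remove(secret);
    }
}
```

Removing the entry on claim is what makes a secret single-use: a second connection presenting the same secret finds nothing pending.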

2.2 Initializing LauncherServer

    private LauncherServer() throws IOException {
        this.refCount = new AtomicLong(0);

        // Open the listening socket and set up the internal thread that runs the core loop.
        // As the LauncherBackend section shows, every LauncherBackend first connects to LauncherServer on startup.
        ServerSocket server = new ServerSocket();
        try {
            server.setReuseAddress(true);
            // Listen on a random free port on the loopback address
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            this.clients = new ArrayList<>();
            this.threadIds = new AtomicLong();
            this.factory = new NamedThreadFactory(THREAD_NAME_FMT);
            this.pending = new ConcurrentHashMap<>();
            this.timeoutTimer = new Timer("LauncherServer-TimeoutTimer", true);
            this.server = server;
            this.running = true;

            // Create the thread that runs acceptConnections
            this.serverThread = factory.newThread(new Runnable() {
                @Override
                public void run() {
                    acceptConnections();
                }
            });

            // Start the thread
            serverThread.start();
        } catch (IOException ioe) {
            close();
            throw ioe;
        } catch (Exception e) {
            close();
            throw new IOException(e);
        }
    }

2.3 LauncherServer core logic: handling client requests

    // Core loop of LauncherServer: accepts client connections and hands each one to its own thread
    private void acceptConnections() {
        try {
            while (running) {
                // Wait for a connection
                final Socket client = server.accept();
                TimerTask timeout = new TimerTask() {
                    @Override
                    public void run() {
                        LOG.warning("Timed out waiting for hello message from client.");
                        try {
                            client.close();
                        } catch (IOException ioe) {
                            // no-op.
                        }
                    }
                };

                // For each new connection, create a ServerConnection and run it on a new thread
                ServerConnection clientConnection = new ServerConnection(client, timeout);
                Thread clientThread = factory.newThread(clientConnection);
                synchronized (timeout) {
                    clientThread.start();
                    synchronized (clients) {
                        clients.add(clientConnection);
                    }
                    long timeoutMs = getConnectionTimeout();
                    // 0 is used for testing to avoid issues with clock resolution / thread scheduling,
                    // and force an immediate timeout.
                    if (timeoutMs > 0) {
                        timeoutTimer.schedule(timeout, getConnectionTimeout());
                    } else {
                        timeout.run();
                    }
                }
            }
        } catch (IOException ioe) {
            if (running) {
                LOG.log(Level.SEVERE, "Error in accept loop.", ioe);
            }
        }
    }
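
The timeout handling in the accept loop follows a simple pattern: arm a timer when the client connects, and cancel it once the hello message arrives. Here is a minimal sketch of that pattern; HelloTimeoutSketch is a hypothetical name and a Runnable stands in for closing the client socket.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical helper: runs onTimeout unless the returned task is cancelled first.
final class HelloTimeoutSketch {
    // Daemon timer, so it never keeps the JVM alive on its own.
    private final Timer timer = new Timer("hello-timeout", true);

    TimerTask arm(Runnable onTimeout, long timeoutMs) {
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                onTimeout.run();
            }
        };
        if (timeoutMs > 0) {
            timer.schedule(task, timeoutMs);
        } else {
            // As in the listing above, a non-positive timeout fires immediately.
            task.run();
        }
        return task;
    }
}
```

A caller would keep the returned TimerTask and call cancel() on it when the handshake completes.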

2.4 Request messages are handled by ServerConnection


    private class ServerConnection extends LauncherConnection {

        private TimerTask timeout;
        private ChildProcAppHandle handle;

        ServerConnection(Socket socket, TimerTask timeout) throws IOException {
            super(socket);
            this.timeout = timeout;
        }

        @Override
        // Handles a message received from the client
        protected void handle(Message msg) throws IOException {
            try {
                if (msg instanceof Hello) { // handshake message
                    timeout.cancel();
                    timeout = null;
                    Hello hello = (Hello) msg;
                    ChildProcAppHandle handle = pending.remove(hello.secret);
                    if (handle != null) {
                        handle.setConnection(this);
                        handle.setState(SparkAppHandle.State.CONNECTED);
                        this.handle = handle;
                    } else {
                        throw new IllegalArgumentException("Received Hello for unknown client.");
                    }
                } else {    // state update
                    if (handle == null) {
                        throw new IllegalArgumentException("Expected hello, got: " +
                                msg != null ? msg.getClass().getName() : null);
                    }
                    if (msg instanceof SetAppId) {
                        SetAppId set = (SetAppId) msg;
                        // setAppId fires the user's listeners
                        handle.setAppId(set.appId);
                    } else if (msg instanceof SetState) {
                        handle.setState(((SetState) msg).state);
                    } else {
                        throw new IllegalArgumentException("Invalid message: " +
                                msg != null ? msg.getClass().getName() : null);
                    }
                }
            } catch (Exception e) {
                LOG.log(Level.INFO, "Error handling message from client.", e);
                if (timeout != null) {
                    timeout.cancel();
                }
                close();
            } finally {
                timeoutTimer.purge();
            }
        }

        @Override
        // Tears down the connection
        public void close() throws IOException {
            synchronized (clients) {
                clients.remove(this);
            }
            super.close();
            if (handle != null) {
                if (!handle.getState().isFinal()) {
                    LOG.log(Level.WARNING, "Lost connection to spark application.");
                    handle.setState(SparkAppHandle.State.LOST);
                }
                handle.disconnect();
            }
        }

    }
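
Stripped of sockets and timers, the handle method above is a small state machine: a Hello with a known secret must come first, and only then are updates accepted. The sketch below models just that ordering rule; all class names are hypothetical stand-ins, states are plain strings, and one object plays the role of both the pending map and a single connection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the message ordering enforced by ServerConnection.handle.
final class HandshakeSketch {
    interface Message {}

    static final class Hello implements Message {
        final String secret;
        Hello(String secret) { this.secret = secret; }
    }

    static final class SetAppId implements Message {
        final String appId;
        SetAppId(String appId) { this.appId = appId; }
    }

    static final class SetState implements Message {
        final String state;
        SetState(String state) { this.state = state; }
    }

    static final class AppHandle {
        String state = "UNKNOWN";
        String appId;
    }

    private final Map<String, AppHandle> pending = new HashMap<>();
    private AppHandle bound; // set once a valid Hello has been received

    void expect(String secret, AppHandle handle) {
        pending.put(secret, handle);
    }

    void handle(Message msg) {
        if (msg instanceof Hello) {
            // A secret can be claimed only once; unknown secrets are rejected.
            AppHandle h = pending.remove(((Hello) msg).secret);
            if (h == null) {
                throw new IllegalArgumentException("Received Hello for unknown client.");
            }
            h.state = "CONNECTED";
            bound = h;
        } else {
            if (bound == null) {
                throw new IllegalArgumentException("Expected Hello before any other message.");
            }
            if (msg instanceof SetAppId) {
                bound.appId = ((SetAppId) msg).appId;
            } else if (msg instanceof SetState) {
                bound.state = ((SetState) msg).state;
            } else {
                throw new IllegalArgumentException("Invalid message.");
            }
        }
    }
}
```

Sending SetState before Hello throws, while the sequence Hello, SetAppId, SetState leaves the handle with the reported app ID and state.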

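3. ChildProcAppHandle

ChildProcAppHandle stores and runs the listeners registered by the user; LauncherServer calls into it.
Its main methods are:
    setAppId        fires the listeners when the app ID changes
    setState        fires the listeners when the state changes
    fireEvent       called by setAppId and setState; this is where the user's listeners actually run
    addListener     registers a user listener; called from SparkLauncher.startApplication

3.1 Firing listeners
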
    private synchronized void fireEvent(boolean isInfoChanged) {
        if (listeners != null) {
            for (Listener l : listeners) {
                if (isInfoChanged) {
                    l.infoChanged(this);
                } else {
                    l.stateChanged(this);
                }
            }
        }
    }
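
To show how setAppId, setState, and fireEvent fit together, here is a minimal self-contained sketch. ListenerSketch is a hypothetical stand-in for the handle, states are plain strings, and any checks the real class performs before changing state are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handle: stores listeners and notifies them on every change.
final class ListenerSketch {
    interface Listener {
        void stateChanged(ListenerSketch handle);
        void infoChanged(ListenerSketch handle);
    }

    private final List<Listener> listeners = new ArrayList<>();
    private String state = "UNKNOWN";
    private String appId;

    synchronized void addListener(Listener l) {
        listeners.add(l);
    }

    synchronized void setState(String newState) {
        state = newState;
        fireEvent(false);
    }

    synchronized void setAppId(String newAppId) {
        appId = newAppId;
        fireEvent(true);
    }

    synchronized String getState() { return state; }

    synchronized String getAppId() { return appId; }

    // Info changes (such as the app ID) and state changes go to different callbacks.
    private synchronized void fireEvent(boolean isInfoChanged) {
        for (Listener l : listeners) {
            if (isInfoChanged) {
                l.infoChanged(this);
            } else {
                l.stateChanged(this);
            }
        }
    }
}
```

A listener registered before the first update sees every change in order, which is why startApplication adds listeners before starting the child process.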

4. LauncherBackend

LauncherBackend is the client side of the channel: it is the endpoint that sends state changes to LauncherServer.

4.1 Establishing the connection

    def connect(): Unit = {

        // Set up the socket to LauncherServer. The port comes from the environment, where SparkLauncher published it.
        val port = sys.env.get(LauncherProtocol.ENV_LAUNCHER_PORT).map(_.toInt)
        // The secret that identifies this client to LauncherServer also comes from the environment.
        val secret = sys.env.get(LauncherProtocol.ENV_LAUNCHER_SECRET)
        if (port != None && secret != None) {
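            /*
             * Open the socket to LauncherServer on the loopback address. LauncherServer listens on
             * loopback only, and only when the job was submitted through SparkLauncher.startApplication.
             * Because SparkLauncher invokes spark-submit through a ProcessBuilder, spark-submit inherits
             * the environment configured on that ProcessBuilder, which is how LauncherBackend can tell
             * whether a LauncherServer exists.
            */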
            val s = new Socket(InetAddress.getLoopbackAddress(), port.get)

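            // Wrap the socket in a connection; on the server side the matching ServerConnection accepts it
            connection = new BackendConnection(s)
            // Send the handshake (Hello) message
            connection.send(new Hello(secret.get, SPARK_VERSION))

            // Start the thread that services this connection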
            clientThread = LauncherBackend.threadFactory.newThread(connection)
            clientThread.start()
            _isConnected = true
        }
    }
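
The whole exchange can be tried end to end with JDK sockets alone. The sketch below is a simplified illustration, not Spark's wire protocol: the variable names SKETCH_LAUNCHER_PORT and SKETCH_LAUNCHER_SECRET are made up, a Map stands in for the process environment, and the hello message is reduced to a single UTF string.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical end-to-end model of the port-and-secret handshake over loopback.
final class LoopbackHandshakeSketch {
    static final String ENV_PORT = "SKETCH_LAUNCHER_PORT";     // made-up variable name
    static final String ENV_SECRET = "SKETCH_LAUNCHER_SECRET"; // made-up variable name

    // Client side: connects only when both variables are present, then sends the secret.
    static Optional<Socket> connect(Map<String, String> env) {
        String port = env.get(ENV_PORT);
        String secret = env.get(ENV_SECRET);
        if (port == null || secret == null) {
            return Optional.empty(); // not started by a launcher, so there is nothing to connect to
        }
        try {
            Socket s = new Socket(InetAddress.getLoopbackAddress(), Integer.parseInt(port));
            DataOutputStream out = new DataOutputStream(s.getOutputStream());
            out.writeUTF(secret);
            out.flush();
            return Optional.of(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server side plus client in one call: returns true if the server reads back the same secret.
    static boolean roundTrip(String secret) {
        try (ServerSocket server = new ServerSocket()) {
            // Port 0 asks the OS for any free port; the chosen port is then published to the client.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            Map<String, String> env = new HashMap<>();
            env.put(ENV_PORT, String.valueOf(server.getLocalPort()));
            env.put(ENV_SECRET, secret);
            try (Socket client = connect(env).get();
                 Socket accepted = server.accept();
                 DataInputStream in = new DataInputStream(accepted.getInputStream())) {
                return secret.equals(in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An empty environment yields Optional.empty(), which corresponds to the case where the application was not started through startApplication and no server is listening.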