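Communication flow between SparkLauncher, LauncherServer, and LauncherBackend
1. SparkLauncher
SparkLauncher is a class for submitting Spark jobs from code.
Under the hood it still submits through the spark-submit script: it uses a ProcessBuilder to assemble the command line and environment variables, then runs the script as a child process.
Its main methods are:
- launch: submits a job and returns the child Process; handling the job's output and result is left to the caller
- createBuilder: called by both launch and startApplication to create the ProcessBuilder that runs the spark-submit script
- startApplication: submits a job and invokes the user-supplied listeners whenever the job's state changes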
1.1 createBuilder
Creates a ProcessBuilder and sets its environment variables.
private def createBuilder: ProcessBuilder = {
var cmd = new ArrayList[String]
val script = if (isWindows) "spark-submit.cmd" else "spark-submit" // pick the spark-submit script for this platform
cmd.add(join(File.separator, builder.getSparkHome, "bin", script)) // absolute path of the script
cmd.addAll(builder.buildSparkSubmitArgs) // append the spark-submit arguments
// Since the child process is a batch script, let's quote things so that special characters are preserved,
// otherwise the batch interpreter will mess up the arguments. Batch scripts are weird.
if (isWindows) {
val winCmd = new ArrayList[String]
import scala.collection.JavaConversions._
for (arg <- cmd) {
winCmd.add(quoteForBatchScript(arg))
}
cmd = winCmd
}
// Create the ProcessBuilder
// and copy the entries of builder.childEnv into the child's environment, so that spark-submit and the process it starts can read them
val pb = new ProcessBuilder(cmd.toArray(new Array[String](cmd.size)))
import scala.collection.JavaConversions._
for (e <- builder.childEnv.entrySet) {
pb.environment.put(e.getKey, e.getValue)
}
if (workingDir != null) pb.directory(workingDir)
// Only one of redirectError and redirectError(...) can be specified.
// Similarly, if redirectToLog is specified, no other redirections should be specified.
checkState(!redirectErrorStream || errorStream == null, "Cannot specify both redirectError() and redirectError(...) ")
checkState(!redirectToLog || (!redirectErrorStream && errorStream == null && outputStream == null), "Cannot used redirectToLog() in conjunction with other redirection methods.")
if (redirectErrorStream || redirectToLog) pb.redirectErrorStream(true)
if (errorStream != null) pb.redirectError(errorStream)
if (outputStream != null) pb.redirectOutput(outputStream)
return pb
}
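The steps in createBuilder can be reproduced with the JDK alone. The following is a minimal sketch, not the Spark implementation: the class name SubmitCommandSketch, the method build, and the sample paths are made up for illustration, and the Windows quoting step is left out. It only assembles the command and environment; it never starts a process.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class SubmitCommandSketch {

    /** Builds (but does not start) a ProcessBuilder for a spark-submit style script. */
    static ProcessBuilder build(String sparkHome, List<String> submitArgs,
                                Map<String, String> childEnv, boolean windows) {
        String script = windows ? "spark-submit.cmd" : "spark-submit";
        List<String> cmd = new ArrayList<>();
        cmd.add(String.join(File.separator, sparkHome, "bin", script));
        cmd.addAll(submitArgs);

        ProcessBuilder pb = new ProcessBuilder(cmd);
        // The child inherits the parent's environment; these entries are added on top of it.
        pb.environment().putAll(childEnv);
        // Merge the child's stderr into its stdout so one stream carries all output.
        pb.redirectErrorStream(true);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = build(
            "/opt/spark",
            Arrays.asList("--master", "local[2]", "--class", "com.example.Main", "app.jar"),
            Collections.singletonMap("SPARK_CONF_DIR", "/tmp/demo-conf"),
            false);
        System.out.println(pb.command());
        System.out.println(pb.environment().get("SPARK_CONF_DIR"));
    }
}
```

Keeping the build step separate from start() makes it possible to inspect the command and environment before anything runs.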
1.2 launch
Starts a child process that launches the Spark application. startApplication(SparkAppHandle.Listener...) is the preferred way to launch, because it accepts listeners for monitoring how the application runs.
/**
* Launches a sub-process that will start the configured Spark application.
* <p>
* The {@link #startApplication(SparkAppHandle.Listener...)} method is preferred when launching
* Spark, since it provides better control of the child application.
*
* @return A process handle for the Spark app.
*/
@throws[IOException]
def launch = {
// Create the child process from the ProcessBuilder
val childProc = createBuilder.start
if (redirectToLog) {
val loggerName = builder.getEffectiveConfig.get(CHILD_PROCESS_LOGGER_NAME)
new OutputRedirector(childProc.getInputStream, loggerName, REDIRECTOR_FACTORY)
}
childProc
}
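When redirectToLog is set, launch hands the child's stdout to an OutputRedirector. What such a redirector does can be sketched with the JDK only: read the stream line by line on a background thread and forward each line to a java.util.logging logger. The class LineRedirectorSketch below is a hypothetical stand-in written for this article, not Spark's OutputRedirector, and it reads from an in-memory stream instead of a real process.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a stream-to-logger redirector.
final class LineRedirectorSketch {

    /** Starts a daemon thread that copies each line of `in` to the named logger. */
    static Thread redirect(InputStream in, String loggerName) {
        Logger sink = Logger.getLogger(loggerName);
        Thread t = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.info(line);
                }
            } catch (IOException e) {
                sink.log(Level.FINE, "Error reading child process output.", e);
            }
        }, "redirect-" + loggerName);
        t.setDaemon(true); // do not keep the JVM alive just for log forwarding
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        InputStream fake = new ByteArrayInputStream(
            "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
        redirect(fake, "demo.child").join();
    }
}
```

With a real child process, the stream passed in would be Process.getInputStream(), as in the launch code above.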
1.3 startApplication
/**
* Starts a Spark application.
* <p>
* This method returns a handle that provides information about the running application and can
* be used to do basic interaction with it.
* <p>
* The returned handle assumes that the application will instantiate a single SparkContext
* during its lifetime. Once that context reports a final state (one that indicates the
* SparkContext has stopped), the handle will not perform new state transitions, so anything
* that happens after that cannot be monitored. If the underlying application is launched as
* a child process, {@link SparkAppHandle#kill()} can still be used to kill the child process.
* <p>
* Currently, all applications are launched as child processes. The child's stdout and stderr
* are merged and written to a logger (see <code>java.util.logging</code>) only if redirection
* has not otherwise been configured on this <code>SparkLauncher</code>. The logger's name can be
* defined by setting {@link #CHILD_PROCESS_LOGGER_NAME} in the app's configuration. If that
* option is not set, the code will try to derive a name from the application's name or main
* class / script file. If those cannot be determined, an internal, unique name will be used.
* In all cases, the logger name will start with "org.apache.spark.launcher.app", to fit more
* easily into the configuration of commonly-used logging systems.
*
* @since 1.6.0
* @param listeners Listeners to add to the handle before the app is launched.
* @return A handle for the launched application.
*/
public SparkAppHandle startApplication(SparkAppHandle.Listener... listeners) throws IOException {
// Ask LauncherServer for a new ChildProcAppHandle, used to monitor the launched app
ChildProcAppHandle handle = LauncherServer.newAppHandle();
// Register the listeners that will be notified of the app's state changes
for (SparkAppHandle.Listener l : listeners) {
handle.addListener(l);
}
String loggerName = builder.getEffectiveConfig().get(CHILD_PROCESS_LOGGER_NAME);
ProcessBuilder pb = createBuilder();
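// Resolve the name of the logger that receives spark-submit's output. If the user did not configure one, derive it below from the app name, the main class, or the app resource.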
// Only setup stderr + stdout to logger redirection if user has not otherwise configured output redirection.
if (loggerName == null) {
String appName = builder.getEffectiveConfig().get(CHILD_PROCESS_LOGGER_NAME);
if (appName == null) {
if (builder.appName != null) {
appName = builder.appName;
} else if (builder.mainClass != null) {
int dot = builder.mainClass.lastIndexOf(".");
if (dot >= 0 && dot < builder.mainClass.length() - 1) {
appName = builder.mainClass.substring(dot + 1, builder.mainClass.length());
} else {
appName = builder.mainClass;
}
} else if (builder.appResource != null) {
appName = new File(builder.appResource).getName();
} else {
appName = String.valueOf(COUNTER.incrementAndGet());
}
}
String loggerPrefix = getClass().getPackage().getName(); // package name of the launcher
loggerName = String.format("%s.app.%s", loggerPrefix, appName); // logger name = package name + ".app." + app name
pb.redirectErrorStream(true);
}
// Publish the LauncherServer port in the child's environment; LauncherBackend reads this variable to find the server.
// LauncherBackend itself is covered in its own section.
pb.environment().put(LauncherProtocol.ENV_LAUNCHER_PORT,
String.valueOf(LauncherServer.getServerInstance().getPort()));
// Publish the handle's secret the same way; the child sends it back in its Hello message to identify itself.
pb.environment().put(LauncherProtocol.ENV_LAUNCHER_SECRET, handle.getSecret());
try {
// Start spark-submit to submit the job; its output goes to the logger named above.
handle.setChildProc(pb.start(), loggerName);
} catch (IOException ioe) {
handle.kill();
throw ioe;
}
return handle;
}
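The logger-name fallback chain in startApplication (app name, then main class, then app resource, then a counter) is easier to follow as a pure function. The sketch below mirrors that chain under a few assumptions: LoggerNameSketch and derive are names invented here, and the prefix is hard-coded to the "org.apache.spark.launcher" value mentioned in the Javadoc rather than read from the package at runtime.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper that isolates the logger-name derivation.
final class LoggerNameSketch {
    private static final AtomicInteger COUNTER = new AtomicInteger();
    // Assumed prefix, taken from the Javadoc quoted above.
    static final String PREFIX = "org.apache.spark.launcher";

    static String derive(String appName, String mainClass, String appResource) {
        String name;
        if (appName != null) {
            name = appName;
        } else if (mainClass != null) {
            // Use the simple class name when the class is package-qualified.
            int dot = mainClass.lastIndexOf('.');
            name = (dot >= 0 && dot < mainClass.length() - 1)
                ? mainClass.substring(dot + 1)
                : mainClass;
        } else if (appResource != null) {
            name = new File(appResource).getName();
        } else {
            // Nothing to derive from: fall back to a unique number.
            name = String.valueOf(COUNTER.incrementAndGet());
        }
        return String.format("%s.app.%s", PREFIX, name);
    }

    public static void main(String[] args) {
        System.out.println(derive(null, "com.example.WordCount", null));
        System.out.println(derive(null, null, "/tmp/jobs/etl.py"));
    }
}
```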
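2. LauncherServer
LauncherServer is a server that receives Spark application state changes sent by LauncherBackend.
When it receives a state-change message, it invokes, according to the message type, the listeners the user registered through ChildProcAppHandle.
Its core methods are:
- newAppHandle: registers a new ChildProcAppHandle
- acceptConnections: the server's main loop; listens for connections and hands each one to a handler
- ServerConnection.handle: the object and method that actually process each message
2.1 newAppHandle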
/**
* Manages the LauncherServer singleton and returns a new ChildProcAppHandle on which the user can register listeners.
* Creates a handle for an app to be launched.
* This method will start a server if one hasn't been started yet.
* The server is shared for multiple handles, and once all handles are disposed of,
* the server is shut down.
*/
static synchronized ChildProcAppHandle newAppHandle() throws IOException {
// Create the LauncherServer if one is not already running
LauncherServer server = serverInstance != null ? serverInstance : new LauncherServer();
server.ref();
serverInstance = server;
// Generate a secret that uniquely identifies this client
String secret = server.createSecret();
while (server.pending.containsKey(secret)) {
secret = server.createSecret();
}
// Return the ChildProcAppHandle registered under that secret
return server.newAppHandle(secret);
}
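The secret handling in newAppHandle boils down to two rules: every pending handle gets a secret no other pending handle uses, and a secret can be claimed only once. Here is a minimal sketch of those two rules. SecretRegistrySketch is hypothetical, the 128-bit hex secret is an illustrative choice rather than a statement about what createSecret does, and handles are represented by plain strings.

```java
import java.security.SecureRandom;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry that stands in for the server's `pending` map.
final class SecretRegistrySketch {
    private final SecureRandom random = new SecureRandom();
    // secret -> handle name
    private final ConcurrentMap<String, String> pending = new ConcurrentHashMap<>();

    private String createSecret() {
        byte[] bytes = new byte[16]; // 128 bits, an assumption for this sketch
        random.nextBytes(bytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Registers a handle under a secret that no other pending handle uses. */
    synchronized String register(String handleName) {
        String secret = createSecret();
        while (pending.putIfAbsent(secret, handleName) != null) {
            secret = createSecret(); // collision: draw again
        }
        return secret;
    }

    /** Called when a client says hello; returns null for unknown or already-claimed secrets. */
    String claim(String secret) {
        return pending.remove(secret);
    }

    public static void main(String[] args) {
        SecretRegistrySketch registry = new SecretRegistrySketch();
        String secret = registry.register("app-1");
        System.out.println(registry.claim(secret)); // the handle registered above
        System.out.println(registry.claim(secret)); // already claimed
    }
}
```

Removing the entry on claim is what prevents a second process from reusing a secret it observed.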
2.2 LauncherServer initialization
private LauncherServer() throws IOException {
this.refCount = new AtomicLong(0);
// Open a server socket and start the internal thread that runs the accept loop.
// As the LauncherBackend code shows, LauncherBackend connects to LauncherServer as soon as it starts.
ServerSocket server = new ServerSocket();
try {
server.setReuseAddress(true);
// Listen on the loopback address; port 0 lets the OS pick a free port
server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
this.clients = new ArrayList<>();
this.threadIds = new AtomicLong();
this.factory = new NamedThreadFactory(THREAD_NAME_FMT);
this.pending = new ConcurrentHashMap<>();
this.timeoutTimer = new Timer("LauncherServer-TimeoutTimer", true);
this.server = server;
this.running = true;
// Create the thread that runs acceptConnections
this.serverThread = factory.newThread(new Runnable() {
@Override
public void run() {
acceptConnections();
}
});
// Start the thread
serverThread.start();
} catch (IOException ioe) {
close();
throw ioe;
} catch (Exception e) {
close();
throw new IOException(e);
}
}
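The binding step in the constructor is worth isolating: binding to the loopback address with port 0 asks the operating system for a free port, which can then be read back and advertised to the child. A minimal sketch of just that step follows; LoopbackServerSketch and open are names chosen here, and the accept thread is omitted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper showing only the bind step.
final class LoopbackServerSketch {

    /** Opens a server socket on the loopback address with an OS-assigned port. */
    static ServerSocket open() {
        ServerSocket server = null;
        try {
            server = new ServerSocket();
            server.setReuseAddress(true);
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return server;
        } catch (IOException e) {
            if (server != null) {
                try {
                    server.close();
                } catch (IOException ignored) {
                    // nothing more to do for a socket that failed to bind
                }
            }
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = open()) {
            // The port changes from run to run because the OS assigns it.
            System.out.println("listening on " + server.getLocalSocketAddress());
        }
    }
}
```

Because the socket is bound to loopback only, processes on other machines cannot reach it.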
2.3 acceptConnections: the core loop that accepts client connections
// Core loop of LauncherServer: accepts client connections
private void acceptConnections() {
try {
while (running) {
// Wait for a connection
final Socket client = server.accept();
TimerTask timeout = new TimerTask() {
@Override
public void run() {
LOG.warning("Timed out waiting for hello message from client.");
try {
client.close();
} catch (IOException ioe) {
// no-op.
}
}
};
// For each accepted connection, create a ServerConnection and run it on a new thread
ServerConnection clientConnection = new ServerConnection(client, timeout);
Thread clientThread = factory.newThread(clientConnection);
synchronized (timeout) {
clientThread.start();
synchronized (clients) {
clients.add(clientConnection);
}
long timeoutMs = getConnectionTimeout();
// 0 is used for testing to avoid issues with clock resolution / thread scheduling,
// and force an immediate timeout.
if (timeoutMs > 0) {
timeoutTimer.schedule(timeout, getConnectionTimeout());
} else {
timeout.run();
}
}
}
} catch (IOException ioe) {
if (running) {
LOG.log(Level.SEVERE, "Error in accept loop.", ioe);
}
}
}
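The accept loop arms a timeout for every new connection and cancels it once the client's Hello arrives; a timeout of 0 fires immediately, which the source comment says is used for testing. That pattern can be sketched with java.util.Timer alone. HelloTimeoutSketch and arm are hypothetical names, and the Runnable stands in for closing the client socket.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper showing the arm/cancel timeout pattern.
final class HelloTimeoutSketch {
    // Daemon timer so that a pending timeout never keeps the JVM alive.
    private final Timer timer = new Timer("hello-timeout", true);

    /** Arms a timeout that runs `onTimeout` unless the returned task is cancelled first. */
    TimerTask arm(long timeoutMs, Runnable onTimeout) {
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                onTimeout.run();
            }
        };
        if (timeoutMs > 0) {
            timer.schedule(task, timeoutMs);
        } else {
            task.run(); // 0 means "time out right away"
        }
        return task;
    }

    public static void main(String[] args) {
        HelloTimeoutSketch timeouts = new HelloTimeoutSketch();
        AtomicBoolean closed = new AtomicBoolean();
        TimerTask task = timeouts.arm(60_000, () -> closed.set(true));
        task.cancel(); // what the server does when the Hello message arrives in time
        System.out.println("closed = " + closed.get());
    }
}
```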
2.4 Message handling in ServerConnection
private class ServerConnection extends LauncherConnection {
private TimerTask timeout;
private ChildProcAppHandle handle;
ServerConnection(Socket socket, TimerTask timeout) throws IOException {
super(socket);
this.timeout = timeout;
}
// Handles one message received from the client
@Override
protected void handle(Message msg) throws IOException {
try {
if (msg instanceof Hello) { // handshake message
timeout.cancel();
timeout = null;
Hello hello = (Hello) msg;
ChildProcAppHandle handle = pending.remove(hello.secret);
if (handle != null) {
handle.setConnection(this);
handle.setState(SparkAppHandle.State.CONNECTED);
this.handle = handle;
} else {
throw new IllegalArgumentException("Received Hello for unknown client.");
}
} else { // state or info update
if (handle == null) {
throw new IllegalArgumentException("Expected hello, got: " +
msg != null ? msg.getClass().getName() : null);
}
if (msg instanceof SetAppId) {
SetAppId set = (SetAppId) msg;
// Update the handle, which fires the user's listeners
handle.setAppId(set.appId);
} else if (msg instanceof SetState) {
handle.setState(((SetState) msg).state);
} else {
throw new IllegalArgumentException("Invalid message: " +
msg != null ? msg.getClass().getName() : null);
}
}
} catch (Exception e) {
LOG.log(Level.INFO, "Error handling message from client.", e);
if (timeout != null) {
timeout.cancel();
}
close();
} finally {
timeoutTimer.purge();
}
}
// Tears down the connection
@Override
public void close() throws IOException {
synchronized (clients) {
clients.remove(this);
}
super.close();
if (handle != null) {
if (!handle.getState().isFinal()) {
LOG.log(Level.WARNING, "Lost connection to spark application.");
handle.setState(SparkAppHandle.State.LOST);
}
handle.disconnect();
}
}
}
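The rules enforced by ServerConnection.handle form a small state machine: the first message must be a Hello carrying a pending secret, and only then are SetAppId and SetState accepted. The sketch below models that ordering without sockets. All types in it (HandshakeSketch, its nested message classes, the string-valued states) are simplified stand-ins invented for this example. The sketch parenthesizes the conditional expression when building error messages, because in Java `+` binds more tightly than `!=`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, socket-free model of the message ordering rules.
final class HandshakeSketch {
    static class Message {}
    static final class Hello extends Message {
        final String secret;
        Hello(String secret) { this.secret = secret; }
    }
    static final class SetAppId extends Message {
        final String appId;
        SetAppId(String appId) { this.appId = appId; }
    }
    static final class SetState extends Message {
        final String state;
        SetState(String state) { this.state = state; }
    }
    static final class Handle {
        String appId;
        String state = "UNKNOWN";
    }

    private final Map<String, Handle> pending = new HashMap<>();
    private Handle bound; // set once a valid Hello has been received

    /** Registers a handle that is waiting for a client with this secret. */
    Handle expect(String secret) {
        Handle h = new Handle();
        pending.put(secret, h);
        return h;
    }

    void onMessage(Message msg) {
        if (msg instanceof Hello) {
            Handle h = pending.remove(((Hello) msg).secret);
            if (h == null) {
                throw new IllegalArgumentException("Received Hello for unknown client.");
            }
            h.state = "CONNECTED";
            bound = h;
        } else {
            String type = (msg != null) ? msg.getClass().getName() : null;
            if (bound == null) {
                throw new IllegalArgumentException("Expected hello, got: " + type);
            }
            if (msg instanceof SetAppId) {
                bound.appId = ((SetAppId) msg).appId;
            } else if (msg instanceof SetState) {
                bound.state = ((SetState) msg).state;
            } else {
                throw new IllegalArgumentException("Invalid message: " + type);
            }
        }
    }

    public static void main(String[] args) {
        HandshakeSketch conn = new HandshakeSketch();
        Handle h = conn.expect("s3cret");
        conn.onMessage(new Hello("s3cret"));
        conn.onMessage(new SetState("RUNNING"));
        System.out.println(h.state);
    }
}
```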
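3. ChildProcAppHandle
ChildProcAppHandle stores the listeners the user registered and invokes them; it is driven by LauncherServer.
Its main methods are:
- setAppId: fires the listeners when the application ID changes
- setState: fires the listeners when the state changes
- fireEvent: called by setAppId and setState; this is where the user's listeners are actually invoked
- addListener: registers a user listener; called from SparkLauncher.startApplication
3.1 Firing listeners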
private synchronized void fireEvent(boolean isInfoChanged) {
if (listeners != null) {
for (Listener l : listeners) {
if (isInfoChanged) {
l.infoChanged(this);
} else {
l.stateChanged(this);
}
}
}
}
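How setAppId, setState, and fireEvent fit together can be shown with a stripped-down handle. This is a sketch under simplifying assumptions: ListenerSketch and its nested Listener interface are invented for the example, states are plain strings, and the real handle's additional bookkeeping is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified handle that only stores state and notifies listeners.
final class ListenerSketch {
    interface Listener {
        void stateChanged(ListenerSketch handle);
        void infoChanged(ListenerSketch handle);
    }

    private final List<Listener> listeners = new ArrayList<>();
    private String state = "UNKNOWN";
    private String appId;

    synchronized void addListener(Listener l) {
        listeners.add(l);
    }

    synchronized String getState() {
        return state;
    }

    synchronized String getAppId() {
        return appId;
    }

    synchronized void setState(String newState) {
        state = newState;
        fireEvent(false);
    }

    synchronized void setAppId(String newAppId) {
        appId = newAppId;
        fireEvent(true);
    }

    // Info changes (such as the app ID) and state changes go to different callbacks.
    private synchronized void fireEvent(boolean isInfoChanged) {
        for (Listener l : listeners) {
            if (isInfoChanged) {
                l.infoChanged(this);
            } else {
                l.stateChanged(this);
            }
        }
    }

    public static void main(String[] args) {
        ListenerSketch handle = new ListenerSketch();
        handle.addListener(new Listener() {
            @Override
            public void stateChanged(ListenerSketch h) {
                System.out.println("state -> " + h.getState());
            }

            @Override
            public void infoChanged(ListenerSketch h) {
                System.out.println("app id -> " + h.getAppId());
            }
        });
        handle.setState("CONNECTED");
        handle.setAppId("app-1");
    }
}
```

Listeners run on the thread that calls setState or setAppId, so a slow listener delays message processing for that connection.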
4. LauncherBackend
LauncherBackend is the client side of the protocol: the endpoint that connects to LauncherServer and sends it state changes.
4.1 Establishing the connection
def connect(): Unit = {
// Read the LauncherServer port from the environment; SparkLauncher put it there before starting the child
val port = sys.env.get(LauncherProtocol.ENV_LAUNCHER_PORT).map(_.toInt)
// Read the secret that identifies this client to LauncherServer, also from the environment
val secret = sys.env.get(LauncherProtocol.ENV_LAUNCHER_SECRET)
if (port.isDefined && secret.isDefined) {
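/*
* Open the socket to LauncherServer on the loopback address.
* LauncherServer listens on loopback only when the job is submitted through SparkLauncher.startApplication.
* Because SparkLauncher runs spark-submit through a ProcessBuilder, spark-submit inherits the parent's environment variables,
* which is how LauncherBackend can tell from the environment whether a LauncherServer exists.
*/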
val s = new Socket(InetAddress.getLoopbackAddress(), port.get)
// Wrap the socket in a BackendConnection; its counterpart on the server side is ServerConnection
connection = new BackendConnection(s)
// Send the handshake (Hello) message
connection.send(new Hello(secret.get, SPARK_VERSION))
// Start the connection thread
clientThread = LauncherBackend.threadFactory.newThread(connection)
clientThread.start()
_isConnected = true
}
}
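The whole discovery mechanism (port and secret passed through the environment, then a loopback connection that presents the secret) can be exercised in a single process. The sketch below is a simplification, not the launcher protocol: BackendConnectSketch is hypothetical, the variable names DEMO_LAUNCHER_PORT and DEMO_LAUNCHER_SECRET are placeholders for the LauncherProtocol constants, the environment is a plain Map, the secret is sent with writeUTF instead of a Hello message, and the connection is closed right after sending.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;

// Hypothetical end-to-end demo of environment-based discovery.
final class BackendConnectSketch {
    // Placeholder keys; the real ones are defined in LauncherProtocol.
    static final String ENV_PORT = "DEMO_LAUNCHER_PORT";
    static final String ENV_SECRET = "DEMO_LAUNCHER_SECRET";

    /** Connects and sends the secret if both variables are present; returns whether it connected. */
    static boolean connect(Map<String, String> env) {
        String port = env.get(ENV_PORT);
        String secret = env.get(ENV_SECRET);
        if (port == null || secret == null) {
            return false; // not started by a launcher, so there is no server to talk to
        }
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), Integer.parseInt(port));
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            out.writeUTF(secret);
            out.flush();
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs one server and one client in-process and returns the secret the server received. */
    static String roundTrip(String secret) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Map<String, String> env = new HashMap<>();
            env.put(ENV_PORT, String.valueOf(server.getLocalPort()));
            env.put(ENV_SECRET, secret);
            connect(env); // the connection waits in the backlog until accept() is called
            try (Socket client = server.accept();
                 DataInputStream in = new DataInputStream(client.getInputStream())) {
                return in.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(connect(new HashMap<>()));
        System.out.println(roundTrip("s3cret"));
    }
}
```

When neither variable is set, connect does nothing, which matches the behavior of an application that was started by running spark-submit directly.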