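Recently I have been studying the code of Mesos 0.10.0 and have come to understand the structure of the Mesos codebase a bit better. While reading the executor code, I noticed a part that is quite basic, yet the developers split its structure up cleverly and so avoided a class of problems that tends to arise when such code is written by hand. First, here is a comment from the code: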
/**
 * Concrete implementation of an ExecutorDriver that connects an
 * Executor with a Mesos slave. The MesosExecutorDriver is thread-safe.
 *
 * The driver is responsible for invoking the Executor callbacks as it
 * communicates with the Mesos slave.
 *
 * Note that blocking on the MesosExecutorDriver (e.g., via
 * MesosExecutorDriver::join) doesn't affect the executor callbacks in
 * any way because they are handled by a different thread.
 *
 * See src/examples/test_executor.cpp for an example of using the
 * MesosExecutorDriver.
 */
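What caught my attention first was the phrase "thread-safe". I was curious how that thread safety is achieved, so I looked at the declaration of MesosExecutorDriver that follows the comment: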
class MesosExecutorDriver : public ExecutorDriver
{
public:
  /**
   * Creates a new driver that uses the specified Executor. Note, the
   * executor pointer must outlive the driver.
   */
  MesosExecutorDriver(Executor* executor);
  /**
   * This destructor will block indefinitely if
   * MesosExecutorDriver::start was invoked successfully (possibly via
   * MesosExecutorDriver::run) and MesosExecutorDriver::stop has not
   * been invoked.
   */
  virtual ~MesosExecutorDriver();
  /**
   * See ExecutorDriver for descriptions of these.
   */
  virtual Status start();
  virtual Status stop();
  virtual Status abort();
  virtual Status join();
  virtual Status run();
  virtual Status sendStatusUpdate(const TaskStatus& status);
  virtual Status sendFrameworkMessage(const std::string& data);
private:
  friend class internal::ExecutorProcess;
  Executor* executor;
  // Libprocess process for communicating with slave.
  internal::ExecutorProcess* process;
  // Mutex to enforce all non-callbacks are executed serially.
  pthread_mutex_t mutex;
  // Condition variable for waiting until driver terminates.
  pthread_cond_t cond;
  // Current status of the driver.
  Status status;
};
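Looking first at the member variables of MesosExecutorDriver, there is a variable cond of type pthread_cond_t and a variable mutex of type pthread_mutex_t. That already suggests how the thread safety is achieved; I have written code like this myself and still remember its general shape. There is also a variable status of type Status that stores the driver's current state. Many classes in Mesos follow this design: an enumeration declares the possible states, and a member variable records which state the object is in. I wondered whether this design is specific to distributed systems; it is in fact an ordinary explicit state machine, useful wherever an object moves through a well-defined lifecycle. Next, let's see how it is actually implemented. The implementation of the MesosExecutorDriver class is as follows: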
MesosExecutorDriver::MesosExecutorDriver(Executor* _executor)
  : executor(_executor), status(DRIVER_NOT_STARTED), process(NULL)
{
  GOOGLE_PROTOBUF_VERIFY_VERSION;
  // Create mutex and condition variable
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
  pthread_mutex_init(&mutex, &attr);
  pthread_mutexattr_destroy(&attr);
  pthread_cond_init(&cond, 0);
  // Initialize libprocess.
  // (Author's note: roughly equivalent to starting a socket server.)
  process::initialize();
  // TODO(benh): Initialize glog.
}
MesosExecutorDriver::~MesosExecutorDriver()
{
  // Just as in SchedulerProcess, we might wait here indefinitely if
  // MesosExecutorDriver::stop has not been invoked.
  wait(process);
  delete process;
  pthread_mutex_destroy(&mutex);
  pthread_cond_destroy(&cond);
}
Status MesosExecutorDriver::start()
{
  Lock lock(&mutex);
  if (status != DRIVER_NOT_STARTED) {
    return status;
  }
  // Set stream buffering mode to flush on newlines so that we capture logs
  // from user processes even when output is redirected to a file.
  setvbuf(stdout, 0, _IOLBF, 0);
  setvbuf(stderr, 0, _IOLBF, 0);
  bool local;
  UPID slave;
  FrameworkID frameworkId;
  ExecutorID executorId;
  std::string workDirectory;
  char* value;
  std::istringstream iss;
  /* Check if this is local (for example, for testing). */
  value = getenv("MESOS_LOCAL");
  if (value != NULL) {
    local = true;
  } else {
    local = false;
  }
  /* Get slave PID from environment. */
  value = getenv("MESOS_SLAVE_PID");
  if (value == NULL) {
    fatal("expecting MESOS_SLAVE_PID in environment");
  }
  slave = UPID(value);
  if (!slave) {
    fatal("cannot parse MESOS_SLAVE_PID");
  }
  /* Get framework ID from environment. */
  value = getenv("MESOS_FRAMEWORK_ID");
  if (value == NULL) {
    fatal("expecting MESOS_FRAMEWORK_ID in environment");
  }
  frameworkId.set_value(value);
  /* Get executor ID from environment. */
  value = getenv("MESOS_EXECUTOR_ID");
  if (value == NULL) {
    fatal("expecting MESOS_EXECUTOR_ID in environment");
  }
  executorId.set_value(value);
  /* Get working directory from environment */
  value = getenv("MESOS_DIRECTORY");
  if (value == NULL) {
    fatal("expecting MESOS_DIRECTORY in environment");
  }
  workDirectory = value;
  CHECK(process == NULL);
  process =
    new ExecutorProcess(slave, this, executor, frameworkId,
                        executorId, local, workDirectory);
  spawn(process);
  return status = DRIVER_RUNNING;
}
Status MesosExecutorDriver::stop()
{
  Lock lock(&mutex);
  if (status != DRIVER_RUNNING && status != DRIVER_ABORTED) {
    return status;
  }
  CHECK(process != NULL);
  terminate(process);
  // TODO(benh): Set the condition variable in ExecutorProcess just as
  // we do with the MesosSchedulerDriver and SchedulerProcess:
  // dispatch(process, &ExecutorProcess::stop);
  pthread_cond_signal(&cond);
  bool aborted = status == DRIVER_ABORTED;
  status = DRIVER_STOPPED;
  return aborted ? DRIVER_ABORTED : status;
}
Status MesosExecutorDriver::abort()
{
  Lock lock(&mutex);
  if (status != DRIVER_RUNNING) {
    return status;
  }
  CHECK(process != NULL);
  // TODO(benh): Set the condition variable in ExecutorProcess just as
  // we do with the MesosSchedulerDriver and SchedulerProcess.
  dispatch(process, &ExecutorProcess::abort);
  pthread_cond_signal(&cond);
  return status = DRIVER_ABORTED;
}
Status MesosExecutorDriver::join()
{
  Lock lock(&mutex);
  if (status != DRIVER_RUNNING) {
    return status;
  }
  while (status == DRIVER_RUNNING) {
    pthread_cond_wait(&cond, &mutex);
  }
  CHECK(status == DRIVER_ABORTED || status == DRIVER_STOPPED);
  return status;
}
Status MesosExecutorDriver::run()
{
  Status status = start();
  return status != DRIVER_RUNNING ? status : join();
}
Status MesosExecutorDriver::sendStatusUpdate(const TaskStatus& taskStatus)
{
  Lock lock(&mutex);
  if (status != DRIVER_RUNNING) {
    return status;
  }
  CHECK(process != NULL);
  dispatch(process, &ExecutorProcess::sendStatusUpdate, taskStatus);
  return status;
}
Status MesosExecutorDriver::sendFrameworkMessage(const string& data)
{
  Lock lock(&mutex);
  if (status != DRIVER_RUNNING) {
    return status;
  }
  CHECK(process != NULL);
  dispatch(process, &ExecutorProcess::sendFrameworkMessage, data);
  return status;
}
Next, let's focus on Status MesosExecutorDriver::join(), which contains the following code:
while (status == DRIVER_RUNNING) {
  pthread_cond_wait(&cond, &mutex);
}
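This code is strikingly similar to multithreaded code I have written myself. At the very start of the function a variable lock of type Lock is declared, so let's look at how the Lock type is implemented. Its declaration and implementation are as follows: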
class Lock
{
public:
  Lock(pthread_mutex_t* _mutex);
  ~Lock();
  void lock();
  void unlock();
private:
  pthread_mutex_t* mutex;
  bool locked;
};
Lock::Lock(pthread_mutex_t* _mutex)
  : mutex(_mutex), locked(false)
{
  lock();
}
void Lock::lock()
{
  if (!locked) {
    pthread_mutex_lock(mutex);
    locked = true;
  }
}
void Lock::unlock()
{
  if (locked) {
    pthread_mutex_unlock(mutex);
    locked = false;
  }
}
Lock::~Lock()
{
  unlock();
}
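It turns out that Lock's constructor and destructor do nothing more than lock and unlock the mutex. Inlining Lock's code into Status MesosExecutorDriver::join() gives roughly the following (pseudocode, only meant to show the overall structure):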
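Status MesosExecutorDriver::join()
{
  pthread_mutex_lock(&mutex);
  if (status != DRIVER_RUNNING) {
    Status result = status;
    pthread_mutex_unlock(&mutex);  // every return path must unlock by hand
    return result;
  }
  while (status == DRIVER_RUNNING) {
    pthread_cond_wait(&cond, &mutex);
  }
  CHECK(status == DRIVER_ABORTED || status == DRIVER_STOPPED);
  Status result = status;  // read status while the mutex is still held
  pthread_mutex_unlock(&mutex);
  return result;
}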
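This should look familiar: it is the classic condition-variable wait loop, the same pattern used in producer-consumer code. pthread_cond_wait atomically releases the mutex while the thread waits and reacquires it before returning, and the while loop rechecks status because a thread can be woken without the condition having changed. The clever part is that the Mesos developers wrapped locking and unlocking in the constructor and destructor of Lock. Declaring the variable at the start of a function calls the constructor, which locks the mutex; when the function exits, by whatever return path, the variable is destroyed and the mutex is unlocked automatically. The hand-written version has to unlock before every return, and a single forgotten early return leaves the mutex held so that other threads block forever; the Lock class removes that whole class of bug. And because a function's return value is computed before its local objects are destroyed, return status; in the original code still reads status while the lock is held. In C++ this technique is known as RAII (Resource Acquisition Is Initialization).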
There are many other clever touches in the Mesos source code, and I will share more of them when I get the chance.