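Made it into grad school through the recommendation track, TAT. Time to start reading the Giraph source code again, and this time I really can't slack off; after all, I'm the one who's supposed to publish two A-level papers TAT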
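First, let me explain my understanding of what GraphTaskManager does as a whole.
As the class diagram shows, GraphTaskManager holds references to a number of other classes (serviceWorker, serviceMaster, zkManager). It handles the initial setup, i.e. initializing a worker and preparing it to run the BSP service, and then drives the supersteps one after another in a loop (a do-while loop in execute()).
It is also responsible for processing the partitions that are active in each superstep, and for the final cleanup.
From this overview it is fairly easy to see that the class handles the whole BSP run for one node: the run spans many supersteps, and one node may hold many vertices as well as many partitions!!!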
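The per-node flow described above (setup, a superstep loop, cleanup) can be sketched as a tiny skeleton. This is a hypothetical illustration, not Giraph code: TaskLifecycleSketch, runSuperstep() and the fixed superstep count are made up (the count stands in for the real "all vertices halted" check), and it assumes the caller (presumably the Hadoop mapper in Giraph) invokes setup(), execute() and cleanup() in that order.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical skeleton of the per-node lifecycle: setup(), a superstep loop
 * inside execute(), then cleanup(). Names mirror GraphTaskManager; bodies are stand-ins.
 */
class TaskLifecycleSketch {
    private final List<String> log = new ArrayList<>();
    private final int supersteps;   // stand-in for "until all vertices have halted"
    private long superstep = 0;

    TaskLifecycleSketch(int supersteps) {
        this.supersteps = supersteps;
    }

    void setup() {
        log.add("setup");           // placeholder for ZooKeeper, context and BSP service init
    }

    /** One superstep over this node's active partitions; returns true when the job is done. */
    boolean runSuperstep() {
        log.add("superstep " + superstep);
        superstep++;
        return superstep >= supersteps;
    }

    void execute() {
        boolean done;
        do {                        // same shape as the do-while loop in execute()
            done = runSuperstep();
        } while (!done);
    }

    void cleanup() {
        log.add("cleanup");
    }

    List<String> run() {
        setup();
        execute();
        cleanup();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(new TaskLifecycleSketch(3).run());
        // prints [setup, superstep 0, superstep 1, superstep 2, cleanup]
    }
}
```

Because the loop is a do-while, at least one superstep always runs, just like in execute().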
GraphTaskManager consists mainly of the following parts.
Main referenced classes:
serviceWorker
serviceMaster
zkManager
Main functions:
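setup(): GraphTaskManager calls this on every node that performs computation, to initialize it.
It configures ZooKeeper, the context and a series of other settings. I'm not very familiar with this part yet, and it doesn't matter much for now.
cleanup(): does the cleanup work at the end.
processGraphPartitions(): processes all partitions that are active in the current superstep.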
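execute() is the most complex function. Following its Javadoc, the rough workflow is:
1. Run checkpointing according to the checkpoint frequency policy.
2. Run compute() on every vertex on this mapper.
3. Wait until all messaging is done.
4. Check whether all vertices are done; if not, go back to step 2.
5. Dump the output.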
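To make steps 2 to 5 concrete, here is a toy single-process BSP loop in the Pregel style, using max-value propagation as the computation. Everything in it (ToyBsp, run(), the inbox/outbox lists) is hypothetical and only mimics the semantics: a halted vertex stays asleep until a message arrives, messages sent in one superstep are delivered in the next, and the loop ends when all vertices have halted and no messages are pending. Checkpointing (step 1) is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy single-process BSP loop (Pregel-style max-value propagation).
 * Hypothetical illustration of steps 2-5; no checkpointing, no real Giraph classes.
 */
class ToyBsp {
    /** edges.get(v) lists the out-neighbors of vertex v; returns the final vertex values. */
    static int[] run(int[] initial, List<List<Integer>> edges) {
        int n = initial.length;
        int[] value = initial.clone();
        boolean[] halted = new boolean[n];
        List<List<Integer>> inbox = emptyInboxes(n);
        long superstep = 0;
        while (true) {
            List<List<Integer>> outbox = emptyInboxes(n);
            // Step 2: run compute() on every active vertex.
            for (int v = 0; v < n; v++) {
                List<Integer> messages = inbox.get(v);
                if (halted[v] && messages.isEmpty()) {
                    continue;                          // halted and no mail: stay asleep
                }
                int old = value[v];
                for (int m : messages) {
                    value[v] = Math.max(value[v], m);
                }
                if (superstep == 0 || value[v] != old) {
                    for (int u : edges.get(v)) {
                        outbox.get(u).add(value[v]);   // delivered in the next superstep
                    }
                }
                halted[v] = true;                      // voteToHalt(); a new message wakes it up
            }
            // Step 3: barrier; messages sent now become the next superstep's input.
            inbox = outbox;
            superstep++;
            // Step 4: stop only when every vertex has halted and no messages are pending.
            boolean allHalted = true;
            boolean pending = false;
            for (int v = 0; v < n; v++) {
                allHalted &= halted[v];
                pending |= !inbox.get(v).isEmpty();
            }
            if (allHalted && !pending) {
                return value;                          // Step 5: "dump output"
            }
        }
    }

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Directed cycle 0 -> 1 -> 2 -> 0
        List<List<Integer>> edges = List.of(List.of(1), List.of(2), List.of(0));
        System.out.println(Arrays.toString(run(new int[] {3, 6, 2}, edges)));  // prints [6, 6, 6]
    }
}
```

Keeping inbox and outbox separate is what makes the barrier visible: nothing a vertex sends can be read before the next superstep starts.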
/**
* Perform the work assigned to this compute node for this job run.
* 1) Run checkpoint per frequency policy.
* 2) For every vertex on this mapper, run the compute() function
* 3) Wait until all messaging is done.
* 4) Check if all vertices are done. If not goto 2).
* 5) Dump output.
*/
public void execute() throws IOException, InterruptedException {
if (checkTaskState()) {
return;
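} // checkTaskState() returns true when this task should not run (e.g. it is already done or is not a worker); then we return immediately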
finishedSuperstepStats = serviceWorker.setup();
if (collectInputSuperstepStats(finishedSuperstepStats)) {
return;
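} // Collect the stats of the input superstep (graph loading in serviceWorker.setup()); returns true when the graph has no vertices or this node should stop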
WorkerAggregatorUsage aggregatorUsage =
prepareAggregatorsAndGraphState();
// Prepare the aggregators and the graph state
List<PartitionStats> partitionStatsList = new ArrayList<PartitionStats>();
int numComputeThreads = conf.getNumComputeThreads();
// main superstep processing loop
do {
final long superstep = serviceWorker.getSuperstep();
GiraphTimerContext superstepTimerContext =
getTimerForThisSuperstep(superstep);
// Initialize graphState from the finishedSuperstepStats of the previous superstep
GraphState<I, V, E, M> graphState =
new GraphState<I, V, E, M>(superstep,
finishedSuperstepStats.getVertexCount(),
finishedSuperstepStats.getEdgeCount(),
context, this, null, aggregatorUsage);
// Start the superstep and get the partition owners assigned by the master
Collection<? extends PartitionOwner> masterAssignedPartitionOwners =
serviceWorker.startSuperstep(graphState);
if (LOG.isDebugEnabled()) {
LOG.debug("execute: " + MemoryUtils.getRuntimeMemoryStats());
}
context.progress();
// Exchange vertex partitions so this worker's partitions match the master's assignment
serviceWorker.exchangeVertexPartitions(masterAssignedPartitionOwners);
context.progress();
// If this superstep is being restarted (e.g. after a failure), reload and update the state; otherwise nothing happens
graphState = checkSuperstepRestarted(
aggregatorUsage, superstep, graphState);
// Prepare for the superstep
prepareForSuperstep(graphState);
context.progress();
// Get this worker's message store for the current superstep (messages are stored by partition)
MessageStoreByPartition<I, M> messageStore =
serviceWorker.getServerData().getCurrentMessageStore();
int numPartitions = serviceWorker.getPartitionStore().getNumPartitions();
// Use the smaller of the configured compute thread count and the number of partitions
int numThreads = Math.min(numComputeThreads, numPartitions);
if (LOG.isInfoEnabled()) {
LOG.info("execute: " + numPartitions + " partitions to process with " +
numThreads + " compute thread(s), originally " +
numComputeThreads + " thread(s) on superstep " + superstep);
}
// Clear the partition stats list
partitionStatsList.clear();
// Only process partitions if this worker has any
// execute the current superstep
if (numPartitions > 0) {
processGraphPartitions(context, partitionStatsList, graphState,
messageStore, numPartitions, numThreads);
}
// Finish the superstep and collect its stats into finishedSuperstepStats
finishedSuperstepStats = completeSuperstepAndCollectStats(
partitionStatsList, superstepTimerContext, graphState);
// END of superstep compute loop
} while (!finishedSuperstepStats.allVerticesHalted());
// Keep looping, superstep after superstep, until all vertices have halted
if (LOG.isInfoEnabled()) {
LOG.info("execute: BSP application done (global vertices marked done)");
}
// An update I find odd and don't fully understand yet: it seems to refresh this worker's graph state after the loop??
updateSuperstepGraphState(aggregatorUsage);
// Seems to handle the postApplication() callback?? Still not entirely clear to me
postApplication();
}
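processGraphPartitions() itself is not shown here, but the numThreads = Math.min(numComputeThreads, numPartitions) line above suggests how the work is spread: never more threads than partitions. Below is a hypothetical sketch of that idea using a plain ExecutorService and a shared queue of partition ids. PartitionProcessingSketch and PartitionResult (a stand-in for PartitionStats) are made-up names, and Giraph's real implementation may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of processing a worker's partitions in parallel.
 * PartitionResult stands in for Giraph's PartitionStats; this is not Giraph code.
 */
class PartitionProcessingSketch {
    /** Per-partition result (hypothetical stand-in for PartitionStats). */
    static final class PartitionResult {
        final int partitionId;
        final long vertexCount;

        PartitionResult(int partitionId, long vertexCount) {
            this.partitionId = partitionId;
            this.vertexCount = vertexCount;
        }
    }

    /** partitions maps partition id to the vertex ids this worker owns; numComputeThreads must be >= 1. */
    static List<PartitionResult> processPartitions(Map<Integer, List<Long>> partitions,
                                                   int numComputeThreads)
            throws InterruptedException, ExecutionException {
        List<PartitionResult> stats = new ArrayList<>();
        int numPartitions = partitions.size();
        if (numPartitions == 0) {
            return stats;                              // nothing to compute this superstep
        }
        // Same rule as in execute(): never start more threads than partitions.
        int numThreads = Math.min(numComputeThreads, numPartitions);
        Queue<Integer> pending = new ConcurrentLinkedQueue<>(partitions.keySet());
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<List<PartitionResult>>> futures = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                // Each thread keeps pulling partition ids until the queue is empty.
                futures.add(pool.submit(() -> {
                    List<PartitionResult> local = new ArrayList<>();
                    Integer id;
                    while ((id = pending.poll()) != null) {
                        // Placeholder for calling compute() on every vertex in the partition.
                        local.add(new PartitionResult(id, partitions.get(id).size()));
                    }
                    return local;
                }));
            }
            for (Future<List<PartitionResult>> f : futures) {
                stats.addAll(f.get());                 // wait for every thread to finish
            }
        } finally {
            pool.shutdown();
        }
        return stats;
    }
}
```

Pulling ids from a shared queue instead of pre-assigning fixed ranges lets a thread that finishes a small partition early move straight on to the next one.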
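instantiateBspService(): instantiates the appropriate BSP service on every node that performs computation.
Depending on whether the node is a master or a worker, it performs different initialization; for the master it additionally creates a MasterThread and starts it. Note that the two checks are separate if statements, not an if/else, so one task can start both services.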
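A hypothetical sketch of this branching: the two boolean flags replace graphFunctions.isMaster() and graphFunctions.isWorker(), and a plain Thread plus strings replace MasterThread and the BSP service objects. It only shows the shape of the code below: the master work runs on its own thread, while the worker service is simply created on the calling thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the role dispatch in instantiateBspService().
 * The flags replace graphFunctions.isMaster()/isWorker(); nothing here is Giraph API.
 */
class RoleDispatchSketch {
    /** Returns the services that were started (order may vary when both run). */
    static List<String> instantiate(boolean isMaster, boolean isWorker) throws InterruptedException {
        List<String> started = Collections.synchronizedList(new ArrayList<>());
        Thread masterThread = null;
        if (isMaster) {
            // The master runs in its own thread, like masterThread.start().
            masterThread = new Thread(() -> started.add("master"), "master-thread");
            masterThread.start();
        }
        if (isWorker) {
            // The worker service is created on the calling thread; execute() drives it later.
            started.add("worker");
        }
        if (masterThread != null) {
            masterThread.join();   // only so this demo returns a complete list
        }
        return started;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(instantiate(true, false));   // prints [master]
        System.out.println(instantiate(false, true));   // prints [worker]
    }
}
```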
/**
* Instantiate the appropriate BspService object (Master or Worker)
* for this compute node.
* @param serverPortList host:port list for connecting to ZK quorum
* @param sessionMsecTimeout configurable session timeout
*/
private void instantiateBspService(String serverPortList,
int sessionMsecTimeout) throws IOException, InterruptedException {
if (graphFunctions.isMaster()) {
if (LOG.isInfoEnabled()) {
LOG.info("setup: Starting up BspServiceMaster " +
"(master thread)...");
}
serviceMaster = new BspServiceMaster<I, V, E, M>(
serverPortList, sessionMsecTimeout, context, this);
masterThread = new MasterThread<I, V, E, M>(serviceMaster, context);
masterThread.start();
}
if (graphFunctions.isWorker()) {
if (LOG.isInfoEnabled()) {
LOG.info("setup: Starting up BspServiceWorker...");
}
serviceWorker = new BspServiceWorker<I, V, E, M>(
serverPortList,
sessionMsecTimeout,
context,
this);
if (LOG.isInfoEnabled()) {
LOG.info("setup: Registering health of this worker...");
}
}
}