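In DataX, a job is split into tasks according to the number of channels, and tasks are then grouped into taskGroups. At execution time, the actual work is done by the various runners, so let's take a look at them.
1. Overview of AbstractRunner
AbstractRunner is the base class of all runners; ReaderRunner and WriterRunner both derive from it. The parent class AbstractRunner defines and implements several basic methods.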
[Figure: class hierarchy of AbstractRunner]
[Figure: main methods and attributes of AbstractRunner]
Main attributes
/**
 * The base task plugin
 */
private AbstractTaskPlugin plugin;
/**
 * The configuration of the job
 */
private Configuration jobConf;
/**
 * The runner's communication object, which records information about this runner
 */
private Communication runnerCommunication;
private int taskGroupId;
private int taskId;
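There are six main methods: destroy, markFail, markRun, markSuccess, shutdown and mark. Of these, shutdown has no implementation in the base class and must be implemented by subclasses.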
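To make the division of labour concrete, here is a minimal sketch of a base class with those six methods. It is not the DataX source: `SketchState`, `SketchCommunication` and `SketchAbstractRunner` are hypothetical stand-ins, and the method bodies (in particular what `destroy` does) are assumptions made for illustration only.

```java
// Simplified, hypothetical stand-ins; these are NOT the DataX classes.
enum SketchState { WAITING, RUNNING, SUCCEEDED, FAILED }

class SketchCommunication {
    private SketchState state = SketchState.WAITING;
    private Throwable throwable;

    SketchState getState() { return state; }
    void setState(SketchState state) { this.state = state; }
    Throwable getThrowable() { return throwable; }
    void setThrowable(Throwable throwable) { this.throwable = throwable; }
}

abstract class SketchAbstractRunner {
    private final SketchCommunication runnerCommunication = new SketchCommunication();
    private boolean destroyed = false;

    // Single place where the state changes; the public mark* methods delegate here.
    private void mark(SketchState state) {
        runnerCommunication.setState(state);
    }

    public void markRun() {
        mark(SketchState.RUNNING);
    }

    public void markSuccess() {
        mark(SketchState.SUCCEEDED);
    }

    public void markFail(Throwable cause) {
        mark(SketchState.FAILED);
        // Keep the cause so that whoever polls the communication object can report it.
        runnerCommunication.setThrowable(cause);
    }

    // Assumption: a real destroy() would release the plugin's resources.
    public void destroy() {
        destroyed = true;
    }

    public boolean isDestroyed() {
        return destroyed;
    }

    public SketchCommunication getRunnerCommunication() {
        return runnerCommunication;
    }

    // Left abstract: each concrete runner knows how to stop its own side of the channel.
    public abstract void shutdown();
}
```

Funnelling every state change through one private `mark` keeps the state transitions in a single place, while `shutdown` stays abstract because only the subclass knows which resource it has to close.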
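2. Subclass ReaderRunner
ReaderRunner is the class that actually gets a reader running. As usual, let's start with its class hierarchy.
The hierarchy shows its two ancestors, AbstractRunner and Runnable. Most of its methods are implemented by AbstractRunner; ReaderRunner itself only does two things, run() and shutdown(). The former overrides run from Java's Runnable, the latter overrides shutdown from AbstractRunner.
The run() method mainly does two things: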
/**
 * The method that does the actual work. <br>
 * 1 Each reader.task goes through 4 phases: init(), prepare(), startRead(recordSender) and post(); <br>
 * 2 For each phase of reader.task, information about that phase is collected into a PerfRecord, and PerfRecord.start reports it to PerfTrace <br/>
 */
@Override
public void run() {
    assert null != this.recordSender;
    // Cast the current plugin to Reader.Task
    Reader.Task taskReader = (Reader.Task) this.getPlugin();
    int taskGroupId = getTaskGroupId();
    int taskId = getTaskId();
    // Measure waitWriterTime; it is only ended in the finally block.
    PerfRecord channelWaitWrite = new PerfRecord(taskGroupId, taskId, WAIT_WRITE_TIME);
    try {
        channelWaitWrite.start();
        LOG.debug("task reader starts to do init ...");
        PerfRecord initPerfRecord = new PerfRecord(taskGroupId, taskId, READ_TASK_INIT);
        initPerfRecord.start();
        taskReader.init();
        initPerfRecord.end();
        LOG.debug("task reader starts to do prepare ...");
        PerfRecord preparePerfRecord = new PerfRecord(taskGroupId, taskId, READ_TASK_PREPARE);
        preparePerfRecord.start();
        taskReader.prepare();
        preparePerfRecord.end();
        LOG.debug("task reader starts to read ...");
        PerfRecord dataPerfRecord = new PerfRecord(taskGroupId, taskId, READ_TASK_DATA);
        dataPerfRecord.start();
        taskReader.startRead(recordSender);
        recordSender.terminate();
        long count = CommunicationTool.getTotalReadRecords(super.getRunnerCommunication());
        dataPerfRecord.addCount(count);
        dataPerfRecord.addSize(CommunicationTool.getTotalReadBytes(super.getRunnerCommunication()));
        dataPerfRecord.end();
        LOG.debug("task reader starts to do post ...");
        PerfRecord postPerfRecord = new PerfRecord(taskGroupId, taskId, READ_TASK_POST);
        postPerfRecord.start();
        taskReader.post();
        postPerfRecord.end();
        // automatic flush
        // super.markSuccess(); Success must not be marked here. It is marked by the writerRunner
        // (otherwise the reader could finish first while the writer is still running, a serious bug)
    } catch (Throwable e) {
        LOG.error("Reader runner Received Exceptions:", e);
        super.markFail(e);
    } finally {
        LOG.debug("task reader starts to do destroy ...");
        PerfRecord desPerfRecord = new PerfRecord(taskGroupId, taskId, READ_TASK_DESTROY);
        desPerfRecord.start();
        super.destroy();
        desPerfRecord.end();
        long elapsedTimeInNs = super.getRunnerCommunication().getLongCounter(WAIT_WRITER_TIME);
        channelWaitWrite.end(elapsedTimeInNs);
        long transformUsedTime = super.getRunnerCommunication().getLongCounter(TRANSFORMER_USED_TIME);
        if (transformUsedTime > 0) {
            PerfRecord transformerRecord = new PerfRecord(taskGroupId, taskId, TRANSFORMER_TIME);
            transformerRecord.start();
            transformerRecord.end(transformUsedTime);
        }
    }
}
[Figure: sequence diagram of the run method]
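Stripped of logging and metrics details, the control flow of run() can be sketched as follows. This is only an illustration under stated assumptions: `SketchReaderTask`, `RecordingTask` and `LifecycleSketch` are hypothetical names, and the `timed` helper merely stands in for the PerfRecord start/end pairs rather than reproducing the PerfRecord API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reader task; not the DataX Reader.Task class.
interface SketchReaderTask {
    void init();
    void prepare();
    void startRead();
    void post();
    void destroy();
}

// A task that can be told to fail in one named phase (null means never fail).
class RecordingTask implements SketchReaderTask {
    private final String failAt;

    RecordingTask(String failAt) {
        this.failAt = failAt;
    }

    private void step(String name) {
        if (name.equals(failAt)) {
            throw new IllegalStateException("failed in " + name);
        }
    }

    public void init() { step("init"); }
    public void prepare() { step("prepare"); }
    public void startRead() { step("read"); }
    public void post() { step("post"); }
    public void destroy() { step("destroy"); }
}

class LifecycleSketch {
    // Names of the phases that ran to completion, in order.
    final List<String> phases = new ArrayList<>();
    // Elapsed nanoseconds of each completed phase, same order as phases.
    final List<Long> elapsedNs = new ArrayList<>();
    boolean failed = false;

    // Stand-in for the PerfRecord start()/end() pair around one phase.
    private void timed(String name, Runnable body) {
        long start = System.nanoTime();
        body.run();
        elapsedNs.add(System.nanoTime() - start);
        phases.add(name);
    }

    void run(SketchReaderTask task) {
        try {
            timed("init", task::init);
            timed("prepare", task::prepare);
            timed("read", task::startRead);
            timed("post", task::post);
            // No success mark here: in the original code that is the writer's job.
        } catch (Throwable e) {
            failed = true; // corresponds to markFail(e)
        } finally {
            // destroy always runs, whether the phases succeeded or not.
            timed("destroy", task::destroy);
        }
    }
}
```

For example, running `new LifecycleSketch().run(new RecordingTask("prepare"))` leaves `phases` as `[init, destroy]` and `failed` as `true`: the failure skips the remaining phases, but cleanup still happens because it sits in the finally block.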
shutdown is fairly simple:
@Override
public void shutdown() {
    recordSender.shutdown();
}
3. Subclass WriterRunner
WriterRunner is much the same as ReaderRunner, so it is not covered again here.
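The one asymmetry worth illustrating is the comment in ReaderRunner.run(): success is marked by the writer side, not the reader side. The sketch below shows why, using a plain `BlockingQueue` as a stand-in for the channel. Everything in it (`HandoffSketch`, the `TERMINATE` sentinel, the queue capacity) is a hypothetical simplification, not how DataX implements its channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; the queue stands in for the reader-to-writer channel.
class HandoffSketch {
    // Assumption: a sentinel value plays the role of recordSender.terminate().
    static final String TERMINATE = "__TERMINATE__";

    final BlockingQueue<String> channel = new ArrayBlockingQueue<>(2);
    final List<String> written = new ArrayList<>();
    final AtomicBoolean succeeded = new AtomicBoolean(false);

    Thread reader(List<String> records) {
        return new Thread(() -> {
            try {
                for (String record : records) {
                    channel.put(record);
                }
                channel.put(TERMINATE);
                // The reader is done, but records may still be queued: no success mark here.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    Thread writer() {
        return new Thread(() -> {
            try {
                String record;
                while (!(record = channel.take()).equals(TERMINATE)) {
                    written.add(record);
                }
                // Only once everything has been written is the task really finished.
                succeeded.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    static HandoffSketch runOnce(List<String> records) {
        HandoffSketch sketch = new HandoffSketch();
        Thread readerThread = sketch.reader(records);
        Thread writerThread = sketch.writer();
        readerThread.start();
        writerThread.start();
        try {
            readerThread.join();
            writerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sketch;
    }
}
```

If the reader set the success flag right after sending its last record, an observer could see "success" while records were still sitting in the queue; letting the writer set it ties the flag to the data actually having been written.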
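Notes:
- The source code has been modified slightly, mainly for (1) issues flagged by the Alibaba Java coding guidelines scan and (2) clean code.
- All of the code has been uploaded to GitHub (master and dev branches) and is free to use.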