2021SC@SDUSC
Overview
This post continues the analysis of the MapReduceLauncher class in the executionengine package of Pig, the lightweight scripting language for running jobs on Hadoop.
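notifyProgress method
Logs the progress and notifies the listeners, but only once progress has advanced far enough: at least 4 percentage points (0.04) since the last report. A value of 100% is never reported from here.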
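private boolean notifyProgress(double prog, double lastProg) {
    // Report only when progress has advanced by at least 4 percentage points
    if (prog >= (lastProg + 0.04)) {
        int perCom = (int) (prog * 100);
        // 100% is not reported here
        if (perCom != 100) {
            log.info(perCom + "% complete");
            MRScriptState.get().emitProgressUpdatedNotification(perCom);
        }
        // true: the threshold was crossed, so the caller can treat prog as the new baseline
        return true;
    }
    return false;
}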
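You can exercise the throttling in notifyProgress outside Pig with a small Java sketch. It substitutes a hypothetical `ProgressListener` callback for `MRScriptState` and adds a hypothetical `simulate` driver that feeds in a series of progress samples. The 0.04 threshold and the suppression of 100% mirror the method above.

```java
import java.util.ArrayList;
import java.util.List;

public class ProgressThrottleDemo {

    // Hypothetical callback standing in for MRScriptState's progress notification.
    interface ProgressListener {
        void progressUpdated(int percent);
    }

    private final List<ProgressListener> listeners = new ArrayList<>();

    void addListener(ProgressListener listener) {
        listeners.add(listener);
    }

    // Same rule as notifyProgress: report only after an advance of at least 4 points,
    // never report 100%, and return true whenever the threshold was crossed.
    boolean notifyProgress(double prog, double lastProg) {
        if (prog >= lastProg + 0.04) {
            int perCom = (int) (prog * 100);
            if (perCom != 100) {
                for (ProgressListener listener : listeners) {
                    listener.progressUpdated(perCom);
                }
            }
            return true;
        }
        return false;
    }

    // Hypothetical driver: moves the baseline forward only when the threshold was crossed.
    static List<Integer> simulate(double[] samples) {
        ProgressThrottleDemo demo = new ProgressThrottleDemo();
        List<Integer> reported = new ArrayList<>();
        demo.addListener(reported::add);
        double lastProg = 0.0;
        for (double prog : samples) {
            if (demo.notifyProgress(prog, lastProg)) {
                lastProg = prog;
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println(simulate(new double[] {0.01, 0.03, 0.05, 0.07, 0.10, 1.0}));
    }
}
```

With these samples only 5% and 10% are reported. Progress of 0.07 is just 2 points past the 5% baseline. The final 1.0 crosses the threshold but is suppressed because it is 100%.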
compile method
public MROperPlan compile(
        PhysicalPlan php,
        PigContext pc) throws PlanException, IOException, VisitorException {
    // Body omitted here; the relevant steps are examined one at a time below.
}
Emit the warning messages collected by the MapReduce compiler
comp.getMessageCollector().logMessages(MessageType.Warning, aggregateWarning, log);
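The aggregateWarning argument suggests that warnings can either be logged one by one or summarized. The sketch below shows that idea with a hypothetical collector that counts warnings per kind when aggregation is on. The class, the warning kinds and the message wording are illustrative and are not Pig's MessageCollector API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WarningCollectorDemo {

    // Hypothetical warning record: a kind and a message.
    static final class Warning {
        final String kind;
        final String text;

        Warning(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    private final List<Warning> warnings = new ArrayList<>();

    void collect(String kind, String text) {
        warnings.add(new Warning(kind, text));
    }

    // aggregate == true: one line per kind with a count (in first-seen order);
    // aggregate == false: one line per individual warning.
    List<String> render(boolean aggregate) {
        List<String> lines = new ArrayList<>();
        if (aggregate) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (Warning w : warnings) {
                counts.merge(w.kind, 1, Integer::sum);
            }
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                lines.add("Encountered Warning " + e.getKey() + " " + e.getValue() + " time(s).");
            }
        } else {
            for (Warning w : warnings) {
                lines.add("Warning " + w.kind + ": " + w.text);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        WarningCollectorDemo collector = new WarningCollectorDemo();
        collector.collect("CAST_WARNING", "implicit cast on field a");
        collector.collect("CAST_WARNING", "implicit cast on field b");
        collector.collect("PLAN_WARNING", "something odd in the plan");
        collector.render(true).forEach(System.out::println);
    }
}
```

Aggregation keeps the log readable when the same warning fires for many fields or records.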
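Run the combiner optimizer (unless it is disabled or Pig is running in the illustrator) and emit its warning messages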
// The value "true" for the no-combiner setting disables the combiner
String prop = pc.getProperties().getProperty(PigConfiguration.PIG_EXEC_NO_COMBINER);
if (!pc.inIllustrator && !("true".equals(prop))) {
    boolean doMapAgg =
            Boolean.valueOf(pc.getProperties().getProperty(PigConfiguration.PIG_EXEC_MAP_PARTAGG, "false"));
    CombinerOptimizer co = new CombinerOptimizer(plan, doMapAgg);
    co.visit();
    co.getMessageCollector().logMessages(MessageType.Warning, aggregateWarning, log);
}
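The gating above relies on two `java.util.Properties` idioms. The first is a null-safe `"true".equals(prop)` check. The second is a `Boolean.valueOf` parse with a `"false"` default. The sketch below isolates both. It assumes that PigConfiguration.PIG_EXEC_NO_COMBINER and PIG_EXEC_MAP_PARTAGG resolve to the strings `pig.exec.nocombiner` and `pig.exec.mapPartAgg`.

```java
import java.util.Properties;

public class CombinerGateDemo {

    // Assumed string values of the PigConfiguration constants used above.
    static final String NO_COMBINER = "pig.exec.nocombiner";
    static final String MAP_PARTAGG = "pig.exec.mapPartAgg";

    // The combiner runs unless we are in the illustrator or the property is exactly "true".
    // "true".equals(prop) is null-safe, so an unset property leaves the combiner enabled.
    static boolean combinerEnabled(Properties props, boolean inIllustrator) {
        String prop = props.getProperty(NO_COMBINER);
        return !inIllustrator && !"true".equals(prop);
    }

    // Map-side partial aggregation defaults to false when the property is unset.
    static boolean mapAggEnabled(Properties props) {
        return Boolean.valueOf(props.getProperty(MAP_PARTAGG, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(combinerEnabled(props, false) + " " + mapAggEnabled(props));
        props.setProperty(NO_COMBINER, "true");
        props.setProperty(MAP_PARTAGG, "TRUE");
        System.out.println(combinerEnabled(props, false) + " " + mapAggEnabled(props));
    }
}
```

The two checks treat case differently. `Boolean.valueOf` accepts "TRUE" in any case, but `"true".equals(...)` matches only lowercase "true". In this sketch, setting the no-combiner key to "TRUE" would therefore leave the combiner enabled.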
Optimize jobs whose first MR job only loads and stores data and is followed by a sampling job
SampleOptimizer so = new SampleOptimizer(plan, pc);
so.visit();
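The rewrite can be sketched over a heavily simplified plan. One example of a sampling job is the one Pig uses to compute quantiles for ORDER BY. The idea is that if the first job only copies its input to a temporary output and the sampler reads that output, every reader of the temporary output can read the original input instead, and the copy job can be dropped. The `Job` class and its fields are hypothetical, and the sketch ignores details such as loader compatibility that the real SampleOptimizer must handle.

```java
import java.util.ArrayList;
import java.util.List;

public class SampleMergeDemo {

    // Hypothetical, heavily simplified description of one MR job.
    static final class Job {
        final String name;
        String input;
        final String output;
        final boolean loadStoreOnly; // copies its input to its output unchanged
        final boolean sampler;       // samples its input

        Job(String name, String input, String output, boolean loadStoreOnly, boolean sampler) {
            this.name = name;
            this.input = input;
            this.output = output;
            this.loadStoreOnly = loadStoreOnly;
            this.sampler = sampler;
        }
    }

    // If a load/store-only job is immediately followed by a sampler reading its output,
    // redirect every reader of that output to the original input and drop the copy job.
    static List<Job> optimize(List<Job> jobs) {
        List<Job> result = new ArrayList<>(jobs);
        if (result.size() < 2) {
            return result;
        }
        Job first = result.get(0);
        Job second = result.get(1);
        if (first.loadStoreOnly && second.sampler && first.output.equals(second.input)) {
            for (Job job : result) {
                if (first.output.equals(job.input)) {
                    job.input = first.input;
                }
            }
            result.remove(0);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Job> plan = List.of(
                new Job("copy", "/data/in", "tmp1", true, false),
                new Job("sample", "tmp1", "quantiles", false, true),
                new Job("orderBy", "tmp1", "/data/out", false, false));
        for (Job job : optimize(plan)) {
            System.out.println(job.name + " reads " + job.input);
        }
    }
}
```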
Ensure that only one reducer is used for a LIMIT by adding a separate single-reducer job
if (!pc.inIllustrator) {
    LimitAdjuster la = new LimitAdjuster(plan, pc);
    la.visit();
    la.adjust();
}
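A single reducer is needed because a reducer can only enforce a LIMIT on its own partition. With N reducers and LIMIT k, up to N × k records can come out. The in-memory simulation below illustrates this. The method names are hypothetical and the partitions stand in for reducer inputs.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitAdjustDemo {

    // Each reducer applies the limit only to the records in its own partition.
    static List<Integer> perReducerLimit(List<List<Integer>> partitions, int k) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : partitions) {
            out.addAll(part.subList(0, Math.min(k, part.size())));
        }
        return out;
    }

    // A follow-up job with exactly one reducer sees every record and can apply the global limit.
    static List<Integer> singleReducerLimit(List<Integer> records, int k) {
        return new ArrayList<>(records.subList(0, Math.min(k, records.size())));
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = List.of(List.of(1, 2, 3), List.of(4, 5), List.of(6, 7, 8));
        int k = 2;
        List<Integer> partial = perReducerLimit(partitions, k);
        List<Integer> global = singleReducerLimit(partial, k);
        System.out.println(partial + " -> " + global);
    }
}
```

Here three reducers each keep 2 records, which gives 6 in total. Only the extra single-reducer pass brings the result down to the requested 2.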
Optimize the plan to use a secondary sort key where possible
prop = pc.getProperties().getProperty(PigConfiguration.PIG_EXEC_NO_SECONDARY_KEY);
if (!pc.inIllustrator && !("true".equals(prop))) {
    SecondaryKeyOptimizerMR skOptimizer = new SecondaryKeyOptimizerMR(plan);
    skOptimizer.visit();
}
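Secondary sort is a general MapReduce technique. The shuffle sorts on a composite (group key, secondary key), while the reducer groups on the group key alone. As a result, each group's values arrive already ordered and no sort is needed inside the reducer. In Pig this can help, for example, with a nested ORDER BY inside a FOREACH over a GROUP result. The sketch below only simulates the effect in memory, and the names are hypothetical. In real Hadoop the same result comes from a composite key plus a partitioner and a grouping comparator that look only at the primary part.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortDemo {

    // Hypothetical record: a group key and a value used as the secondary key.
    static final class Rec {
        final String group;
        final int value;

        Rec(String group, int value) {
            this.group = group;
            this.value = value;
        }
    }

    // "Shuffle": sort on (group, value). "Reduce": group on group only,
    // so each group's values come out already sorted.
    static Map<String, List<Integer>> shuffleAndGroup(List<Rec> records) {
        List<Rec> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing((Rec r) -> r.group).thenComparingInt(r -> r.value));
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Rec r : sorted) {
            groups.computeIfAbsent(r.group, g -> new ArrayList<>()).add(r.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(new Rec("b", 3), new Rec("a", 2), new Rec("b", 1),
                new Rec("a", 5), new Rec("a", 1));
        System.out.println(shuffleAndGroup(input));
    }
}
```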
Determine the key type of the map plan. This is needed when a key is null, so that the appropriate NullableXXXWritable object can be created.
KeyTypeDiscoveryVisitor kdv = new KeyTypeDiscoveryVisitor(plan);
kdv.visit();
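The plan's key type is needed because a null value carries no type of its own. A non-null key reveals its type, but a null key can only be wrapped correctly if the expected key type is known in advance. Presumably this matters because all map output keys of one job must share a single key class. The sketch below uses a hypothetical `NullableKey` wrapper with arbitrary type tags in the spirit of NullableXXXWritable. It is not Pig's class hierarchy.

```java
public class NullableKeyDemo {

    // Arbitrary type tags for this sketch.
    static final byte INTEGER = 1;
    static final byte CHARARRAY = 2;

    // Hypothetical stand-in for a NullableXXXWritable: a type tag plus a possibly null value.
    static final class NullableKey {
        final byte type;
        final Object value; // null represents a null key

        NullableKey(byte type, Object value) {
            this.type = type;
            this.value = value;
        }

        boolean isNull() {
            return value == null;
        }
    }

    // Non-null values determine their own tag; a null value falls back to the
    // key type discovered from the plan.
    static NullableKey wrap(Object value, byte planKeyType) {
        if (value == null) {
            return new NullableKey(planKeyType, null);
        }
        if (value instanceof Integer) {
            return new NullableKey(INTEGER, value);
        }
        if (value instanceof String) {
            return new NullableKey(CHARARRAY, value);
        }
        throw new IllegalArgumentException("unsupported key type: " + value.getClass());
    }

    public static void main(String[] args) {
        NullableKey k = wrap(null, CHARARRAY);
        System.out.println(k.isNull() + " " + k.type);
    }
}
```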
Remove the no-op filter operators introduced by splits
NoopFilterRemover fRem = new NoopFilterRemover(plan);
fRem.visit();
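Going by its name, NoopFilterRemover drops filters that do nothing. The sketch below assumes such a filter is one whose predicate is the constant true. Because it passes every tuple, removing it cannot change the result. The `Op` class and its textual predicates are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class NoopFilterDemo {

    // Hypothetical operator: a kind plus, for filters, a textual predicate.
    static final class Op {
        final String kind;
        final String predicate;

        Op(String kind, String predicate) {
            this.kind = kind;
            this.predicate = predicate;
        }

        // A filter on the constant true passes every tuple, so it is a no-op.
        boolean isNoopFilter() {
            return kind.equals("filter") && "true".equals(predicate);
        }

        @Override
        public String toString() {
            return predicate == null ? kind : kind + "(" + predicate + ")";
        }
    }

    static List<Op> removeNoopFilters(List<Op> pipeline) {
        List<Op> result = new ArrayList<>(pipeline);
        result.removeIf(Op::isNoopFilter);
        return result;
    }

    public static void main(String[] args) {
        List<Op> pipeline = List.of(
                new Op("load", null),
                new Op("filter", "true"),
                new Op("filter", "x > 0"),
                new Op("store", null));
        System.out.println(removeNoopFilters(pipeline));
    }
}
```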
Remove unnecessary stores. This step runs after the multi-query optimizer and NoopFilterRemover.
NoopStoreRemover sRem = new NoopStoreRemover(plan);
sRem.visit();
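The sketch below illustrates one plausible notion of an "unnecessary" store. Whether NoopStoreRemover applies exactly this rule is an assumption. The rule is this: if the same data is written both to a temporary path and to a permanent path, drop the temporary store and let downstream loads read the permanent copy. The `Store` class and the load-path list are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RedundantStoreDemo {

    // Hypothetical store: which operator's output it writes, where, and whether the path is temporary.
    static final class Store {
        final String source;
        final String path;
        final boolean temporary;

        Store(String source, String path, boolean temporary) {
            this.source = source;
            this.path = path;
            this.temporary = temporary;
        }
    }

    // Drops temporary stores whose source is also stored permanently and rewrites
    // (in place) any load of the dropped temporary path to the permanent path.
    static List<Store> removeRedundant(List<Store> stores, List<String> loadPaths) {
        Map<String, String> permanentBySource = new HashMap<>();
        for (Store s : stores) {
            if (!s.temporary) {
                permanentBySource.putIfAbsent(s.source, s.path);
            }
        }
        List<Store> kept = new ArrayList<>();
        for (Store s : stores) {
            String permanent = permanentBySource.get(s.source);
            if (s.temporary && permanent != null) {
                loadPaths.replaceAll(p -> p.equals(s.path) ? permanent : p);
            } else {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Store> stores = List.of(
                new Store("B", "/tmp/t1", true),
                new Store("B", "/user/out", false),
                new Store("C", "/tmp/t2", true));
        List<String> loads = new ArrayList<>(List.of("/tmp/t1", "/tmp/t2"));
        System.out.println(removeRedundant(stores, loads).size() + " " + loads);
    }
}
```

Order matters for a pass like this. It can only see the final set of stores after the passes that create or merge them have run, which matches the placement described above.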
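Check whether stream operators are present. The check comes after the multi-query optimizer because that optimizer can, among other things, move streams from the map side to the reduce side.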
EndOfAllInputSetter checker = new EndOfAllInputSetter(plan);
checker.visit();
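The purpose of such a check can be sketched as setting one flag per side of a job. A stream operator pipes tuples through an external process and may still hold output when its input ends. The runtime therefore needs to know, per map or reduce side, whether to give the pipeline a final end-of-input pass. The sketch treats only "stream" as needing this, which is a simplifying assumption. The `MRJob` class and its flag names are hypothetical.

```java
import java.util.List;

public class EndOfInputFlagDemo {

    // Hypothetical, simplified MR job: operator names for the map and reduce sides.
    static final class MRJob {
        final List<String> mapOps;
        final List<String> reduceOps;
        boolean endOfAllInputInMap;
        boolean endOfAllInputInReduce;

        MRJob(List<String> mapOps, List<String> reduceOps) {
            this.mapOps = mapOps;
            this.reduceOps = reduceOps;
        }
    }

    // Simplifying assumption: only stream operators need an end-of-input signal.
    static boolean needsEndOfInput(List<String> ops) {
        return ops.contains("stream");
    }

    // Sets the flags from the operators' current placement; if a later pass moved
    // a stream between map and reduce, the flags would be on the wrong side.
    static void setFlags(MRJob job) {
        job.endOfAllInputInMap = needsEndOfInput(job.mapOps);
        job.endOfAllInputInReduce = needsEndOfInput(job.reduceOps);
    }

    public static void main(String[] args) {
        MRJob job = new MRJob(List.of("load", "filter"), List.of("package", "stream", "store"));
        setFlags(job);
        System.out.println(job.endOfAllInputInMap + " " + job.endOfAllInputInReduce);
    }
}
```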