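Hadoop beginners often have the same questions: for a MapReduce (MR) job, from the initial submission through job initialization to execution, where does it all actually start? How is the whole job executed? Most existing books analyze the source code module by module, which can leave beginners a little lost, with no thread to follow. I hope this article helps. My own knowledge is limited, so corrections from more experienced readers are very welcome!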
1. Submit the MapReduce job by running: hadoop jar test.jar
2. Look at the bin/hadoop script under $HADOOP_HOME and find the main class that loads the MR program you wrote:
elif [ "$COMMAND" = "jar" ] ; then
CLASS=org.apache.hadoop.util.RunJar
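3. The key methods of this class are shown below, with comments. To stay on the main thread of the story, some helper methods are not explained in detail; you can read them yourself:
// In the hadoop-common project, class org.apache.hadoop.util.RunJar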
public static void main(String[] args) throws Throwable {
new RunJar().run(args);
}
public void run(String[] args) throws Throwable {
String usage = "RunJar jarFile [mainClass] args...";
if (args.length < 1) {
System.err.println(usage);
System.exit(-1);
}
int firstArg = 0;
// Get the jar file name (the first argument)
String fileName = args[firstArg++];
// Wrap it in a File so it can be validated
File file = new File(fileName);
if (!file.exists() || !file.isFile()) {
System.err.println("Not a valid JAR: " + file.getCanonicalPath());
System.exit(-1);
}
String mainClassName = null;
JarFile jarFile;
try {
jarFile = new JarFile(fileName);
} catch(IOException io) {
throw new IOException("Error opening job jar: " + fileName)
.initCause(io);
}
// Read the main class name from the jar manifest (Main-Class attribute)
Manifest manifest = jarFile.getManifest();
if (manifest != null) {
mainClassName = manifest.getMainAttributes().getValue("Main-Class");
}
jarFile.close();
if (mainClassName == null) {
if (args.length < 2) {
System.err.println(usage);
System.exit(-1);
}
mainClassName = args[firstArg++];
}
mainClassName = mainClassName.replaceAll("/", ".");
// Read java.io.tmpdir; this directory differs between operating systems (write a small main method and run it on each system to see the value)
File tmpDir = new File(System.getProperty("java.io.tmpdir"));
// Internal RunJar helper: creates the directory if it is missing and checks that it really is a directory
ensureDirectory(tmpDir);
final File workDir;
try {
// Create a uniquely named temp file to reserve a name for the working directory; it is deleted and recreated as a directory below
workDir = File.createTempFile("hadoop-unjar", "", tmpDir);
} catch (IOException ioe) {
// If user has insufficient perms to write to tmpDir, default
// "Permission denied" message doesn't specify a filename.
System.err.println("Error creating temp dir in java.io.tmpdir "
+ tmpDir + " due to " + ioe.getMessage());
System.exit(-1);
return;
}
if (!workDir.delete()) {
System.err.println("Delete failed for " + workDir);
System.exit(-1);
}
ensureDirectory(workDir);
// Register a shutdown hook that deletes the temporary working directory
ShutdownHookManager.get().addShutdownHook(
new Runnable() {
@Override
public void run() {
FileUtil.fullyDelete(workDir);
}
}, SHUTDOWN_HOOK_PRIORITY);
// Unpack the jar file into the working directory
unJar(file, workDir);
// Create a class loader, load the main class with it, and look up its main method
ClassLoader loader = createClassLoader(file, workDir);
Thread.currentThread().setContextClassLoader(loader);
Class<?> mainClass = Class.forName(mainClassName, true, loader);
Method main = mainClass.getMethod("main", new Class[] {
Array.newInstance(String.class, 0).getClass()
});
String[] newArgs = Arrays.asList(args)
.subList(firstArg, args.length).toArray(new String[0]);
try {
main.invoke(null, new Object[] { newArgs });
} catch (InvocationTargetException e) {
throw e.getTargetException();
}
}
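To make the main-class lookup in run() easier to try in isolation, here is a minimal sketch that uses only the JDK, not Hadoop. The class MainClassLookup, the helper writeDemoJar, and the name com.example.WordCount are made up for illustration; only the JarFile and Manifest calls mirror what RunJar does.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class MainClassLookup {

    // Returns the Main-Class manifest attribute, or null if it is not set.
    static String mainClassFromManifest(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Hypothetical helper: writes a jar that contains only a manifest,
    // optionally naming a main class.
    static File writeDemoJar(String mainClass) throws IOException {
        File jar = File.createTempFile("demo", ".jar");
        jar.deleteOnExit();
        Manifest manifest = new Manifest();
        // Without a Manifest-Version the main attributes are not written out.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        try (JarOutputStream out =
                 new JarOutputStream(new FileOutputStream(jar), manifest)) {
            // No further entries; the manifest is enough for this demo.
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(mainClassFromManifest(writeDemoJar("com.example.WordCount")));
        System.out.println(mainClassFromManifest(writeDemoJar(null)));
    }
}
```

When the lookup returns null, RunJar falls back to the next command-line argument, which is why a jar without a Main-Class entry has to be run as hadoop jar test.jar followed by the main class name.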
4. From the steps above, we can see roughly how Hadoop starts loading the code written by the developer.
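The last part of run(), which drops the already consumed arguments and calls main reflectively, can also be sketched on its own. This is a simplified illustration under a few assumptions, not the Hadoop implementation: ReflectiveLaunch and DemoDriver are hypothetical names, DemoDriver stands in for your job driver, and the class is loaded with the current class loader rather than one built from the unpacked jar.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;

public class ReflectiveLaunch {

    // Stand-in for the user's job driver; it records the arguments it receives.
    public static class DemoDriver {
        public static String[] received;

        public static void main(String[] args) {
            if (args.length > 0 && args[0].equals("fail")) {
                throw new IllegalStateException("driver failed");
            }
            received = args;
        }
    }

    // Loads mainClassName, skips the first firstArg arguments, and calls main.
    static void launch(String mainClassName, String[] args, int firstArg)
            throws Throwable {
        // Assumption: the class is visible to this class's loader.
        ClassLoader loader = ReflectiveLaunch.class.getClassLoader();
        Class<?> mainClass = Class.forName(mainClassName, true, loader);
        Method main = mainClass.getMethod("main", String[].class);
        String[] newArgs = Arrays.copyOfRange(args, firstArg, args.length);
        try {
            // The cast passes the array as one argument instead of varargs.
            main.invoke(null, (Object) newArgs);
        } catch (InvocationTargetException e) {
            // Rethrow the driver's own exception, not the reflection wrapper.
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws Throwable {
        String[] cli = {"test.jar", "in", "out"};
        launch(DemoDriver.class.getName(), cli, 1);
        System.out.println(Arrays.toString(DemoDriver.received)); // prints [in, out]
    }
}
```

Unwrapping InvocationTargetException matters in practice: without it, a failure in your driver would surface as a reflection error and the real cause would be buried one level down in the stack trace.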