The Hadoop MapReduce Client's Job Submission Process
Example code for MapReduce WordCount
```java
package MapReduceLearn.MapReduceLearn.Art;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into tokens and emit <word, 1> for each token
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable(0);

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the counts emitted for the same word
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            Job job = Job.getInstance(conf, "word count");
            // In Hadoop 2.x the Job must be told which jar it lives in; otherwise
            // the Map and Reduce classes cannot be found during task execution
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("hdfs://bigdatamaster:9000/data/wordcount/words/"));
            FileOutputFormat.setOutputPath(job, new Path("hdfs://bigdatamaster:9000/data/wordcount/result/"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (ClassNotFoundException | IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }
}
```
The Job Submission Process
How YARNRunner's rmClient member is obtained
The rmClient acquisition process
After that, whenever any method is called on proxy, the proxy class passes the method and its arguments into the invoke function of its own InvocationHandler; RetryInvocationHandler ultimately invokes the method of the same name on the underlying proxy object.
How Java's reflection machinery behind Proxy actually runs is explained later, in the section describing ApplicationClientProtocolPBClientImpl.
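The dispatch described above can be sketched with a plain JDK dynamic proxy. The RetryingHandler below is a hypothetical, heavily simplified stand-in for Hadoop's RetryInvocationHandler (not the real implementation, which consults a RetryPolicy): its invoke method forwards every call to the same-named method on the wrapped object and retries when that call throws.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryProxySketch {
    interface Service {
        String call();
    }

    // Simplified stand-in for Hadoop's RetryInvocationHandler: invoke() forwards
    // every call to the same-named method on the wrapped object, retrying on failure.
    static class RetryingHandler implements InvocationHandler {
        private final Object underlying;
        private final int maxAttempts;

        RetryingHandler(Object underlying, int maxAttempts) {
            this.underlying = underlying;
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            Throwable last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    // Call the method of the same name on the real object
                    return method.invoke(underlying, args);
                } catch (InvocationTargetException e) {
                    last = e.getCause(); // unwrap the target's exception and retry
                }
            }
            throw last;
        }
    }

    static Service wrap(Service real, int maxAttempts) {
        return (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(),
                new Class<?>[]{Service.class},
                new RetryingHandler(real, maxAttempts));
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Service flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new RuntimeException("transient failure");
            }
            return "ok";
        };
        // The first two calls fail; the handler retries until the third succeeds
        System.out.println(wrap(flaky, 5).call());
    }
}
```

Every call on the wrapped Service goes through invoke, which is exactly the funnel the real RetryInvocationHandler sits in.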
How DefaultFailoverProxyProvider's Proxy is generated
By default, this stage ultimately produces an instance of the class ApplicationClientProtocolPBClientImpl.
How ApplicationClientProtocolPBClientImpl's proxy member is obtained
Starting from the source code, the code that obtains proxy is as follows:
```java
public ApplicationClientProtocolPBClientImpl(long clientVersion,
    InetSocketAddress addr, Configuration conf) throws IOException {
  RPC.setProtocolEngine(conf, ApplicationClientProtocolPB.class,
      ProtobufRpcEngine.class);
  proxy = RPC.getProxy(ApplicationClientProtocolPB.class, clientVersion, addr, conf);
}
```
How the proxy class for the ApplicationClientProtocolPB protocol is generated
The code that generates the proxy class is as follows:
```java
public <T> ProtocolProxy<T> getProxy(Class<T> protocol, long clientVersion,
    InetSocketAddress addr, UserGroupInformation ticket, Configuration conf,
    SocketFactory factory, int rpcTimeout, RetryPolicy connectionRetryPolicy,
    AtomicBoolean fallbackToSimpleAuth) throws IOException {
  final Invoker invoker = new Invoker(protocol, addr, ticket, conf, factory,
      rpcTimeout, connectionRetryPolicy, fallbackToSimpleAuth);
  return new ProtocolProxy<T>(protocol, (T) Proxy.newProxyInstance(
      protocol.getClassLoader(), new Class[]{protocol}, invoker), false);
}
```
The part to focus on is the Proxy.newProxyInstance call.
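Before walking through the JDK internals, here is a minimal, self-contained illustration of what Proxy.newProxyInstance does (the Greeter interface and the handler's reply string are invented for the example): every call on the returned object is routed through the handler's invoke method.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class NewProxyInstanceDemo {
    // Greeter is invented for this demo; any interface works the same way
    interface Greeter {
        String greet(String name);
    }

    static Greeter makeGreeter() {
        InvocationHandler h = (proxy, method, args) ->
                // Every call on the proxy arrives here, carrying the Method
                // object and the arguments boxed into an Object[]
                "invoked " + method.getName() + " with " + args[0];
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[]{Greeter.class},
                h);
    }

    public static void main(String[] args) {
        Greeter g = makeGreeter();
        System.out.println(g.greet("world"));                 // invoked greet with world
        System.out.println(Proxy.isProxyClass(g.getClass())); // true: the class was generated at runtime
    }
}
```

The class of g does not exist in any .class file; it is exactly the kind of dynamically generated class that getProxyClass0, described next, builds and caches.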
getProxyClass0
Function prototype: private static Class<?> getProxyClass0(ClassLoader loader, Class<?>... interfaces)
Purpose: creates a proxy class from loader and interfaces; the class is generated dynamically.
The process of creating the new class:
- The in-memory structure of proxyClassCache's valuesMap:

Key | Value
---|---
Proxy.key0 (KeyFactory.apply) | Factory (get => ProxyClassFactory.apply)
Proxy.key1 (KeyFactory.apply) | Factory (get => ProxyClassFactory.apply)
Proxy.key2 (KeyFactory.apply) | Factory (get => ProxyClassFactory.apply)
Proxy.KeyX (KeyFactory.apply) | Factory (get => ProxyClassFactory.apply)
- ProxyClassFactory.apply
- Generating the methods of the dynamic Proxy class
This part emits the bytecode methods that the JVM executes directly.
The generated methods look roughly as follows:
Row | Instruction (method with parameters) | Instruction (method without parameters) | Meaning
---|---|---|---
1 | aload_0 | aload_0 | |
2 | getfield | getfield | Fetch an instance field of the given class and push its value onto the stack
3 | ref | ref | A reference to the h field of class Proxy { InvocationHandler h; }
4 | aload_0 | aload_0 | |
5 | getstatic | getstatic | Fetch a static field of the given class and push its value onto the stack
6 | ref | ref | A reference to the Method object for this method's name
7 | a. push the length of the argument list; b. anewarray; c. push a reference to java.lang.Object; d. dup; e. push the argument's index within the argument list; f. push the word size corresponding to the argument's type | aconst_null | Push the arguments
8 | invokeinterface | invokeinterface | Invoke the interface method
9 | ref | ref | A reference to java.lang.reflect.InvocationHandler { java.lang.Object invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) }
10 | 4 | 4 | |
11 | 0 | 0 | |
12 | return value | return value | |
Translated back to Java source, the rough result, taking the submitApplication function as an example:
- Function prototype:

```java
public org.apache.hadoop.yarn.proto.YarnServiceProtos.SubmitApplicationResponseProto submitApplication(
    com.google.protobuf.RpcController controller,
    org.apache.hadoop.yarn.proto.YarnServiceProtos.SubmitApplicationRequestProto request)
```
- The generated proxy method is roughly (pseudocode):

```java
class Proxy1 {
    java.lang.reflect.InvocationHandler h;

    org.apache.hadoop.yarn.proto.YarnServiceProtos.SubmitApplicationResponseProto submitApplication(
            com.google.protobuf.RpcController controller,
            org.apache.hadoop.yarn.proto.YarnServiceProtos.SubmitApplicationRequestProto request) {
        // pseudocode: forward to the handler, passing the proxy instance, the Method
        // object for submitApplication, and the arguments packed into an Object[]
        return (SubmitApplicationResponseProto) h.invoke(this, submitApplicationMethod,
                new Object[]{controller, request});
    }
}
```
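The aconst_null row in the instruction table above can be checked directly: for an interface method with no parameters, the JDK proxy passes null, not an empty array, as the argument array to invoke. The Api interface below is invented for the demonstration.

```java
import java.lang.reflect.Proxy;

public class ProxyArgPassingDemo {
    // Api is invented for this demonstration
    interface Api {
        String noArgs();
        String oneArg(int x);
    }

    static Api make() {
        return (Api) Proxy.newProxyInstance(
                Api.class.getClassLoader(),
                new Class<?>[]{Api.class},
                // Report what the handler actually received as the args array
                (proxy, method, args) ->
                        args == null ? "args is null" : "args length " + args.length);
    }

    public static void main(String[] args) {
        Api api = make();
        System.out.println(api.noArgs());  // args is null  (matches the aconst_null row)
        System.out.println(api.oneArg(7)); // args length 1 (the int is boxed into the Object[])
    }
}
```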
To summarize: when a method is called on the dynamically generated Proxy class, what actually runs is the invoke function of the user-supplied InvocationHandler, which then dispatches to the corresponding method.