- When writing programs against the MapReduce framework, the key-value input and output data must use Hadoop's provided types rather than Java's primitive types: long → LongWritable, int → IntWritable, String → Text, and so on.
- Communication between nodes uses RPC: the RPC layer serializes a message into a binary byte stream, sends it to the remote node, and the remote node deserializes the stream back into the original message.
- To use a custom key-value type in a MapReduce program, implement the corresponding interface: Writable for values, or WritableComparable for keys (which must also be sortable).
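Conceptually, a Writable is just a pair of write(DataOutput)/readFields(DataInput) methods. A minimal sketch of that serialization round trip in plain Java (no Hadoop dependency; the class and method names here are our own, chosen to mirror how Text and IntWritable behave):

```java
import java.io.*;

public class PairRoundTrip {
    // Serialize (word, count) the way a custom Writable's write(DataOutput) would.
    static byte[] write(String word, int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(word);   // Text is likewise length-prefixed UTF
            out.writeInt(count);  // IntWritable writes a 4-byte int
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize the way readFields(DataInput) would on the remote node.
    static String read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return in.readUTF() + "=" + in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(write("hello", 3)));
    }
}
```

The byte stream produced by write() is exactly what would travel over RPC; read() recovers the original fields on the other side.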
Mapper code
Map phase: each input line is split into words, and one record is emitted per word.
* Four type parameters:
* KEYIN    input key: byte offset of the line's start within the file
* VALUEIN  input value: the text of one line
* KEYOUT   output key type
* VALUEOUT output value type
public class MapTask extends Mapper<LongWritable, Text, Text, IntWritable>{
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
//value holds one line of input
//split the line on spaces; each token is a word
String[] split = value.toString().split(" ");
for (String word : split) {
//emit the word as the key and 1 as the value, e.g.: hello 1
context.write(new Text(word), new IntWritable(1));
}
}
}
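The map logic above can be exercised without Hadoop: splitting one line yields one (word, 1) pair per token. A plain-Java sketch of that behavior (class name and tab-separated record format are our own, for illustration):

```java
import java.util.*;

public class MapSketch {
    // Mimic map(): split a line on spaces and emit one "word\t1" record per token.
    static List<String> map(String line) {
        List<String> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            out.add(word + "\t1");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("hello world hello"));
    }
}
```

Note that repeated words each produce their own (word, 1) record; summing them is the reducer's job, not the mapper's.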
Reducer code
Purpose: the framework groups all records with the same key together; the reducer sums their values to get each word's frequency.
* KEYIN    key type output by map, i.e. the reduce input key type
* VALUEIN  value type output by map, i.e. the reduce input value type
* KEYOUT   reduce output key type
* VALUEOUT reduce output value type
public class ReduceTask extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
protected void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int count = 0;
//sum the values for this key; every map output value is 1, so the sum is the word's frequency
for (IntWritable value : values) {
count = count + value.get();
//count++; // equivalent here, because each value is 1
}
//emit (word, total count)
context.write(key, new IntWritable(count));
}
}
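Between map and reduce, the framework shuffles: map output is sorted by key and equal keys are grouped, so each reduce() call sees (key, [1, 1, ...]). A self-contained sketch of that grouping plus the summing reduce (TreeMap stands in for the shuffle's sorted grouping; names are ours):

```java
import java.util.*;

public class ShuffleReduceSketch {
    // Group (word, 1) pairs by key, as the shuffle does, then sum per key.
    static SortedMap<String, Integer> wordCount(List<String> words) {
        SortedMap<String, Integer> counts = new TreeMap<>(); // shuffle sorts keys
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);             // the reduce() sum
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello", "world", "hello")));
    }
}
```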
Driver code
1. Run on the HDFS cluster
public class Driver {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
//set where the job runs and reads data: the cluster, not the local filesystem
conf.set("fs.defaultFS", "hdfs://hadoop01:9000");
Job job = Job.getInstance(conf);
//set which Mapper and Reducer the job uses, and which class identifies the jar to submit
job.setMapperClass(MapTask.class);
job.setReducerClass(ReduceTask.class);
job.setJarByClass(Driver.class);
//declare the map output and final output types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//set the input and output paths
FileInputFormat.addInputPath(job, new Path("/hello.txt"));
Path output = new Path("/wordcount/wc-output");
FileOutputFormat.setOutputPath(job, output);
//MapReduce refuses to run if the output directory already exists, so delete it first
FileSystem fs = FileSystem.get(conf);
if(fs.exists(output)) {
fs.delete(output, true);
}
//submit the job and monitor its progress until completion
boolean completion = job.waitForCompletion(true);
System.out.println(completion ? "job finished" : "job failed");
}
}
Package the project as a jar, upload it to the machine running the HDFS cluster, and run: hadoop jar <jar file> <fully qualified name of the driver class>.
2. Submit the jar to the cluster from Eclipse
public class Driver {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
System.setProperty("HADOOP_USER_NAME", "root");//which user to submit the job as
conf.set("fs.defaultFS", "hdfs://hadoop01:9000");//where the job runs: the cluster, not local
conf.set("mapreduce.framework.name", "yarn");//run on YARN instead of the local runner
conf.set("yarn.resourcemanager.hostname", "hadoop01");//the ResourceManager host
conf.set("mapreduce.app-submission.cross-platform", "true");//needed when submitting from Windows to a Linux cluster
Job job = Job.getInstance(conf,"eclipseToCluster");
job.setMapperClass(MapTask.class);
job.setReducerClass(ReduceTask.class);
//job.setJarByClass(Driver.class);
//package the project as a jar and pass its path to job.setJar()
job.setJar("C:\\Users\\dell\\Desktop\\wc.jar");
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/hello.txt"));
FileOutputFormat.setOutputPath(job, new Path("/wordcount/wclipse-out"));
//delete the output directory if it already exists
FileSystem fs = FileSystem.get(conf);
if(fs.exists(new Path("/wordcount/wclipse-out"))) {
fs.delete(new Path("/wordcount/wclipse-out"),true);
}
boolean completion = job.waitForCompletion(true);
System.exit(completion ? 0 : 1);
}
}
Package the project as a jar at the path given to job.setJar(); Run As in Eclipse then submits that jar to the cluster.
3. Run locally in Eclipse
public class Driver {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
//System.setProperty("HADOOP_USER_NAME", "root");//not needed for a purely local run
/*conf.set("fs.defaultFS", "hdfs://hadoop01:9000");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.hostname", "hadoop01");
conf.set("mapreduce.app-submission.cross-platform", "true");*/
Job job = Job.getInstance(conf,"eclipseToCluster");
job.setMapperClass(MapTask.class);
job.setReducerClass(ReduceTask.class);
job.setJarByClass(Driver.class);
//job.setJar("C:\\Users\\dell\\Desktop\\wc.jar");
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("D:\\a\\hello.txt"));
FileOutputFormat.setOutputPath(job, new Path("D:\\a\\wordcount\\wclipse-out"));
//delete the local output directory if it already exists
File file = new File("D:\\a\\wordcount\\wclipse-out");
if(file.exists()){
FileUtils.deleteDirectory(file);
}
boolean completion = job.waitForCompletion(true);
System.out.println(completion ? "job finished" : "job failed");
}
}
Run directly in Eclipse, then check the generated files on the local machine to verify the job succeeded.