If this article helps you, a like 👍 is appreciated.
This is just a set of study notes; corrections are welcome, and please credit the source when reposting.
Example 1
Suppose a grade has two classes whose data live in class1.csv and class2.csv, and we want the grade-wide average math score. In each file the first column is the student ID and the second column is the math score.
Requirements: a Combiner class must be used, and the final output must be a single line containing only the average.
(Screenshot: sample rows of class1.csv and class2.csv)
(Screenshot: job output)
(Screenshot: project directory)
Exper2Mapper.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class Exper2Mapper extends Mapper<LongWritable, Text, NullWritable, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is "studentId,score"; emit every score under the single
        // NullWritable key so all scores arrive at one reduce call.
        String[] str = value.toString().split(",");
        result.set(Integer.parseInt(str[1]));
        context.write(NullWritable.get(), result);
    }
}
Exper2Reducer.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class Exper2Reducer extends Reducer<NullWritable, IntWritable, NullWritable, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(NullWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        for (IntWritable item : values) {
            count += 1;
            sum += item.get();
        }
        // Integer division truncates; switch to DoubleWritable if a
        // fractional average is required.
        result.set(sum / count);
        context.write(NullWritable.get(), result);
    }
}
ExperA2Driver.java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class ExperA2Driver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(ExperA2Driver.class);
        job.setMapperClass(Exper2Mapper.class);
        // Reuse the Reducer class as the Combiner (see the caveat below).
        job.setCombinerClass(Exper2Reducer.class);
        job.setReducerClass(Exper2Reducer.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(IntWritable.class);
        // A single reduce task guarantees exactly one output line.
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/user/hadoop/data"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/output/exper2"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
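A caveat on the Combiner requirement: reusing the Reducer as the Combiner is only safe for operations that are associative and commutative, and averaging is neither. If the Combiner runs, each map task's scores collapse into one per-split average, and the Reducer then averages those averages; whenever the splits hold different numbers of students (class1 vs. class2 here), that differs from the true grade-wide average. The usual fix is to carry (sum, count) pairs through the shuffle. Below is a minimal sketch of such a combiner, assuming the mapper emits Text values of the form "score,1"; the class name and value encoding are illustrative, not from the original project.

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical combiner that folds many "sum,count" pairs into one.
// Pair-wise combination is associative, so Hadoop may run this zero,
// one, or many times without changing the final reduce result.
public class AvgCombiner extends Reducer<NullWritable, Text, NullWritable, Text> {
    @Override
    protected void reduce(NullWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        long count = 0;
        for (Text v : values) {
            String[] parts = v.toString().split(",");
            sum += Long.parseLong(parts[0]);
            count += Long.parseLong(parts[1]);
        }
        context.write(key, new Text(sum + "," + count));
    }
}

Under this scheme the final Reducer performs the single division sum / count, and the job's result no longer depends on whether, or how often, the Combiner ran.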
Example 2
Suppose a server records daily traffic data for a single website, namely the maximum and minimum page views across all of the site's pages. The data is stored in three files.
The data format is as follows (records are not dated to a specific day):
Explanation: the first column is the year and month, the second column is the maximum page-view count observed on some day of that month, and the third column is the minimum page-view count observed on that same day.
The program design requirements are as follows:
Requirement ①: output each month's maximum and minimum, one line per month.
For example, in the sample data, the 2017-07 maximum is 900 and the minimum is 100, while the 2017-08 maximum is 560 and the minimum is 200.
The output format is as follows:
2017-08 560 200
2017-07 900 100
Requirement ②: a custom data type must be defined that holds the maximum and minimum page views observed on a given day. (custom Writable)
Requirement ③: a custom partition function is required; all 2017 data must be routed to one reducer and all 2018 data to another. (partitioner)
Requirement ④: within the same year, the data must be sorted by month in descending order. (secondary sort)
For example:
2017-08 560 200
2017-07 900 100
(Screenshot: job output)
(Screenshot: project directory)
Exper3Container.java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Custom value type for requirement ②: one day's maximum and minimum page views.
public class Exper3Container implements Writable {
    private int max;
    private int min;

    public int getMax() {
        return max;
    }

    public void setMax(int max) {
        this.max = max;
    }

    public int getMin() {
        return min;
    }

    public void setMin(int min) {
        this.min = min;
    }

    // readFields() must consume the fields in exactly the order write() emits them.
    @Override
    public void readFields(DataInput in) throws IOException {
        max = in.readInt();
        min = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(max);
        out.writeInt(min);
    }

    @Override
    public String toString() {
        return max + " " + min;
    }
}
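If write() and readFields() ever fall out of sync, the job fails in confusing ways at shuffle time, so it is worth checking the round trip locally. A minimal sketch of such a check; the harness class is hypothetical, not part of the project:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ContainerRoundTrip {
    public static void main(String[] args) throws IOException {
        Exper3Container before = new Exper3Container();
        before.setMax(900);
        before.setMin(100);

        // Serialize with write(), then rehydrate a fresh object with readFields().
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        before.write(new DataOutputStream(buf));

        Exper3Container after = new Exper3Container();
        after.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(after); // prints "900 100" if the contract holds
    }
}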
Exper3Driver.java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Exper3Driver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(Exper3Driver.class);
        job.setMapperClass(Exper3Mapper.class);
        // Unlike averaging, max/min are associative, so the Reducer could safely
        // double as a Combiner here; the exercise leaves it disabled.
        //job.setCombinerClass(Exper3Reducer.class);
        // Two reducers, one per year, selected by the custom partitioner (requirement ③).
        job.setNumReduceTasks(2);
        // Descending month order within each year (requirement ④).
        job.setSortComparatorClass(Exper3Sort.class);
        job.setReducerClass(Exper3Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Exper3Container.class);
        job.setPartitionerClass(Exper3Partitioner.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/user/hadoop/data/Exper3"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/output/exper3"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Exper3Mapper.java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Exper3Mapper extends Mapper<LongWritable, Text, Text, Exper3Container> {
    private Exper3Container container = new Exper3Container();
    private Text text = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is "yyyy-MM max min", separated by single spaces.
        String[] str = value.toString().split(" ");
        container.setMax(Integer.parseInt(str[1]));
        container.setMin(Integer.parseInt(str[2]));
        text.set(str[0]);
        context.write(text, container);
    }
}
Exper3Partitioner.java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
public class Exper3Partitioner extends Partitioner<Text, Exper3Container> {
    @Override
    public int getPartition(Text key, Exper3Container value, int numReduceTasks) {
        // Partition on the year portion of the "yyyy-MM" key. With two reduce
        // tasks, 2017 % 2 == 1 and 2018 % 2 == 0, so each year lands on its own
        // reducer (requirement ③). The & Integer.MAX_VALUE keeps the result
        // non-negative.
        String[] str = key.toString().split("-");
        int num = Integer.parseInt(str[0]);
        return (num & Integer.MAX_VALUE) % numReduceTasks;
    }
}
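A quick local check makes the year-to-reducer mapping concrete. The harness class below is hypothetical, not part of the project; getPartition() ignores the value, so passing null is fine here:

import org.apache.hadoop.io.Text;

public class PartitionCheck {
    public static void main(String[] args) {
        Exper3Partitioner p = new Exper3Partitioner();
        for (String month : new String[] {"2017-07", "2017-08", "2018-01"}) {
            // Expected: 2017-* -> reducer 1, 2018-* -> reducer 0.
            System.out.println(month + " -> reducer " + p.getPartition(new Text(month), null, 2));
        }
    }
}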
Exper3Reducer.java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Exper3Reducer extends Reducer<Text, Exper3Container, Text, Exper3Container> {
    private Exper3Container container = new Exper3Container();

    @Override
    protected void reduce(Text key, Iterable<Exper3Container> values, Context context)
            throws IOException, InterruptedException {
        // Fold all of a month's daily observations into one overall max and min.
        int max = 0;
        int min = Integer.MAX_VALUE;
        for (Exper3Container item : values) {
            if (item.getMax() > max) max = item.getMax();
            if (item.getMin() < min) min = item.getMin();
        }
        container.setMax(max);
        container.setMin(min);
        context.write(key, container);
    }
}
Exper3Sort.java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
public class Exper3Sort extends WritableComparator {
    public Exper3Sort() {
        // true: have WritableComparator instantiate Text keys for compare().
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        // Keys look like "yyyy-MM": compare the year first, then the month, both
        // descending. Comparing the month alone would only work by accident,
        // because the partitioner happens to give each reducer a single year.
        String[] str1 = ((Text) a).toString().split("-");
        String[] str2 = ((Text) b).toString().split("-");
        int year1 = Integer.parseInt(str1[0]);
        int year2 = Integer.parseInt(str2[0]);
        if (year1 != year2) return year2 - year1; // later year first
        int month1 = Integer.parseInt(str1[1]);
        int month2 = Integer.parseInt(str2[1]);
        return month2 - month1; // later month first
    }
}
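A tiny ordering check; SortCheck is a hypothetical harness, not part of the project:

import org.apache.hadoop.io.Text;

public class SortCheck {
    public static void main(String[] args) {
        Exper3Sort cmp = new Exper3Sort();
        // A positive result means the first key sorts after the second, so
        // 2017-08 precedes 2017-07 at the reducer (descending month order).
        System.out.println(cmp.compare(new Text("2017-07"), new Text("2017-08")) > 0); // true
        // 2018 sorts before 2017 (descending year order).
        System.out.println(cmp.compare(new Text("2018-01"), new Text("2017-12")) < 0); // true
    }
}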