MapReduce (writing Java programs)

MapReduce: a distributed computing framework.
What the developer has to do: implement the Map and Reduce functions.

If a job only reads and writes HDFS, YARN is not actually involved; YARN only comes into play when a MapReduce job is submitted.


(Figure: detailed MapReduce flow across three machines)

MapReduce programming model
MapReduce development consists of eight steps in total: the Map phase has 2 steps, the Shuffle phase has 4 steps,
and the Reduce phase has 2 steps.
Map phase, 2 steps:
1. Set the InputFormat class, which splits the data into Key-Value pairs (K1 and V1) (i.e. <byte offset of each line, line contents>) and passes them to step 2.
2. Write custom Map logic that converts the Key-Value pairs from step 1 into new Key-Value pairs <K2, V2> (<word, count>) and emits them.
Shuffle phase, 4 steps:
3. Partition the emitted Key-Value pairs.
4. Within each partition, sort the data by Key.
5. (Optional) Pre-aggregate (combine) the data to reduce the amount of data copied over the network.
6. Group the data: Values with the same Key are collected into one set, still as <K2, V2> (<word, collection of counts>).
Reduce phase, 2 steps:
7. Sort and merge the results of the multiple Map tasks, then write a Reduce function with your own logic that processes the incoming Key-Value pairs and converts them into new Key-Value pairs (K3 and V3) for output.
8. Set the OutputFormat, which processes and saves the Key-Value data emitted by Reduce.

MapReduce program 1 (contains only a Map program and a Reduce program)
Task: count the words in a text file.
In the Maven project, change the JDK from 1.7 to 1.8 and JUnit from 4.11 to 4.12.

  1. Upload a.txt to HDFS.
  2. Create a new Maven project and add the dependencies, all with groupId org.apache.hadoop and version 2.6.0: hadoop-common, hadoop-hdfs, hadoop-auth, hadoop-client, hadoop-mapreduce-client-core, hadoop-mapreduce-client-jobclient.

3. Write the Map program:

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // The four generic types of Mapper are roughly <Long, String, String, Long>;
    // Hadoop replaces the plain Java types with its own Writable types.
    // The map method converts (K1, V1) into (K2, V2).
    /* Parameters:
         key:     K1, the byte offset of the line
         value:   V1, the text of the line
         context: the context object that links the Map -> Shuffle -> Reduce steps
    */
    /*
       How (K1, V1) becomes (K2, V2):
       K1          V1
       0           hello,world,hadoop
       15          hdfs,hive,hello
       ----------------------------------
       K2          V2
       hello       1
       world       1
       hadoop      1
       hdfs        1
       hive        1
       hello       1
    */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        Text text = new Text();
        LongWritable longWritable = new LongWritable();
        // 1. Split the line of text (value is a Text, which has no split method, so convert it to String first)
        String[] split = value.toString().split(",");
        // 2. Iterate over the array and assemble K2 and V2
        for (String word : split) {
            // 3. Write K2 and V2 to the context (context.write only accepts Text and LongWritable, so convert the types)
            text.set(word);
            longWritable.set(1);
            context.write(text, longWritable);
        }
    }
}

Write the Reduce program:

// The four generic types are the types of K2/V2 and K3/V3
public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        long count = 0;
        // 1. Iterate over the collection and add up the numbers to get V3
        for (LongWritable value : values) {
            count += value.get();
        }
        // 2. Write K3 and V3 to the context
        context.write(key, new LongWritable(count));
    }
}

The driver (main) class:

public class WCDriver {
    public static void main(String[] args) throws Exception {
        // Create and configure the job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wc");
        job.setJarByClass(WCDriver.class);
        // Configure the eight steps:
        // 1. Input settings
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("a.txt"));
        // 2. Map settings
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // 3,4,5,6. Shuffle phase (partition, sort, combine, group): defaults are used here
        // 7. Reduce settings
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // 8. Output settings
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("out1"));
        // Submit the job and wait for it to finish
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag ? "success" : "failure");
    }
}
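The optional combine step (step 5) is not used in this driver. For WordCount the reduce logic is associative, so the reducer class itself can normally double as the combiner; a hedged one-line addition (placed anywhere before job.waitForCompletion) would be:

```java
// Optional step 5: pre-aggregate map output on each map node before the shuffle.
// WCReducer can act as the combiner because summing counts is associative and commutative.
job.setCombinerClass(WCReducer.class);
```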

Build a jar and run it on Linux: hadoop jar <jar file> <fully qualified main class name>.
(Hadoop/YARN distributes the jar to the worker nodes; the per-node results are aggregated and the output is written to the output directory.)

Packaging, option 1: Project Structure -> Artifacts -> + -> JAR -> From modules with dependencies -> select the main class -> OK -> Build Artifacts.
Option 2: in the Maven Projects panel on the right, Lifecycle -> clean -> package; the location of the generated jar is shown in the build output.

MapReduce program 2: Map, Shuffle and Reduce programs (partitioning in the Shuffle phase)

**Task: classify lottery records (rows whose 6th column is greater than 15 vs. less than or equal to 15).** The whole line is used as the key all the way from Map to Reduce.
Partitioning: each (K2, V2) produced by a Map Task is tagged with a partition number; records with the same tag go to the same Reduce Task.
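For comparison, when no custom partitioner is set, Hadoop's default HashPartitioner assigns partitions by hashing the key. A sketch of that default rule, shown here only for reference:

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Default partitioning rule: hash the key, mask the sign bit so the result is
// non-negative, and take the remainder modulo the number of reduce tasks.
public class DefaultHashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

The custom MyPartitioner below replaces this rule with a business rule (column 6 compared with 15).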

Map code:

/* K1: line offset     LongWritable
   V1: line text       Text

   K2: line text       Text
   V2: NullWritable (we only need to classify whole lines, so the line itself is K2
       and V2 is just a placeholder)
*/
public class PartitionMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    // The map method turns (K1, V1) into (K2, V2)
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        context.write(value, NullWritable.get());
    }
}

Partitioner code:

/* Define the partitioning rule:
   return the partition number for each record.
*/
public class MyPartitioner extends Partitioner<Text, NullWritable> {
    @Override
    public int getPartition(Text text, NullWritable nullWritable, int numReduceTasks) {
        // 1. Split the line (K2) and get the winning-number field
        String[] split = text.toString().split("\t");
        String numStr = split[5];
        // 2. Compare the field with 15 and return the corresponding partition number
        if (Integer.parseInt(numStr) > 15) {
            return 1;
        } else {
            return 0;
        }
    }
}

Reduce code:

/* The reducer does no real aggregation here:
   it simply writes each line (K2) out unchanged as K3.
*/
public class PartitionerReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        context.write(key, NullWritable.get());
    }
}

Driver code:

public class JobMain extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // 1. Create the job object (pass the Configuration and an arbitrary job name)
        Job job = Job.getInstance(super.getConf(), "partition_mapreduce");
        // 2. Configure the job (the eight steps)
        // Step 1: input format class and input path
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("hdfs://hadoop100:8020/input"));
        // Step 2: Mapper class and its output types
        job.setMapperClass(PartitionMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        // Step 3: the partitioner class
        job.setPartitionerClass(MyPartitioner.class);
        // Two partitions are produced, so two reduce tasks are needed
        job.setNumReduceTasks(2);
        // Steps 4, 5, 6: framework defaults
        // Step 7: Reducer class and output types (K3 and V3)
        job.setReducerClass(PartitionerReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        // Step 8: output format class and output path
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("hdfs://hadoop100:8020/out/partition_out"));
        // 3. Wait for the job to finish
        boolean b1 = job.waitForCompletion(true);
        return b1 ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Launch the job via ToolRunner
        int run = ToolRunner.run(configuration, new JobMain(), args);
        System.exit(run);
    }
}

MapReduce program 3: Map and driver programs (sorting in the Shuffle phase)

**Task: read two files and transfer the records as objects, implementing object serialization and deserialization. The object class must implement the WritableComparable interface and override its three methods: compareTo() (comparison), write() (serialization) and readFields() (deserialization).** The fields of each line are unpacked into the object's attributes, and the object is used as the key.
Object class code

public class EmpDep implements WritableComparable<EmpDep> {
    private String name;
    private String gender;
    private int age;
    private int deptNo;
    private String deptName;

    @Override
    public int compareTo(EmpDep o) {
        if(null==o){
            return 0;
        }else{
            return this.age-o.age;
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name==null?"":name);
        out.writeUTF(gender==null?"":gender);
        out.writeInt(age);
        out.writeInt(deptNo);
        out.writeUTF(deptName==null?"":deptName);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        name=in.readUTF();
        gender=in.readUTF();
        age=in.readInt();
        deptNo=in.readInt();
        deptName=in.readUTF();
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getGender() {
        return gender;
    }

    public void setGender(String gender) {
        this.gender = gender;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "EmpDep{" +
                "name='" + name + '\'' +
                ", gender='" + gender + '\'' +
                ", age=" + age +
                ", deptNo=" + deptNo +
                ", deptName='" + deptName + '\'' +
                '}';
    }

    public String getDeptName() {
        return deptName;
    }

    public void setDeptName(String deptName) {
        this.deptName = deptName;
    }

    public int getDeptNo() {
        return deptNo;
    }

    public void setDeptNo(int deptNo) {
        this.deptNo = deptNo;
    }
}

Map code

public class JoinMapper extends Mapper<LongWritable, Text, EmpDep, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String s = value.toString();
        // Skip empty lines
        if (null != s && !"".equals(s)) {
            // Split on whitespace
            String[] columns = s.split("\\s");
            EmpDep ed = new EmpDep();
            if (columns.length == 2) {
                // A line from dep.txt: department number and department name
                ed.setDeptNo(Integer.valueOf(columns[0]));
                ed.setDeptName(columns[1]);
            } else {
                // A line from emp.txt: name, gender, age, department number
                int num = 0;
                String[] actual = new String[4];
                // Splitting on spaces may yield empty strings, so keep only the non-empty columns
                for (String column : columns) {
                    if (null == column || "".equals(column)) {
                        continue;
                    }
                    actual[num] = column;
                    num++;
                }
                ed.setName(actual[0]);
                ed.setGender(actual[1]);
                ed.setAge(Integer.valueOf(actual[2]));
                ed.setDeptNo(Integer.valueOf(actual[3]));
            }
            context.write(ed, NullWritable.get());
        }
    }
}

Driver code

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "ww");
        job.setJarByClass(JoinDriver.class);
        // Input
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.setInputPaths(job, new Path("lcs/"));
        // Map
        job.setMapperClass(JoinMapper.class);
        job.setMapOutputKeyClass(EmpDep.class);
        job.setMapOutputValueClass(NullWritable.class);
        // Output
        job.setOutputKeyClass(EmpDep.class);
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("out3"));

        boolean b1 = job.waitForCompletion(true);
        System.out.println(b1 ? "success" : "failure");
    }
}

MapReduce program 4: reduce-side join, with Map, Reducer and driver programs

**Task: read two files and transfer the records as objects, implementing object serialization and deserialization. The object class must implement the WritableComparable interface and override compareTo(), write() and readFields() (comparison, serialization, deserialization).** The fields of each line are unpacked into the object's attributes; here the department number is used as the key so that records from both files meet in the reducer.
Object class code: the EmpDep class is identical to the one defined in program 3 above.

Map code

public class JoinMapper extends Mapper<LongWritable, Text, IntWritable, EmpDep> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String s = value.toString();
        // Skip empty lines
        if (null != s && !"".equals(s)) {
            // Split on spaces
            String[] columns = s.split(" ");
            EmpDep ed = new EmpDep();
            if (columns.length == 2) {
                // A line from dep.txt: department number and department name
                ed.setDeptNo(Integer.valueOf(columns[0]));
                ed.setDeptName(columns[1]);
            } else {
                // A line from emp.txt: name, gender, age, department number
                int num = 0;
                String[] actual = new String[4];
                // Splitting on spaces may yield empty strings, so keep only the non-empty columns
                for (String column : columns) {
                    if (null == column || "".equals(column)) {
                        continue;
                    }
                    actual[num] = column;
                    num++;
                }
                ed.setName(actual[0]);
                ed.setGender(actual[1]);
                ed.setAge(Integer.valueOf(actual[2]));
                ed.setDeptNo(Integer.valueOf(actual[3]));
            }
            System.out.println(ed);
            // The department number is the join key (K2); the whole record is V2
            context.write(new IntWritable(ed.getDeptNo()), ed);
        }
    }
}

Reducer code

public class JoinReducer extends Reducer<IntWritable, EmpDep, EmpDep, NullWritable> {
    // The (K2, V2) pairs coming out of the shuffle look like <deptNo, [EmpDep1, EmpDep2, ...]>.
    // The reducer processes one such group at a time. The shuffle's grouping is what makes the
    // join possible: rows with the same deptNo (one department row plus its employee rows)
    // arrive together; if mapper output went straight to the reducer row by row, no join could happen.
    @Override
    protected void reduce(IntWritable key, Iterable<EmpDep> values, Context context) throws IOException, InterruptedException {
        EmpDep dept = new EmpDep();
        ArrayList<EmpDep> edList = new ArrayList<>();
        for (EmpDep value : values) {
            if (null == value.getDeptName() || "".equals(value.getDeptName())) {
                // An employee record: copy it (the framework reuses the value object between iterations)
                EmpDep t = new EmpDep();
                t.setName(value.getName());
                t.setGender(value.getGender());
                t.setAge(value.getAge());
                t.setDeptNo(value.getDeptNo());
                edList.add(t);
            } else {
                // The department record: remember its number and name
                dept.setDeptNo(value.getDeptNo());
                dept.setDeptName(value.getDeptName());
            }
        }
        // Attach the department name to every employee of this department and emit
        for (EmpDep empDep : edList) {
            empDep.setDeptName(dept.getDeptName());
            context.write(empDep, NullWritable.get());
        }
    }
}
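The heading of program 4 lists a driver, but none appears in the original notes. A minimal sketch, assuming JoinMapper, JoinReducer and EmpDep are in the same package and reusing the same local layout as program 3 (the input/output paths and job name are assumptions):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReduceJoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "reduce-side-join");
        job.setJarByClass(ReduceJoinDriver.class);
        // Input: a directory containing both emp.txt and dep.txt (assumed path)
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("lcs/"));
        // Map output: <deptNo, EmpDep>
        job.setMapperClass(JoinMapper.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(EmpDep.class);
        // Reduce output: <EmpDep, NullWritable>
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(EmpDep.class);
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("out_reduce_join"));
        System.out.println(job.waitForCompletion(true) ? "success" : "failure");
    }
}
```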

MapReduce program 5: map-side join, with Map and driver programs
**Task: read two files and transfer the records as objects, implementing object serialization and deserialization. The object class must implement the WritableComparable interface and override compareTo(), write() and readFields() (comparison, serialization, deserialization).** The fields of each line are unpacked into the object's attributes and the object is used as the key; here the join is performed inside the Mapper using a cached copy of the department file.
Object class code: the EmpDep class is again identical to the one defined in program 3 above.

Map code

public class TestMapperJoin extends Mapper<LongWritable, Text, EmpDep, NullWritable> {
    // Department number -> department name, loaded from the cached file
    Map<Integer, String> depMap = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Read the department file that was registered with job.setCacheFiles()
        URI uri = context.getCacheFiles()[0];
        BufferedReader br = new BufferedReader(new FileReader(uri.getPath()));
        String line;
        while (null != (line = br.readLine())) {
            String[] columns = line.split(" ");
            depMap.put(Integer.valueOf(columns[0]), columns[1]);
        }
        br.close();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line is an employee record: name, gender, age, department number
        EmpDep ed = new EmpDep();
        String[] columns = value.toString().split(" ");
        ed.setName(columns[0]);
        ed.setGender(columns[1]);
        ed.setAge(Integer.parseInt(columns[2]));
        ed.setDeptNo(Integer.parseInt(columns[3]));
        // Look up the department name in the cached map (this is the join)
        String deptName = depMap.get(ed.getDeptNo());
        if (deptName != null) {
            ed.setDeptName(deptName);
        }
        context.write(ed, NullWritable.get());
    }
}

Driver code

public class MapperJoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(MapperJoinDriver.class);
        // Input: only the employee file is read as normal input
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("lcs/EMP.txt"));
        // Map settings
        job.setMapperClass(TestMapperJoin.class);
        job.setMapOutputKeyClass(EmpDep.class);
        job.setMapOutputValueClass(NullWritable.class);
        // The department file is distributed to every map task via the cache
        URI[] uris = {new URI("lcs/DEP.txt")};
        job.setCacheFiles(uris);
        // Output settings
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("out4"));
        boolean b = job.waitForCompletion(true);
        System.out.println(b ? "success" : "failure");
    }
}
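A note on the cache file path: the relative path lcs/DEP.txt only resolves when the job runs locally. On a real cluster the cached file normally lives in HDFS; a hedged variant (host, port and path are assumptions) would be:

```java
// Assumed HDFS location of the department file; adjust to the actual cluster path.
job.addCacheFile(new java.net.URI("hdfs://hadoop100:8020/lcs/DEP.txt"));
```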

Counters: collect statistics about a job.
Built-in counters: TaskCounter, FileSystemCounter, FileInputFormatCounter, FileOutputFormatCounter, etc.
Custom counters: defined inside your own Mapper/Reducer code and incremented explicitly, e.g. once per map() call.

public class PartitionMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Method 1: obtain a counter by group name and counter name
        Counter counter = context.getCounter("MR_COUNT", "MyRecordCounter");
        counter.increment(1L);
        context.write(value, NullWritable.get());
    }
}

public class PartitionReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    // Method 2: define counters with an enum
    public static enum MyCounter {
        MY_REDUCE_INPUT_RECORDS, MY_REDUCE_INPUT_BYTES
    }

    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        context.getCounter(MyCounter.MY_REDUCE_INPUT_RECORDS).increment(1L);
        context.write(key, NullWritable.get());
    }
}
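After the job finishes, the driver can read these counters back, for example (a sketch; MyCounter is the enum defined in PartitionReducer above):

```java
// After job.waitForCompletion(true) in the driver:
Counters counters = job.getCounters();
long records = counters.findCounter("MR_COUNT", "MyRecordCounter").getValue();
long reduceRecords = counters.findCounter(PartitionReducer.MyCounter.MY_REDUCE_INPUT_RECORDS).getValue();
System.out.println("MyRecordCounter=" + records + ", reduce input records=" + reduceRecords);
```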

MapReduce serialization and sorting
Serialization converts a structured object into a byte stream;
deserialization converts a byte stream back into a structured object.
Hadoop's serialization format is Writable; a class only needs to implement this interface to be serializable.
Writable has a sub-interface, WritableComparable, which supports both serialization and sorting.

public class SortBean implements WritableComparable<SortBean> {
    private String word;
    private int num;
    // getters, setters and toString() omitted

    // Sorting: compareTo only has to return the ordering; the framework does the actual sorting
    @Override
    public int compareTo(SortBean sortBean) {
        int result = this.word.compareTo(sortBean.word);
        if (result == 0) {
            return this.num - sortBean.num;
        }
        return result;
    }

    // Serialization
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(num);
    }

    // Deserialization (fields must be read in the same order they were written)
    @Override
    public void readFields(DataInput in) throws IOException {
        this.word = in.readUTF();
        this.num = in.readInt();
    }
}
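To show how a WritableComparable key drives the shuffle sort, here is a minimal sketch of a mapper that emits SortBean as the key (the input format, one word and one number per line separated by a tab, is an assumption):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch only: assumes each input line looks like "<word>\t<num>".
public class SortMapper extends Mapper<LongWritable, Text, SortBean, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] split = value.toString().split("\t");
        SortBean bean = new SortBean();
        bean.setWord(split[0]);
        bean.setNum(Integer.parseInt(split[1]));
        // Because SortBean is the map output key, the shuffle sorts records using
        // SortBean.compareTo (by word, then by num) before they reach the reducer.
        context.write(bean, NullWritable.get());
    }
}
```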
