MapReduce: a distributed computing framework
What developers have to do: implement the Map and Reduce functions
If you only use HDFS, YARN does no actual work; YARN is only invoked when a MapReduce job runs
Detailed MapReduce flow across three machines
MapReduce programming conventions
Developing a MapReduce job takes eight steps in total: the Map phase has 2 steps, the Shuffle phase has 4 steps, and the Reduce phase has 2 steps.
Map phase (2 steps):
1. Set the InputFormat class, which splits the data into Key-Value pairs (K1 and V1, i.e. <byte offset of each line, line contents>) and feeds them into step 2
2. Write custom Map logic that converts the pairs from step 1 into new Key-Value pairs <K2,V2> (<word, count>) and emits them
Shuffle phase (4 steps):
3. Partition the emitted Key-Value pairs
4. Within each partition, sort the data by Key
5. (Optional) Pre-aggregate (combine) the sorted data to cut down network copying; see the combiner sketch after this list
6. Group the data: Values with the same Key go into one collection, still <K2,V2> (<word, collection of counts>)
Reduce phase (2 steps):
7. Sort and merge the results of the Map tasks, then write a Reduce function with your own logic that turns the incoming Key-Value pairs into new Key-Value pairs (K3 and V3) and emits them
8. Set the OutputFormat class, which processes and saves the Key-Value data emitted by Reduce
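Step 5 (the combiner) is never used in the example programs below; as a minimal sketch, assuming the WordCount job from program 1, the reducer can double as the combiner because partial word counts can safely be summed again:
//Hypothetical extra line in the WCDriver below: WCReducer runs on each map task's
//local output first, so only pre-summed <word, count> pairs cross the network.
job.setCombinerClass(WCReducer.class);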
MapReduce program 1 (just a Map program, a Reduce program, and a driver)
Task: count the words in a text file
In the POM, change jdk 1.7 to jdk 1.8
and junit 4.11 to 4.12
1. Upload a.txt to HDFS
2. Create a new Maven project and add the coordinate dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0</version>
</dependency>
3. Write the Map program:
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    //Mapper's four generic types correspond to <Long, String, String, Long>; Hadoop
    //considers the plain Java types inconvenient, so it uses its own wrapper types
    //Override the map method; map converts K1 and V1 into K2 and V2
    /*Parameters: key:     K1, the byte offset of the line
                  value:   V1, the text of the line
                  context: the context object linking the Map--Shuffle--Reduce stages
    */
    /*
    How K1/V1 become K2/V2:
    K1    V1
    0     hello,world,hadoop
    15    hdfs,hive,hello
    ----------------------------------
    K2      V2
    hello   1
    world   1
    hadoop  1
    hdfs    1
    hive    1
    hello   1
    */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Text text = new Text();
        LongWritable longWritable = new LongWritable();
        // 1. Split the line of text (value is a Text, which has no split method,
        //    so convert it to a String first)
        String[] split = value.toString().split(",");
        // 2. Iterate over the array and assemble K2 and V2
        for (String word : split) {
            // 3. Write K2 and V2 into the context (context.write only accepts Text
            //    and LongWritable here, so the types must be converted)
            text.set(word);
            longWritable.set(1);
            context.write(text, longWritable);
        }
    }
}
Reduce program:
//The four generic types are the K2/V2 and K3/V3 types
public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        //1. Iterate over the collection and sum its numbers to get V3
        for (LongWritable value : values) {
            count += value.get();
        }
        //2. Write K3 and V3 into the context
        context.write(key, new LongWritable(count));
    }
}
Driver class:
public class WCDriver {
    public static void main(String[] args) throws Exception {
        //Create and set up the job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wc");
        job.setJarByClass(WCDriver.class);
        //Configure the job's eight steps:
        //1. Input settings
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("a.txt"));
        //2. Map settings
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        //3,4,5,6. Shuffle phase (partition, sort, combine, group) -- defaults used here
        //7. Reduce settings
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        //8. Output settings
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("out1"));
        //Submit the job and wait for it to finish
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag ? "success" : "failure");
    }
}
Build a jar and run it on Linux: hadoop jar <jar file> <fully qualified main class>, for example hadoop jar wc.jar com.example.WCDriver (jar and package names assumed here).
(Hadoop distributes the jar to the worker nodes, gathers the results, and writes the output to files)
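For instance, with the two sample lines from the mapper comment above (hello,world,hadoop and hdfs,hive,hello), the output file out1/part-r-00000 would contain:
hadoop	1
hdfs	1
hello	2
hive	1
world	1
(keys sorted by the shuffle, key and value separated by TextOutputFormat's default tab)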
Packaging, option 1: Project Structure -> Artifacts -> + -> JAR -> From modules with dependencies -> select the main class -> OK -> Build Artifacts
Packaging, option 2: Maven Projects panel on the right -> Lifecycle -> clean -> package; the jar's location is then shown below
MapReduce program 2: Map, Shuffle, and Reduce programs (partitioning in the shuffle)
**Task: classify lottery records (rows whose 6th column is greater than 15 vs. less than or equal to 15).** From Map through Reduce, the whole line is used as the key.
Partitioning: tag each of the Map task's output pairs (<K2,V2>); pairs with the same tag go to the same Reduce task
Map code:
/*K1: line offset     LongWritable
  V1: line text       Text
  K2: line text       Text
  V2: NullWritable (we only need to classify the rows, so the whole line goes into K2;
      V2 carries no data and serves purely as a placeholder)
*/
public class PartitionMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    //map converts K1 and V1 into K2 and V2
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, NullWritable.get());
    }
}
Partitioner code:
/*Define the partitioning rule:
  return the matching partition number
*/
public class MyPartitioner extends Partitioner<Text, NullWritable> {
    @Override
    public int getPartition(Text text, NullWritable nullWritable, int numPartitions) {
        //1. Split the line of text (K2) and get the winning-number field
        String[] split = text.toString().split("\t");
        String numStr = split[5];
        //2. Compare the field with 15 and return the matching partition number
        if (Integer.parseInt(numStr) > 15) {
            return 1;
        } else {
            return 0;
        }
    }
}
Reduce code:
/*The reducer does no aggregation here;
  it simply writes each key straight through
*/
public class PartitionerReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        context.write(key, NullWritable.get());
    }
}
Driver code:
public class JobMain extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        //1. Create the job object (a Configuration object plus an arbitrary job name)
        Job job = Job.getInstance(super.getConf(), "partition_mapreduce");
        //2. Configure the job (the eight steps)
        //Step 1: set the input class and input path
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("hdfs://hadoop100:8020/input"));
        //Step 2: set the Mapper class and its output types
        job.setMapperClass(PartitionMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        //Step 3: set the partitioner class
        job.setPartitionerClass(MyPartitioner.class);
        //The partitioner returns 0 or 1, so two reduce tasks are needed (the default is 1)
        job.setNumReduceTasks(2);
        //Steps 4, 5, 6: sort, combine, group (defaults used here)
        //Step 7: set the Reducer class and its output types (K3 and V3)
        job.setReducerClass(PartitionerReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        //Step 8: set the output class and output path
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("hdfs://hadoop100:8020/out/partition_out"));
        //3. Wait for the job to finish
        boolean b1 = job.waitForCompletion(true);
        return b1 ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        //Launch the job
        int run = ToolRunner.run(configuration, new JobMain(), args);
        System.exit(run);
    }
}
MapReduce program 3: Map and driver programs -- sorting in the shuffle
**Task: read two files and transfer the rows as objects, with object serialization and deserialization (the object class must implement the WritableComparable interface and override compareTo(), write(), and readFields(): comparison, serialization, and deserialization respectively).** The fields of each row are unpacked into the object's attributes and the object is used as the key; the shuffle then sorts the keys via compareTo.
Object code
public class EmpDep implements WritableComparable<EmpDep> {
    private String name;
    private String gender;
    private int age;
    private int deptNo;
    private String deptName;
    //Comparison: sort by age
    @Override
    public int compareTo(EmpDep o) {
        if (null == o) {
            return 0;
        } else {
            return this.age - o.age;
        }
    }
    //Serialization: write the fields as a byte stream
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name == null ? "" : name);
        out.writeUTF(gender == null ? "" : gender);
        out.writeInt(age);
        out.writeInt(deptNo);
        out.writeUTF(deptName == null ? "" : deptName);
    }
    //Deserialization: read the fields back in the same order they were written
    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        gender = in.readUTF();
        age = in.readInt();
        deptNo = in.readInt();
        deptName = in.readUTF();
    }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public int getDeptNo() { return deptNo; }
    public void setDeptNo(int deptNo) { this.deptNo = deptNo; }
    public String getDeptName() { return deptName; }
    public void setDeptName(String deptName) { this.deptName = deptName; }
    @Override
    public String toString() {
        return "EmpDep{" +
                "name='" + name + '\'' +
                ", gender='" + gender + '\'' +
                ", age=" + age +
                ", deptNo=" + deptNo +
                ", deptName='" + deptName + '\'' +
                '}';
    }
}
Map code
public class JoinMapper extends Mapper<LongWritable, Text, EmpDep, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String s = value.toString();
        //Skip empty lines
        if (null != s && !"".equals(s)) {
            //Split on whitespace
            String[] columns = s.split("\\s");
            EmpDep ed = new EmpDep();
            if (columns.length == 2) {
                //dep.txt: deptNo deptName
                ed.setDeptNo(Integer.valueOf(columns[0]));
                ed.setDeptName(columns[1]);
            } else {
                //emp.txt: name gender age deptNo
                int num = 0;
                String[] actual = new String[4];
                //Splitting on spaces may yield empty strings, so filter them out
                for (String column : columns) {
                    if (null == column || "".equals(column)) {
                        continue;
                    }
                    actual[num] = column;
                    num++;
                }
                ed.setName(actual[0]);
                ed.setGender(actual[1]);
                ed.setAge(Integer.valueOf(actual[2]));
                ed.setDeptNo(Integer.valueOf(actual[3]));
            }
            context.write(ed, NullWritable.get());
        }
    }
}
Driver code
public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "ww");
        job.setJarByClass(JoinDriver.class);
        //Input
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.setInputPaths(job, new Path("lcs/"));
        //Map (no reducer is set; the EmpDep keys are still sorted by the shuffle)
        job.setMapperClass(JoinMapper.class);
        job.setMapOutputKeyClass(EmpDep.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(EmpDep.class);
        job.setOutputValueClass(NullWritable.class);
        //Output
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("out3"));
        boolean b1 = job.waitForCompletion(true);
        System.out.println(b1 ? "success" : "failure");
    }
}
MapReduce program 4: a Reduce-side join, with Map, Reduce, and driver programs
**Task: read two files (emp.txt and dep.txt) and join them on the department number in the Reduce phase. The rows are again transferred as EmpDep objects, but here the department number is used as the key, so rows from both files meet in the same reduce call.**
Object code
(EmpDep is the same class as in program 3 above.)
Map code
public class JoinMapper extends Mapper<LongWritable, Text, IntWritable, EmpDep> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String s = value.toString();
        //Skip empty lines
        if (null != s && !"".equals(s)) {
            //Split on spaces
            String[] columns = s.split(" ");
            EmpDep ed = new EmpDep();
            if (columns.length == 2) {
                //dep.txt: deptNo deptName
                ed.setDeptNo(Integer.valueOf(columns[0]));
                ed.setDeptName(columns[1]);
            } else {
                //emp.txt: name gender age deptNo
                int num = 0;
                String[] actual = new String[4];
                //Splitting on spaces may yield empty strings, so filter them out
                for (String column : columns) {
                    if (null == column || "".equals(column)) {
                        continue;
                    }
                    actual[num] = column;
                    num++;
                }
                ed.setName(actual[0]);
                ed.setGender(actual[1]);
                ed.setAge(Integer.valueOf(actual[2]));
                ed.setDeptNo(Integer.valueOf(actual[3]));
            }
            System.out.println(ed); //debug print
            //The department number is K2, so both files' rows group together in reduce
            context.write(new IntWritable(ed.getDeptNo()), ed);
        }
    }
}
Reducer code
public class JoinReducer extends Reducer<IntWritable, EmpDep, EmpDep, NullWritable> {
    //The <K2,V2> pairs coming out of the shuffle look like <deptNo, <EmpDep1, EmpDep2...>>;
    //each reduce call processes one key, then moves on to the next. The shuffle's grouping
    //is what makes the join possible: rows with the same deptNo from both files arrive
    //together. Without it, a single mapper row reaching the reducer alone could never be joined.
    @Override
    protected void reduce(IntWritable key, Iterable<EmpDep> values, Context context)
            throws IOException, InterruptedException {
        EmpDep ed = new EmpDep();
        ArrayList<EmpDep> edList = new ArrayList<>();
        for (EmpDep value : values) {
            if (null == value.getDeptName() || "".equals(value.getDeptName())) {
                //An emp.txt row: Hadoop reuses the value object, so copy it before storing
                EmpDep t = new EmpDep();
                t.setName(value.getName());
                t.setGender(value.getGender());
                t.setAge(value.getAge());
                t.setDeptNo(value.getDeptNo());
                edList.add(t);
            } else {
                //The dep.txt row for this key: remember its department name
                ed.setDeptNo(value.getDeptNo());
                ed.setDeptName(value.getDeptName());
            }
        }
        //Attach the department name to every employee of this department
        for (EmpDep empDep : edList) {
            empDep.setDeptName(ed.getDeptName());
            context.write(empDep, NullWritable.get());
        }
    }
}
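The header above mentions a driver program, but none appears in the notes; a minimal sketch, assuming the same lcs/ input directory as program 3 and a hypothetical out2 output path (the map output types differ from the final output types here, so both pairs must be declared):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReduceJoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "reduce_join");
        job.setJarByClass(ReduceJoinDriver.class);
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("lcs/"));
        //Map emits <IntWritable, EmpDep>
        job.setMapperClass(JoinMapper.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(EmpDep.class);
        //Reduce emits <EmpDep, NullWritable>
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(EmpDep.class);
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("out2"));
        System.out.println(job.waitForCompletion(true) ? "success" : "failure");
    }
}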
MapReduce program 5: a Map-side join, with Map and driver programs
**Task: read two files and join them entirely in the Map phase: the small file (DEP.txt) is shipped to every mapper through the distributed cache and loaded into a HashMap in setup(), so no shuffle-side grouping is needed for the join.**
Object code
(EmpDep is the same class as in program 3 above.)
Map code
public class TestMapperJoin extends Mapper<LongWritable, Text, EmpDep, NullWritable> {
    Map<Integer, String> depMap = new HashMap<>();
    //setup runs once per map task: load the cached DEP.txt into memory
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI uri = context.getCacheFiles()[0];
        FileReader fis = new FileReader(uri.getPath());
        BufferedReader br = new BufferedReader(fis);
        String line = null;
        while (null != (line = br.readLine())) {
            String[] columns = line.split(" ");
            depMap.put(Integer.valueOf(columns[0]), columns[1]);
        }
        br.close();
    }
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        EmpDep ed = new EmpDep();
        String[] columns = value.toString().split(" ");
        ed.setName(columns[0]);
        ed.setGender(columns[1]);
        ed.setAge(Integer.parseInt(columns[2]));
        ed.setDeptNo(Integer.parseInt(columns[3]));
        //Look up the department name in the in-memory map: the join happens here
        Set<Integer> depNos = depMap.keySet();
        for (Integer depNo : depNos) {
            if (depNo == ed.getDeptNo()) {
                ed.setDeptName(depMap.get(depNo));
            }
        }
        context.write(ed, NullWritable.get());
    }
}
Driver code
public class MapperJoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(MapperJoinDriver.class);
        //Input: only the big file; the small one travels via the distributed cache
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("lcs/EMP.txt"));
        //Map settings
        job.setMapperClass(TestMapperJoin.class);
        job.setMapOutputKeyClass(EmpDep.class);
        job.setMapOutputValueClass(NullWritable.class);
        //Ship DEP.txt to every map task through the distributed cache
        URI[] uris = {new URI("lcs/DEP.txt")};
        job.setCacheFiles(uris);
        //Output
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("out4"));
        boolean b = job.waitForCompletion(true);
        System.out.println(b ? "success" : "failure");
    }
}
Counters: collect job statistics
Built-in counter groups: TaskCounter, FileSystemCounter,
FileInputFormatCounter, FileOutputFormatCounter
Custom counters: defined inside your Mapper (or Reducer) code and incremented by hand, e.g. by 1 on every map call
public class PartitionMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        //A counter identified by a group name and a counter name
        Counter counter = context.getCounter("MR_COUNT", "MyRecordCounter");
        counter.increment(1L);
        context.write(value, NullWritable.get());
    }
}
public class PartitionReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    //Counters can also be identified by an enum
    public static enum Counter {
        MY_REDUCE_INPUT_RECORDS, MY_REDUCE_INPUT_BYTES
    }
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        context.getCounter(Counter.MY_REDUCE_INPUT_RECORDS).increment(1L);
        context.write(key, NullWritable.get());
    }
}
MapReduce serialization and sorting
Serialization converts a structured object into a byte stream;
Deserialization converts a byte stream back into a structured object;
Hadoop's serialization interface is Writable; a class only needs to implement it to be serializable;
Writable's sub-interface is WritableComparable, which supports both serialization and sorting;
public class SortBean implements WritableComparable<SortBean> {
    private String word;
    private int num;
    //getters, setters, and toString() omitted
    //Sorting: only the compareTo return value matters; the framework does the actual sort
    @Override
    public int compareTo(SortBean sortBean) {
        int result = this.word.compareTo(sortBean.word);
        if (result == 0) {
            return this.num - sortBean.num;
        }
        return result;
    }
    //Serialization
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(num);
    }
    //Deserialization
    @Override
    public void readFields(DataInput in) throws IOException {
        this.word = in.readUTF();
        this.num = in.readInt();
    }
}
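The notes stop here. To show how SortBean takes part in sorting, a minimal mapper sketch follows, assuming a hypothetical tab-delimited input where each line is word<TAB>num; once SortBean is K2, the shuffle orders the records by word and then num through compareTo, with no extra sorting code:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

//Hypothetical mapper: emitting SortBean as K2 makes the shuffle sort via compareTo
public class SortMapper extends Mapper<LongWritable, Text, SortBean, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] split = value.toString().split("\t"); //assumed input format
        SortBean bean = new SortBean();
        bean.setWord(split[0]);                  //setters are among the omitted methods
        bean.setNum(Integer.parseInt(split[1]));
        context.write(bean, NullWritable.get());
    }
}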