Implementing Small MapReduce Examples

 

 

1. What is MapReduce?

   

MapReduce is a programming model for parallel processing of very large data sets (greater than 1 TB). Its core ideas, the "Map" and "Reduce" operations, are borrowed from functional programming languages, together with features taken from vector programming languages. It makes it far easier for programmers to run their code on a distributed system without having to know anything about distributed parallel programming. In current software implementations, you specify a Map function that transforms a set of key/value pairs into a set of intermediate key/value pairs, and a Reduce function that merges all intermediate values associated with the same key.
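Before turning to the Hadoop examples, the data flow can be illustrated with a small, framework-free sketch of the map → group-by-key → reduce steps. The class name and sample data below are made up purely for illustration and use plain Java collections, not the Hadoop API:

import java.util.*;

public class MiniMapReduce {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");

        // Map phase: turn every line into (word, 1) pairs.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                mapped.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }

        // Shuffle phase: group the pairs by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Reduce phase: sum the values of each key.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            System.out.println(e.getKey() + "\t" + sum);   // hadoop 1, hello 2, world 1
        }
    }
}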

 

Simple examples:

 

Example 1:

 

Case description: we have a file that stores a number of strings; we write a MapReduce job to count how many times each string appears in the file.

 

Code:

package xja.com;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordConunt {

    public static void main(String[] args) throws Exception {

        // Input and output paths come from the command-line arguments.
        Path inpath = new Path(args[0]);
        Path outpath = new Path(args[1]);
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordConunt.class);
        job.setJobName("WordConunt");
        // Register the Mapper and Reducer implementations defined below.
        job.setMapperClass(Map.class);
        job.setReducerClass(Red.class);
        // The map output is (word, 1): key type Text, value type IntWritable.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, inpath);
        FileOutputFormat.setOutputPath(job, outpath);
        job.waitForCompletion(true);
    }

   
    public static class Map extends Mapper<LongWritable,Text,Text,IntWritable>{
        @Override
        public void map(LongWritable key,Text value,Context context) throws IOException,InterruptedException {
            // Split the line on spaces and emit (word, 1) for every word.
            String[] line = value.toString().split(" ");
            Text keyy;
            IntWritable valuee = new IntWritable(1);
            for(int i=0; i<line.length; i++){
                keyy = new Text(line[i]);
                context.write(keyy, valuee);
            }
        }
    }


    public static class Red extends Reducer<Text,IntWritable,Text,IntWritable>{
        @Override
        public void reduce(Text key,Iterable<IntWritable> value,Context context) throws IOException,InterruptedException {
            // Sum the 1s emitted for this word and write (word, total count).
            int count = 0;
            for(IntWritable val : value){
                count = count + val.get();
            }
            context.write(key, new IntWritable(count));
        }
    }
}

 

 

 

1. Create the HDFS input directory. (Do not create the output directory in advance: the job creates it itself, and it will fail if the output path already exists.) Command:

[root@quickstart /]# hadoop fs -mkdir -p /data/wordcount/input

 

2. Create a local file and write some random strings into it:

      

Command: [root@quickstart cloudera]# gedit wang.txt

Enter some random strings to use for testing.

      

3. Upload the file to /data/wordcount/input in the HDFS file system:

      

Command: [root@quickstart cloudera]# hadoop fs -put wang.txt /data/wordcount/input

  

      

4. Run the MapReduce job:

      

Command: [root@quickstart cloudera]# hadoop jar WordConunt.jar /data/wordcount/input /data/wordcount/output

  

 

 

 

5. Check the results:

An output directory now appears under /data/wordcount:

  

List the files in the output directory, for example with: [root@quickstart /]# hadoop fs -ls /data/wordcount/output

 

View the result file:

Command: [root@quickstart /]# hadoop fs -cat /data/wordcount/output/*
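For example, if wang.txt contained just the line "hello world hello" (a made-up input, shown only to illustrate the format), the result file would list each word and its count separated by a tab:

hello	2
world	1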


Example 2:

 

Case description: we are given a file that stores several phone numbers, each with its upstream and downstream traffic:

Phone number    Upstream traffic    Downstream traffic
13726230501     200                 1100
13396230502     300                 1200
13898205030     400                 1300
13897230503     100                 300
13597230543     500                 1400
13597230534     300                 1200

Write a MapReduce job that, for each phone number, stores its upstream traffic and downstream traffic as well as the sum of the two.

 

Case steps:

1. First create a phone.txt file:

   

Command: [root@quickstart cloudera]# gedit phone.txt

      

      

Enter the data with the fields separated by '|' (the delimiter is up to you).

 

File contents of phone.txt:

13726230501|200|1100

13396230502|300|1200

13898205030|400|1300

13897230503|100|300

13597230543|500|1400

13597230534|300|1200

   

 

   

2. Write the Java class:

Code:

package xja.com;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PhoneTest {
   
    public static void main(String[] args) throws Exception{
       
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
       
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(PhoneTest.class);
        job.setJobName("PhoneTest");
      
        job.setMapperClass(Map.class);
        job.setReducerClass(Red.class);
       
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
       
        // Map output: key = phone number (Text), value = "up|down" (Text).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
       
        job.waitForCompletion(true);
      

    }
  
    public static class Map extends Mapper<LongWritable,Text,Text,Text>{
        @Override
        public void map(LongWritable key,Text value,Context context) throws IOException,InterruptedException{
            // Split only at the first '|': line[0] is the phone number,
            // line[1] is the remainder "up|down", which the reducer splits again.
            String[] line = value.toString().split("\\|",2);
            context.write(new Text(line[0]), new Text(line[1]));
        }
    }
   
    public static class Red extends Reducer<Text,Text,Text,Text>{
        @Override
        public void reduce(Text key,Iterable<Text> value,Context context) throws IOException,InterruptedException{
            // Each value still has the form "up|down"; split it and accumulate the sums.
            String[] str;
            int upSum = 0;
            int downSum = 0;
            int totSum = 0;
            for(Text val : value){
                str = val.toString().split("\\|");
                upSum = upSum + Integer.parseInt(str[0]);
                downSum = downSum + Integer.parseInt(str[1]);
            }
            totSum = totSum + upSum + downSum;
            // Output: phone number <TAB> "upSum,downSum,totSum"
            context.write(key, new Text(upSum+","+downSum+","+totSum));
        }
    }
}

    Package the Java class into PhoneTest.jar.

Note: for a packaging tutorial, see this blog post:

https://blog.csdn.net/qq_34377273/article/details/84328598

 

   

3. Create the storage directory on HDFS:

Here I create a /data/phone/input directory under the HDFS root directory to hold the phone.txt file.

   

Command: [root@quickstart cloudera]# hadoop fs -mkdir -p /data/phone/input

   

   

4. Upload the phone.txt file containing the data to the /data/phone/input directory:

Command: [root@quickstart cloudera]# hadoop fs -put phone.txt /data/phone/input
After uploading, list the file: [root@quickstart cloudera]# hadoop fs -ls /data/phone/input

 

   

5. Run the jar:

 

Command: [root@quickstart cloudera]# hadoop jar PhoneTest.jar /data/phone/input /data/phone/output

(After the job finishes, the /data/phone/output directory is created in the HDFS file system.)

 

   

After the job finishes, check that the directory was created:

 

   

6. View the contents of the files in the directory:

   

Command: [root@quickstart cloudera]# hadoop fs -cat /data/phone/output/*

   

Result file:
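Based on the sample data above, and assuming the default TextOutputFormat (key and value separated by a tab), the result file should look roughly like this:

13396230502	300,1200,1500
13597230534	300,1200,1500
13597230543	500,1400,1900
13726230501	200,1100,1300
13897230503	100,300,400
13898205030	400,1300,1700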

 

 

Extensions of Example 2:

Part 1: Adding serialization (a custom Writable type):

Code:

1. The class that implements serialization: PhoneWritable.java:

 
package xja.com;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PhoneWritable implements Writable {

    int upFlow;
    int downFlow;
    int totFlow;
  
    // No-argument constructor: required because Hadoop instantiates the class
    // by reflection when deserializing.
    public PhoneWritable(){

    }
    public PhoneWritable(int upFlow,int downFlow){
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.totFlow = upFlow + downFlow;
    }

    public int getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(int upFlow) {
        this.upFlow = upFlow;
    }

    public int getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(int downFlow) {
        this.downFlow = downFlow;
    }

    public int getTotFlow() {
        return totFlow;
    }

    public void setTotFlow(int totFlow) {
        this.totFlow = totFlow;
    }
   
    // Serialization: write the fields in a fixed order.
    public void write(DataOutput out) throws IOException{
        out.writeInt(upFlow);
        out.writeInt(downFlow);
        out.writeInt(totFlow);
    }

    // Deserialization: read the fields back in exactly the same order as write().
    public void readFields(DataInput in) throws IOException{
        upFlow = in.readInt();
        downFlow = in.readInt();
        totFlow = in.readInt();
    }

    @Override
    public String toString() {
        return "PhoneWritable [upFlow=" + upFlow + ", downFlow=" + downFlow
                + ", totFlow=" + totFlow + "]";
    }
}
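As a quick local check (not part of the MapReduce job; the demo class name and values are my own and purely illustrative), the write()/readFields() pair can be exercised with plain Java streams:

package xja.com;
import java.io.*;

public class PhoneWritableDemo {
    public static void main(String[] args) throws IOException {
        PhoneWritable original = new PhoneWritable(200, 1100);

        // Serialize into a byte array, the same way Hadoop moves the object between tasks.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize into a fresh instance created through the no-arg constructor.
        PhoneWritable copy = new PhoneWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        // Prints: PhoneWritable [upFlow=200, downFlow=1100, totFlow=1300]
        System.out.println(copy);
    }
}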

 

2. The driver class: PhoneTestFlow.java:

package xja.com;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PhoneTestFlow {
   
    public static void main(String[] args) throws Exception{
       
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
       
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
       
        job.setJarByClass(PhoneTestFlow.class);
        job.setJobName("PhoneTest");
       
        job.setMapperClass(Map.class);
        job.setReducerClass(Red.class);
       
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
       
        job.setMapOutputKeyClass(Text.class);
        // The map output value is now the custom Writable type.
        job.setMapOutputValueClass(PhoneWritable.class);
       
        job.waitForCompletion(true);
       
    }
   
    public static class Map extends Mapper<LongWritable,Text,Text,PhoneWritable>{
        @Override
        public void map(LongWritable key,Text value,Context context) throws IOException,InterruptedException{
            // line[0] = phone number, line[1] = upstream traffic, line[2] = downstream traffic
            String[] line = value.toString().split("\\|");
            // Pack both traffic values into the custom Writable; its constructor also computes the total.
            PhoneWritable fwValue = new PhoneWritable(Integer.parseInt(line[1]),Integer.parseInt(line[2]));
            context.write(new Text(line[0]), fwValue);
        }
    }
   
    public static class Red extends Reducer<Text,PhoneWritable,Text,Text>{
        @Override
        public void reduce(Text key,Iterable<PhoneWritable> value,Context context) throws IOException,InterruptedException{
            // Accumulate the traffic of all records that share this phone number.
            int upSum = 0;
            int downSum = 0;
            int totSum = 0;
            for(PhoneWritable val : value){
                upSum = upSum + val.getUpFlow();
                downSum = downSum + val.getDownFlow();
            }
            totSum = totSum + upSum + downSum;
            context.write(key, new Text(upSum+","+downSum+","+totSum));
        }
    }
}

 

Package the two classes together into a jar: PhoneTestFlow.jar.

3. The packaged jar:

   

4. Reuse the /data/phone/input/phone.txt file and run the jar:

   

Command: [root@quickstart cloudera]# hadoop jar PhoneTestFlow.jar /data/phone/input /data/phone/output1

 

5. View the generated results:

   

Command: [root@quickstart cloudera]# hadoop fs -cat /data/phone/output1/*

Result file:
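Because the reducer still writes its value as a Text string of the form "upSum,downSum,totSum", the contents of /data/phone/output1 should match the result file produced by the plain-Text version above.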

   

 

Part 2: Adding partitioning on top of the serialization example:

Code:

1. The class that implements serialization: PhoneWritable.java (identical to Part 1 except for the package name):

  
package xja.com.test;   // same package as the driver and partitioner below
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PhoneWritable implements Writable {

    int upFlow;
    int downFlow;
    int totFlow;
   
    public PhoneWritable(){
       
    }
    public PhoneWritable(int upFlow,int downFlow){
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.totFlow = upFlow + downFlow;
    }

    public int getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(int upFlow) {
        this.upFlow = upFlow;
    }

    public int getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(int downFlow) {
        this.downFlow = downFlow;
    }

    public int getTotFlow() {
        return totFlow;
    }

    public void setTotFlow(int totFlow) {
        this.totFlow = totFlow;
    }
   
    public void write(DataOutput out) throws IOException{
        out.writeInt(upFlow);
        out.writeInt(downFlow);
        out.writeInt(totFlow);
    }
   
    public void readFields(DataInput in) throws IOException{
       
        upFlow = in.readInt();
        downFlow = in.readInt();
        totFlow = in.readInt();
    }

    @Override
    public String toString() {
        return "PhoneWritable [upFlow=" + upFlow + ", downFlow=" + downFlow
                + ", totFlow=" + totFlow + "]";
    }
}

 

 

2. The driver class: PhoneTestFlow.java:

package xja.com.test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PhoneTestFlow {
   
    public static void main(String[] args) throws Exception{
       
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);
       
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
       
        job.setJarByClass(PhoneTestFlow.class);
        job.setJobName("PhoneTest");
       
        job.setMapperClass(Map.class);
        job.setReducerClass(Red.class);
       
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
       
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(PhoneWritable.class);
       
        // Use the custom partitioner; the number of reduce tasks must cover partitions 0-4.
        job.setPartitionerClass(MyPartitioner.class);
        job.setNumReduceTasks(5);
       
        job.waitForCompletion(true);
       
    }
   
    public static class Map extends Mapper<LongWritable,Text,Text,PhoneWritable>{
        @Override
        public void map(LongWritable key,Text value,Context context) throws IOException,InterruptedException{
            // line[0] = phone number, line[1] = upstream traffic, line[2] = downstream traffic
            String[] line = value.toString().split("\\|");
            // Pack both traffic values into the custom Writable; its constructor also computes the total.
            PhoneWritable fwValue = new PhoneWritable(Integer.parseInt(line[1]),Integer.parseInt(line[2]));
            context.write(new Text(line[0]), fwValue);
        }
    }
   
    public static class Red extends Reducer<Text,PhoneWritable,Text,Text>{
        @Override
        public void reduce(Text key,Iterable<PhoneWritable> value,Context context) throws IOException,InterruptedException{
            // Accumulate the traffic of all records that share this phone number.
            int upSum = 0;
            int downSum = 0;
            int totSum = 0;
            for(PhoneWritable val : value){
                upSum = upSum + val.getUpFlow();
                downSum = downSum + val.getDownFlow();
            }
            totSum = totSum + upSum + downSum;
            context.write(key, new Text(upSum+","+downSum+","+totSum));
        }
    }
}

3. The class that defines the partitioning rule: MyPartitioner.java:

package xja.com.test;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text,PhoneWritable> {

    @Override
    public int getPartition(Text key,PhoneWritable value,int partitionNum){
        // Route each record to a reduce task based on the first three digits of the phone number.
        String phoneAre = key.toString().substring(0,3);
        if("137".equals(phoneAre)){
            return 0;
        } else if("133".equals(phoneAre)) {
            return 1;
        } else if("138".equals(phoneAre)) {
            return 2;
        } else if("135".equals(phoneAre)) {
            return 3;
        } else{
            return 4;
        }
    }
}
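Each value returned by getPartition (0 through 4) selects one reduce task, which is why the driver calls job.setNumReduceTasks(5); every reduce task writes its own output file. With the sample data above, the 137 number goes to part-r-00000, the 133 number to part-r-00001, the two 138 numbers to part-r-00002, the two 135 numbers to part-r-00003, and part-r-00004 stays empty because no number falls into the default branch.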

   

   

   

 

Package the three classes together into one jar: PartitionerPhoneTestFlow.jar.

4. The packaged jar:

   

5. Reuse the /data/phone/input/phone.txt file and run the jar:

   

Command: [root@quickstart cloudera]# hadoop jar PartitionerPhoneTestFlow.jar /data/phone/input /data/phone/output2

 

 

6. View the generated results:

Commands: [root@quickstart cloudera]# hadoop fs -cat /data/phone/output2/*

[root@quickstart cloudera]# hadoop fs -ls /data/phone/output2
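With five reduce tasks, the listing should show one output file per partition plus the _SUCCESS marker, roughly:

/data/phone/output2/_SUCCESS
/data/phone/output2/part-r-00000
/data/phone/output2/part-r-00001
/data/phone/output2/part-r-00002
/data/phone/output2/part-r-00003
/data/phone/output2/part-r-00004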

 

 

 
