A First Taste of MapReduce

The MapReduce computing framework


Parallel computing frameworks

A large task is split into many small tasks, which are distributed across multiple nodes; every node then performs its share of the computation at the same time.

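As a rough single-machine analogy, with threads standing in for cluster nodes (ParallelSum is an illustrative name, not Hadoop API), the split-compute-merge pattern looks like this:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);

        int parts = 4; // split one big task into 4 small tasks
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        List<Future<Long>> partials = new ArrayList<>();
        int chunk = data.length / parts;
        for (int i = 0; i < parts; i++) {
            final int from = i * chunk;
            final int to = (i == parts - 1) ? data.length : from + chunk;
            // each "node" (here, a thread) computes over its own slice
            partials.add(pool.submit(() -> {
                long s = 0;
                for (int j = from; j < to; j++) s += data[j];
                return s;
            }));
        }
        long total = 0;
        for (Future<Long> f : partials) total += f.get(); // merge the partial results
        pool.shutdown();
        System.out.println(total); // prints 1000000
    }
}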

The core idea of MapReduce

Divide and conquer, split first and merge later: a large, complex job is broken into many small tasks that are processed in parallel, and the partial results are merged at the end.
MapReduce consists of two phases, Map and Reduce:
Map: splits the data apart
Reduce: aggregates the partial results

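Before any Hadoop code, the same split-then-merge flow can be sketched in plain Java (InMemoryWordCount is an illustrative name; the sample lines are the ones used in the WordCount example below):

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InMemoryWordCount {
    public static void main(String[] args) {
        String[] lines = {"zhangsan,lisi,wangwu", "zhaoliu,maqi",
                          "zhangsan,zhaoliu,wangwu", "lisi,wangwu"};

        // "Map" phase: split each line into (word, 1) pairs
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(",")) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }

        // "Reduce" phase: group by word and sum the 1s
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}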

The WordCount example

Count the total number of times each word appears.

Raw data

zhangsan,lisi,wangwu
zhaoliu,maqi
zhangsan,zhaoliu,wangwu
lisi,wangwu

Expected final result (the job will actually emit these lines sorted alphabetically by key):

zhangsan 2
lisi 2
wangwu 3
zhaoliu 2
maqi 1
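Walking through the phases on this data: each mapper call emits one (word, 1) pair per word, the framework then groups and sorts the pairs by key, and each reducer call sums one group:

map output:     (zhangsan,1) (lisi,1) (wangwu,1)
                (zhaoliu,1) (maqi,1)
                (zhangsan,1) (zhaoliu,1) (wangwu,1)
                (lisi,1) (wangwu,1)
shuffle/sort:   lisi -> [1,1]    maqi -> [1]    wangwu -> [1,1,1]
                zhangsan -> [1,1]    zhaoliu -> [1,1]
reduce output:  lisi 2, maqi 1, wangwu 3, zhangsan 2, zhaoliu 2

That is 10 map output records and 5 reduce groups, which is exactly what the job counters report in the log further down.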

The WordCountMap class

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMap extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Convert the Text value (one input line) to a String
        String line = value.toString();
        // 2. Split the line on ","
        String[] words = line.split(",");
        // 3. Emit each word once, paired with a count of 1
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
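As written, map allocates a fresh Text and LongWritable for every word. A common Hadoop idiom, optional and shown here only as a sketch, is to reuse the output objects across calls; this is safe because context.write serializes the pair immediately:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapReusing extends Mapper<LongWritable, Text, Text, LongWritable> {

    private final Text word = new Text();                 // reused for every output key
    private final LongWritable one = new LongWritable(1); // the constant count

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        for (String token : value.toString().split(",")) {
            word.set(token);
            context.write(word, one); // write() copies the bytes, so reuse is safe
        }
    }
}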

The WordCountReduce class

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
public class WordCountReduce extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // key is a word; values holds every 1 the mappers emitted for that word
        long sum = 0;
        // sum the counts by iterating over the values
        for (LongWritable value : values) {
            sum += value.get();
        }
        // emit (word, total count)
        context.write(key, new LongWritable(sum));
    }
}
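Because addition is associative and commutative, this same class could also serve as a combiner, pre-summing the 1s on the map side before the shuffle; the run log below shows Combine input records=0, meaning no combiner was configured here. Enabling it would be a single optional line in the driver shown next:

        // optional map-side pre-aggregation, reusing the reducer unchanged
        job.setCombinerClass(WordCountReduce.class);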

The WordCountDriver class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Create the Job instance for this MR program
        Job job = Job.getInstance(getConf(), "WordCount");
        // Ship the jar containing this class (silences the "No job jar file set" warning seen in the log)
        job.setJarByClass(WordCountDriver.class);
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("F:\\wordcount\\input\\words5.txt"));
        // Specify the Mapper and Reducer implementations for this job
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);
        // Output key/value types of the map phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Output key/value types of the reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("F:\\wordcount\\output\\7"));
        // Wait for the job to finish and return a status code
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Propagate the job's status code as the process exit code
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}
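The input and output paths are hardcoded, so this runs as a local job straight from the IDE (note that Hadoop refuses to start if the output directory already exists). For a cluster submission you would normally take the paths from the command line instead; a minimal sketch, assuming args[0] and args[1] carry the paths and wordcount.jar is an illustrative jar name:

        // in run(), replace the hardcoded paths with:
        TextInputFormat.addInputPath(job, new Path(args[0]));
        TextOutputFormat.setOutputPath(job, new Path(args[1]));

and submit with: hadoop jar wordcount.jar WordCountDriver /wordcount/input /wordcount/output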

The job runs successfully; the console log:

  INFO - session.id is deprecated. Instead, use dfs.metrics.session-id
  INFO - Initializing JVM Metrics with processName=JobTracker, sessionId=
  WARN - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
  WARN - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
  INFO - Total input paths to process : 1
  INFO - OutputCommitter set in config null
  INFO - Running job: job_local168080594_0001
  INFO - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
  INFO - Waiting for map tasks
  INFO - Starting task: attempt_local168080594_0001_m_000000_0
  WARN - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
  INFO -  Using ResourceCalculatorPlugin : null
  INFO - Processing split: file:/F:/wordcount/input/words5.txt:0+72
  INFO - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO - io.sort.mb = 100
  INFO - data buffer = 79691776/99614720
  INFO - record buffer = 262144/327680
  INFO - 
  INFO - Starting flush of map output
  INFO - Finished spill 0
  INFO - Task:attempt_local168080594_0001_m_000000_0 is done. And is in the process of commiting
  INFO - 
  INFO - Task 'attempt_local168080594_0001_m_000000_0' done.
  INFO - Finishing task: attempt_local168080594_0001_m_000000_0
  INFO - Map task executor complete.
  WARN - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
  INFO -  Using ResourceCalculatorPlugin : null
  INFO - 
  INFO - Merging 1 sorted segments
  INFO - Down to the last merge-pass, with 1 segments left of total size: 172 bytes
  INFO - 
  INFO - Task:attempt_local168080594_0001_r_000000_0 is done. And is in the process of commiting
  INFO - 
  INFO - Task attempt_local168080594_0001_r_000000_0 is allowed to commit now
  INFO - Saved output of task 'attempt_local168080594_0001_r_000000_0' to F:/wordcount/output/7
  INFO - reduce > reduce
  INFO - Task 'attempt_local168080594_0001_r_000000_0' done.
  INFO -  map 100% reduce 100%
  INFO - Job complete: job_local168080594_0001
  INFO - Counters: 17
  INFO -   File System Counters
  INFO -     FILE: Number of bytes read=628
  INFO -     FILE: Number of bytes written=339964
  INFO -     FILE: Number of read operations=0
  INFO -     FILE: Number of large read operations=0
  INFO -     FILE: Number of write operations=0
  INFO -   Map-Reduce Framework
  INFO -     Map input records=4
  INFO -     Map output records=10
  INFO -     Map output bytes=150
  INFO -     Input split bytes=100
  INFO -     Combine input records=0
  INFO -     Combine output records=0
  INFO -     Reduce input groups=5
  INFO -     Reduce shuffle bytes=0
  INFO -     Reduce input records=10
  INFO -     Reduce output records=5
  INFO -     Spilled Records=20
  INFO -     Total committed heap usage (bytes)=506462208

Verifying the output

The counters line up with the input: 4 lines read (Map input records=4), 10 words emitted (Map output records=10), and 5 distinct words in the result (Reduce output records=5). The output directory F:\wordcount\output\7 holds an empty _SUCCESS marker plus a part-r-00000 file containing the five word counts from the expected result above.
