MapReduce统计单词个数

测试数据

q w e t y u s hy d g h s g s e w f qw er a fs ds as
da ds sd df gf h g sds we sds sa fd sd sd as df f a w
we ew d fg s gf d h x f e f d sd r sd t ds
sd df f g x w t yu d c s t d d e
q w e t y u s hy d g h s g s e w f qw er a fs ds as
da ds sd df gf h g sds we sds sa fd sd sd as df f a w
we ew d fg s gf d h x f e f d sd r sd t ds
sd df f g x w t yu d c s t d d e
q w e t y u s hy d g h s g s e w f qw er a fs ds as
da ds sd df gf h g sds we sds sa fd sd sd as df f a w
we ew d fg s gf d h x f e f d sd r sd t ds
sd df f g x w t yu d c s t d d e
q w e t y u s hy d g h s g s e w f qw er a fs ds as
da ds sd df gf h g sds we sds sa fd sd sd as df f a w
we ew d fg s gf d h x f e f d sd r sd t ds
sd df f g x w t yu d c s t d d e
q w e t y u s hy d g h s g s e w f qw er a fs ds as
da ds sd df gf h g sds we sds sa fd sd sd as df f a w
we ew d fg s gf d h x f e f d sd r sd t ds
sd df f g x w t yu d c s t d d e
q w e t y u s hy d g h s g s e w f qw er a fs ds as
da ds sd df gf h g sds we sds sa fd sd sd as df f a w
we ew d fg s gf d h x f e f d sd r sd t ds
sd df f g x w t yu d c s t d d e
q w e t y u s hy d g h s g s e w f qw er a fs ds as
da ds sd df gf h g sds we sds sa fd sd sd as df f a w
we ew d fg s gf d h x f e f d sd r sd t ds
sd df f g x w t yu d c s t d d e

1.自定义Mapper

package com.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    /** Reused output key holding the current word (avoids a per-record allocation). */
    private final Text mapOutputKey = new Text();
    /** Reused output value; set to 1 for every word occurrence emitted. */
    private final IntWritable mapOutputValue = new IntWritable();

    /**
     * Tokenizes one input line on whitespace and emits an intermediate
     * (word, 1) pair for each token found.
     *
     * @param key     byte offset of this line within the input split
     * @param value   the text content of one input line
     * @param context Hadoop context used to write intermediate pairs
     * @throws IOException          if emitting a record fails
     * @throws InterruptedException if the task is interrupted
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // StringTokenizer splits on runs of whitespace by default.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            mapOutputKey.set(tokens.nextToken());
            mapOutputValue.set(1);
            context.write(mapOutputKey, mapOutputValue);
        }
    }
}

2.自定义Reducer

package com.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    /** Reused output value holding the total count for the current word. */
    private final IntWritable outputValue = new IntWritable();

    /**
     * Sums all partial counts received for a single word and writes the
     * final (word, total) pair.
     *
     * @param key     the word being aggregated
     * @param values  all partial counts emitted by the mappers for this word
     * @param context Hadoop context used to write the final result
     * @throws IOException          if writing the result fails
     * @throws InterruptedException if the task is interrupted
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable partial : values) {
            total += partial.get();
        }
        outputValue.set(total);
        context.write(key, outputValue);
    }
}

3.自定义Driver

package com.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {

    /** Default sample input directory, used only when no CLI arguments are given. */
    private static final String DEFAULT_INPUT = "src/main/resources/input2/";
    /** Default output directory, used only when no CLI arguments are given. */
    private static final String DEFAULT_OUTPUT = "src/main/resources/output/";

    /**
     * Configures and submits the word-count job, then exits with status 0 on
     * success or 1 on failure.
     *
     * @param args optional: args[0] = input path, args[1] = output path;
     *             when fewer than two arguments are supplied, the bundled
     *             sample paths are used instead
     * @throws IOException            if job setup or filesystem access fails
     * @throws ClassNotFoundException if a job class cannot be resolved
     * @throws InterruptedException   if waiting for the job is interrupted
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Bug fix: the original overwrote args unconditionally, so paths passed
        // on the command line were silently ignored. Fall back to the sample
        // paths only when the caller did not provide both paths.
        if (args.length < 2) {
            args = new String[]{DEFAULT_INPUT, DEFAULT_OUTPUT};
        }

        // Job configuration (expects core-site.xml on the classpath, e.g. under resources).
        Configuration cfg = new Configuration();
        Job job = Job.getInstance(cfg, WordCountDriver.class.getSimpleName());
        job.setJarByClass(WordCountDriver.class);

        // Mapper and its intermediate (key, value) types.
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Reducer and the job's final output types.
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output locations.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        Path outputPath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outputPath);

        // Hadoop refuses to start a job whose output directory already exists,
        // so remove any leftover directory from a previous run first.
        FileSystem fs = FileSystem.get(cfg);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
            System.out.println("存在此输出路径,已删除!!!");
        }

        // Submit the job and block until it finishes, printing progress.
        boolean succeeded = job.waitForCompletion(true);
        System.exit(succeeded ? 0 : 1);
    }

}

效果图

在这里插入图片描述

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

缘不易

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值