MapReduce English Word Frequency Count

1. Prerequisites

1.1 Start Hadoop (cluster)

  1. Start HDFS
start-dfs.sh
  2. Start YARN
start-yarn.sh
  3. Start the JobHistory server
mapred --daemon start historyserver
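After these commands, you can optionally confirm that all daemons are running with jps (the exact process list depends on your cluster layout; on a single-node setup you would typically see NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and JobHistoryServer):
jps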

2. Create a Maven Project

2.1 Create a Maven project with IDEA

Do not use the IDEA Community Edition.

2.2 Add the Hadoop Maven dependencies

Search mvnrepository yourself for the dependencies, or simply change the version inside each dependency in the XML below to match your own Hadoop version.

Hadoop 3.3.4 is used as the example here.
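If you are not sure which Hadoop version your cluster runs, you can check it on any cluster node before picking the dependency version (assuming the hadoop command is on the PATH):
hadoop version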

  1. Add the Hadoop dependencies to pom.xml
<dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.3.4</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
  2. Reload the Maven changes. This updates the local Maven repository and downloads the dependencies automatically, so it may take a moment.

3. The MapReduce Program

3.1 The code below is copied directly from the reference link given in class

Reference link

The program consists of three classes, each in its own source file: MyMap (the Mapper), MyReduce (the Reducer), and TestJob (the job driver).

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MyMap extends Mapper<LongWritable, Text,Text,LongWritable> {
    @Override
    protected void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException {
        // 1. get the input line as a string
        String valueString = value.toString();
        // 2. split the line on spaces
        String[] wArr = valueString.split(" ");
        // 3. emit (word, 1) for every word in the line
        for(int i = 0;i < wArr.length;i++){
            context.write(new Text(wArr[i]), new LongWritable(1));
        }
    }
}

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

public class MyReduce extends Reducer<Text, LongWritable,Text,LongWritable> {
    @Override
    protected void reduce(Text key,Iterable<LongWritable> valueIn,Context context) throws IOException, InterruptedException {
        Iterator<LongWritable> it = valueIn.iterator();
        long sum = 0;
        // sum the counts for this word
        while(it.hasNext()){
            sum += it.next().get();
        }
        context.write(key,new LongWritable(sum));
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class TestJob {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        //1 get a job
        Job job = Job.getInstance(conf);
        //2 set jar main class
        job.setJarByClass(TestJob.class);
        //3 set map class and reducer class
        job.setMapperClass(MyMap.class);
        job.setReducerClass(MyReduce.class);
        //4 set map reduce output type
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        //5 set the input and output paths (the output directory must not already exist in HDFS)
        FileInputFormat.setInputPaths(job,new Path("/input/word.txt"));
        FileOutputFormat.setOutputPath(job,new Path("/output"));
        //6 submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}

4. Packaging

4.1 Package as a jar

  1. In the Maven tool window, click clean and then package.
  2. A jar file is generated under the target folder.
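If you prefer the command line over the IDEA Maven panel, the same jar can be built from the project root (a minimal sketch, assuming Maven itself is installed and on the PATH):
mvn clean package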

4.2 (Optional) Jar download

  1. The jar I built is provided here for download.

Download link for the wordcount jar

5. Run the Jar

5.1 Upload the jar file and input/word.txt to Linux

  1. Upload the jar file and the input folder to /data/temp on Linux.

  2. Upload input to HDFS:

hdfs dfs -put /data/temp/input /input
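A minimal sketch of these two upload steps from a development machine (the user name, host, and local paths below are placeholders for illustration; adjust them to your environment), followed by a quick check that the files arrived in HDFS:
scp -r target/hadoop03-1.0.jar input user@your-hadoop-node:/data/temp/
hdfs dfs -ls /input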

5.2 Run the jar

  1. Verify that there is no /output folder in the HDFS file system:
hdfs dfs -ls /
  2. If an /output folder exists, delete it:
hdfs dfs -rm -r /output
  3. Verify again:
hdfs dfs -ls /
  4. Run the jar (TestJob has no package declaration here; if you place the classes in a package, use the fully qualified class name instead):
hadoop jar /data/temp/hadoop03-1.0.jar TestJob
  5. View the result:
hdfs dfs -cat /output/part-r-00000
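The output has one word per line with its count, separated by a tab. For example, with a small illustrative word.txt (the actual lines depend entirely on your input file):
hadoop	2
hello	3
world	1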