Part 5: Hadoop Case Study: MapReduce Example (WordCount)

Latest recommended article published on 2024-08-17 21:28:57

羙橘


Tags: learning, hadoop, mapreduce, big data

Article link: https://blog.csdn.net/qq_53114527/article/details/127237798


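This article walks through implementing the WordCount example with Hadoop MapReduce, from requirements analysis through coding to testing: creating the project, adding the JAR dependencies, writing the Mapper and Reducer, packaging and running the program, and finally viewing the results.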


Note: this article is organized around a worked example.


Note: the main body of the article follows; the example below is provided for reference.

1 Requirements Analysis

1.1 Requirement

Count how many times each word occurs in a file.

1.2 Data Preparation

Input file: hello.txt

Once when l was six years old l saw a magnificent picture in a book called True Stories from Nature, about the primeval forest.
lt was a picture of a boa constrictor in the act of swallowing an animal.
Here is a copy of the drawing: In the book it said:"Boa constrictors swallow their prey whole, without chewing it.
After that they are not able to move, and they sleep through the six months that they need for digestion."
And after some work with a coloured pencil l succeeded in making my first drawing.
My Drawing Number One.
lt iooked like this: I showed my masterpiece to the grown-ups, and asked them whether the drawing frightened them.
But they answered:"Frighten?
why should anyone be frightened by a hat?
My drawing was not a picture of a hat.
lt was a picture of a boa constrictor digesting an elephant.
But since the grown-ups were not able to understand it, I made another drawing.
l drew the inside of the boa constrictor, so that the grown-ups could see it clearly.
They always need to have things explained.

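1.3 How It Works

Input splitting: the input text is divided into splits, and each line is read as one record. Hadoop provides the classes that do this.
Each split is handed to a map task, which processes it line by line. For every word it finds in a line, map emits a key-value pair: the key is the word and the value is 1.
The map output is sorted and grouped, then distributed to the reduce tasks, so that different reducers receive different key-value data. In the figure, some reducers receive keys beginning with B and others receive keys beginning with C.
Each reducer merges the values it receives for each key and outputs the final count for that key.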
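
The data flow described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the class name `WordCountFlowSketch` and its methods are hypothetical, the hash-based `partition` method stands in for however the framework assigns keys to reducers (the figure's first-letter grouping is a different illustration of the same idea), and the in-memory lists replace what a real job reads from and writes to HDFS.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of the WordCount data flow; it does not use the Hadoop API. */
public class WordCountFlowSketch {

    /** Map phase: emit a (word, 1) pair for every whitespace-separated word in one line. */
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Partition step (assumption: hash-based), so equal keys always reach the same reducer. */
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Reduce phase: sum all the 1s collected for a single word. */
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map, shuffle (group by reducer, then by key) and reduce over the given lines. */
    public static Map<String, Integer> wordCount(List<String> lines, int numReducers) {
        // One sorted "inbox" per reducer: key -> list of values, as after the shuffle.
        List<Map<String, List<Integer>>> inboxes = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            inboxes.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                int r = partition(pair.getKey(), numReducers);
                inboxes.get(r)
                       .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // Each reducer processes its own inbox; the results are merged here only for display.
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, List<Integer>> inbox : inboxes) {
            for (Map.Entry<String, List<Integer>> e : inbox.entrySet()) {
                result.put(e.getKey(), reduce(e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "My Drawing Number One.",
                "My drawing was not a picture of a hat.");
        System.out.println(wordCount(lines, 2));
    }
}
```

With these two lines from hello.txt, `My` and `a` are each counted twice. Because the sketch splits on whitespace only, `Drawing` and `drawing` are counted as different words and `hat.` keeps its trailing period.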

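2 Coding

2.1 Create the Project

Create a submodule named mr-demo.

2.2 Create the Package and Class

2.2.1 Create the Package

Create the package org.hadoop.mr.

2.2.2 Create the Class

Create the class WordCount.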

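2.3 Add the JAR Dependencies

2.3.1 Add the MapReduce JARs

Make sure the machine is connected to the internet, because Maven has to download the dependencies.
Check in IDEA that pom.xml resolves its dependencies without errors.
Add the following to pom.xml: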

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.9.2</version>
    <