Counting Word Occurrences in a File with MapReduce

Author: weixin_43368981

Tags: hadoop, big data

Article link: https://blog.csdn.net/weixin_43368981/article/details/111567770

Lab setup: OS: Ubuntu Kylin | Environment: Hadoop | Software: Eclipse

I. Counting words by running WordCount with MapReduce

1. Preparation
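
The preparation step only asks for a large text file with arbitrary content. If none is at hand, you can generate a throwaway one. The sketch below is an illustration and is not part of the original walkthrough. The helper name make_demo_file, its eight-word vocabulary, and N = 10,000 are all assumptions, so adjust N to the size you need.

```shell
#!/usr/bin/env bash
# Sketch: generate a throwaway text file with a chosen number of words.
set -euo pipefail

# make_demo_file OUT N (hypothetical helper): writes N words to OUT, ten per line,
# cycling through a small fixed vocabulary (an arbitrary choice for illustration).
make_demo_file() {
  local out="$1" n="$2" i
  local words=(hadoop mapreduce hdfs word count eclipse ubuntu data)
  {
    for ((i = 0; i < n; i++)); do
      printf '%s' "${words[i % ${#words[@]}]}"
      # newline after every 10th word, a single space otherwise
      if (( (i + 1) % 10 == 0 )); then printf '\n'; else printf ' '; fi
    done
  } > "$out"
}

# Writes 10,000 words to demo.txt in the current directory (overwriting it).
make_demo_file demo.txt 10000

# wc -w counts whitespace-separated words; this should report 10000.
wc -w < demo.txt
```

For the lab itself, write the file to the path the upload command reads, e.g. make_demo_file /usr/local/hadoop/demo.txt 10000. Writing under /usr/local/hadoop may need suitable permissions. A repetitive vocabulary is enough to exercise the pipeline, but real prose produces more interesting counts.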

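Prepare a text file containing at least 10,000 words; the content does not matter.
Upload the file to HDFS. Make sure Hadoop is running first, and run the commands from the Hadoop installation directory (here /usr/local/hadoop):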

./bin/hdfs dfs -mkdir -p input
./bin/hdfs dfs -put /usr/local/hadoop/demo.txt input

Check the upload with the ls command:

./bin/hdfs dfs -ls input

If the upload succeeded, demo.txt appears in the listing.
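
WordCount tallies how many times each whitespace-separated word occurs. For a file that fits on one machine, standard Unix tools can reproduce the same tally. This gives reference numbers to compare against the MapReduce output later. The sketch is a local stand-in and not the Hadoop program itself. The helper name count_words and the output file wordcount_local.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count how often each whitespace-separated word occurs, locally,
# as a cross-check for the MapReduce WordCount result (not Hadoop itself).
set -euo pipefail

# count_words (hypothetical helper): reads text on stdin, prints "word<TAB>count" lines.
count_words() {
  tr -s '[:space:]' '\n' |              # one word per line (roughly the map step)
    sed '/^$/d' |                       # drop the empty line left by leading whitespace
    LC_ALL=C sort |                     # group identical words in byte order (the shuffle/sort)
    uniq -c |                           # count each group (the reduce step)
    awk '{ printf "%s\t%s\n", $2, $1 }' # reorder to "word<TAB>count"
}

# Small inline sample; prints "hadoop 1", "hello 2", "world 1" (tab-separated, one per line).
printf 'hello hadoop\nhello world\n' | count_words

# On the real input (path from the upload step):
# count_words < /usr/local/hadoop/demo.txt > wordcount_local.txt
```

The pipeline maps only loosely onto the MapReduce phases: tr plays the map step, sort the shuffle, and uniq -c the reduce. LC_ALL=C makes sort compare raw bytes. This should match the byte-wise ordering Hadoop uses for Text keys, so the lines can be diffed against the tab-separated output of the bundled WordCount example. Small differences are still possible, because tr's [:space:] class also includes the vertical tab.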

Next, install Eclipse. Open the Software Center from the Ubuntu left sidebar, type Eclipse into the search box, find the matching package, and download it.

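After the download finishes, run the following command (~/下载 is the Downloads folder on a Chinese-locale Ubuntu system):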

sudo tar -zxf ~/下载/eclipse-java-mars-1-linux-gtk*.tar.gz -C /usr/lib

This extracts Eclipse into /usr/lib, so it ends up in /usr/lib/eclipse.

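With Eclipse installed, we also need hadoop-eclipse-plugin, which lets Eclipse compile and run MapReduce programs. After downloading the plugin archive, unzip it. Then copy hadoop-eclipse-plugin-2.6.0.jar from its release folder into the plugins folder of the Eclipse installation directory, and restart Eclipse with eclipse -clean: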

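# Unpack the plugin archive into ~/下载
unzip -qo ~/下载/hadoop2x-eclipse-plugin-master.zip -d ~/下载
# Copy the prebuilt 2.6.0 plugin jar into Eclipse's plugins folder
sudo cp ~/下载/hadoop2x-eclipse-plugin-master/release/hadoop-eclipse-plugin-2.6.0.jar /usr/lib/eclipse/plugins/
# Start Eclipse with -clean so it discards cached plugin data and picks up the new plugin
/usr/lib/eclipse/eclipse -clean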