Hadoop: MapReduce job example exercise


仙人掌仙人

Article link: https://blog.csdn.net/Sylvia_D507/article/details/82944490

MapReduce job example

Steps from the official documentation:
1. Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
2. Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
3. Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar grep input output 'dfs[a-z.]+'
4. Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
5. When you're done, stop the daemons with:
$ sbin/stop-dfs.sh

1. Create the HDFS directories required to run MapReduce jobs (-p also creates the parent /user if it is missing):
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ bin/hdfs dfs -mkdir -p /user/hadoop

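2. Copy the input files into the distributed filesystem.
This copies the local Linux directory etc/hadoop into HDFS. Because the target, input, is a relative path, it is placed under the user's HDFS home directory, /user/hadoop, by default:
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ bin/hdfs dfs -put etc/hadoop input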
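The relative-path rule can be sketched as a small shell function. This is only an illustration of the rule described above (absolute paths are used as-is, relative paths resolve under /user/<username>); it does not call Hadoop, and both the function name hdfs_abs_path and the variable HDFS_USER are made up for this example.

```shell
# Sketch only: mimics how a path argument is resolved; it does not call Hadoop.
# HDFS_USER is a hypothetical variable for this example; it defaults to the current OS user.
hdfs_abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;                                        # absolute path: unchanged
    *)  printf '/user/%s/%s\n' "${HDFS_USER:-$(id -un)}" "$1" ;;     # relative path: under the home directory
  esac
}

# Usage:
#   HDFS_USER=hadoop hdfs_abs_path input
```

With HDFS_USER=hadoop, the argument input resolves to /user/hadoop/input, which is where the uploaded files can be found afterwards.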

3. Run one of the provided examples:
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ bin/hadoop jar \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar grep input output 'dfs[a-z.]+'
The job progress is printed to the console (it takes about a minute).
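Reducers can start before all maps have finished: on production-sized jobs, reduce typically begins once map is around 50% complete, whereas on small inputs reduce only starts after map reaches 100%. (When reducers are launched is governed by the mapreduce.job.reduce.slowstart.completedmaps setting.)
When resources are insufficient, the job reports errors and automatically retries on other resources.
You may then need to resubmit the job. In that case the output directory must be different each time: MapReduce does not overwrite an existing output directory, so give output a new name (or delete the old directory first), otherwise the job fails with an error saying the output directory already exists.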
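One way to avoid the "output directory already exists" failure on a rerun is to pick an unused name before submitting. The following is a minimal sketch, assuming it is run from the Hadoop home directory as in the commands above; the function name next_output_dir and the HDFS variable are made up for this example, while hdfs dfs -test -e is the standard existence check of the HDFS shell.

```shell
# Path to the hdfs launcher; override HDFS if you run from another directory (assumption).
HDFS=${HDFS:-bin/hdfs}

# Print the first name among base, base2, base3, ... that does not yet exist in HDFS.
next_output_dir() {
  base=$1
  n=1
  dir=$base
  # "hdfs dfs -test -e PATH" exits with 0 when PATH exists
  while $HDFS dfs -test -e "$dir"; do
    n=$((n + 1))
    dir="${base}${n}"
  done
  printf '%s\n' "$dir"
}

# Usage (hypothetical rerun of the example job):
#   out=$(next_output_dir output)
#   bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar \
#     grep input "$out" 'dfs[a-z.]+'
```

The alternative is to delete the old results with bin/hdfs dfs -rm -r output before rerunning; choosing a new name is the safer default because nothing is lost.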
At this point the example job has been submitted, and you can see it in the YARN web UI set up earlier: http://47.75.240.243:8088/

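Typically, a job that completed successfully shows the state FINISHED and the final status SUCCEEDED.
The example runs successfully, which shows that the HDFS and YARN deployment of Hadoop works!
But we're not done yet.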

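4. Examine the output files:
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ bin/hdfs dfs -get output output
The local output directory now contains the file part-r-00000 with the matches. View its content; each line is a count followed by the matched string, so the first line means that the string dfs.audit.logger was matched 6 times in the input files:
[hadoop@hadoop001 hadoop-2.6.0-cdh5.7.0]$ cat output/*
6 dfs.audit.logger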
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.replication
1 dfs.file
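To cross-check counts like these without the cluster, the same computation can be approximated locally with standard Unix tools. This is a rough sketch, not what the Hadoop example runs internally: the function name count_matches is made up here, grep -E uses POSIX extended regular expressions rather than Java regex (the two agree for a simple pattern such as dfs[a-z.]+), and the order of lines with equal counts may differ from the job output.

```shell
# Count every regex match in the files under a directory, most frequent first.
#   $1 = local directory (for example etc/hadoop), $2 = extended regular expression
count_matches() {
  grep -rhoE -- "$2" "$1" |   # -o: print each match on its own line, -h: no file names
    sort |                    # group identical matches together
    uniq -c |                 # prefix each distinct match with its count
    sort -k1,1nr |            # highest count first
    awk '{print $1 "\t" $2}'  # "count<TAB>match"; assumes matches contain no whitespace
}

# Usage, from the Hadoop home directory:
#   count_matches etc/hadoop 'dfs[a-z.]+'
```

If the local counts and the contents of part-r-00000 agree, the job read the same files that were uploaded with -put.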