Hadoop Run Modes
-
Local (Standalone) Mode
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
-
Official Grep Example
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*
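To get a feel for what the grep job reports, the same regular expression can be tried with ordinary Unix tools. This is only a rough local approximation, not how Hadoop implements the example. The sample XML file and the helper name local_grep_count are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample standing in for the files copied from etc/hadoop/*.xml.
indir="$(mktemp -d)"
cat > "$indir/sample.xml" <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/tmp/name</value></property>
  <!-- dfs.replication is mentioned a second time in this comment -->
</configuration>
EOF

# local_grep_count is a hypothetical helper, not part of Hadoop.
# -o prints only the matched text, -h omits file names, -E enables extended regex.
# The matches are then counted and listed with the most frequent first.
local_grep_count() {
  grep -ohE 'dfs[a-z.]+' "$1"/*.xml | LC_ALL=C sort | uniq -c | sort -rn
}

local_grep_count "$indir"
```

Each output line is a count followed by the matched string, which is the same kind of information the job writes to the output directory.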
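Hands-on steps:
- Prepare the input.
- Run the bundled grep example.
- Inspect the output. Do not create the output directory by hand; the job creates it when it runs. If the directory already exists, the job fails with org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/opt/module/hadoop-2.7.2/output already exists. A _SUCCESS file in the output directory indicates that the job succeeded.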
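The two points about the output directory can be wrapped in small shell helpers. The sketch below is a minimal example under stated assumptions: prepare_output_dir and job_succeeded are hypothetical names, not Hadoop commands, and the Hadoop invocation itself is shown only as a comment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# prepare_output_dir is a hypothetical helper, not part of Hadoop.
# It deletes a leftover output directory so that the job can create it itself
# and does not fail with FileAlreadyExistsException.
prepare_output_dir() {
  local out="$1"
  if [ -e "$out" ]; then
    echo "removing existing output directory: $out"
    rm -r -- "$out"
  fi
}

# job_succeeded is also hypothetical: it only checks for the _SUCCESS marker file.
job_succeeded() {
  [ -f "$1/_SUCCESS" ]
}

# Intended use from the Hadoop install directory (not executed here):
#   prepare_output_dir output
#   bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
#   job_succeeded output && cat output/*
```

Because prepare_output_dir deletes data, it is only suitable when the previous results are no longer needed.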
-
Official WordCount Example (counts how many times each word occurs)
Hands-on steps:
-
Prepare the input
[root@localhost hadoop-2.7.2]# mkdir wcinput
[root@localhost hadoop-2.7.2]# cd wcinput/
[root@localhost wcinput]# touch wc.input
[root@localhost wcinput]# vim wc.input
[root@localhost wcinput]# cat wc.input
Baidu Alibaba ByteDance zhangsan lisi wangwu wangwu Bcxtm Bcxtm Bcxtm
-
Run the bundled wordcount example
[root@localhost hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput/ wcoutput
-
Inspect the output
[root@localhost hadoop-2.7.2]# cd wcoutput/
[root@localhost wcoutput]# ll
total 4
-rw-r--r-- 1 root root 65 Jul  5 10:40 part-r-00000
-rw-r--r-- 1 root root  0 Jul  5 10:40 _SUCCESS
[root@localhost wcoutput]# cat part-r-00000
Alibaba 1
Baidu 1
Bcxtm 3
ByteDance 1
lisi 1
wangwu 2
zhangsan 1
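The result can be cross-checked without Hadoop by counting the same words with standard Unix tools. This sketch illustrates the word-count idea only and is not the MapReduce implementation. The helper name count_words and the temporary directory are assumptions, and the sample words are written on a single line as they appear in the listing above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the sample input in a temporary directory (hypothetical location).
workdir="$(mktemp -d)"
printf '%s\n' 'Baidu Alibaba ByteDance zhangsan lisi wangwu wangwu Bcxtm Bcxtm Bcxtm' > "$workdir/wc.input"

# count_words is a hypothetical helper, not part of Hadoop.
# It puts one word per line, drops empty lines, sorts in byte order,
# counts duplicates, and prints "word<TAB>count".
count_words() {
  tr -s '[:space:]' '\n' < "$1" \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}

count_words "$workdir/wc.input"
```

For this input the script prints the same seven word and count pairs, in the same order, as part-r-00000.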
-
-