MapReduce Functional Test

cqbh2011

2. MapReduce Functional Test

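I now roughly understand how MapReduce works, but I still have no idea where to start. What to do? Luckily, Hadoop ships with a few jar files that can be used for testing: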

[grid@hdnode1 hadoop-0.20.2]$ ll *.jar
-rw-rw-r-- 1 grid grid 6839 Feb 19 2010 hadoop-0.20.2-ant.jar
-rw-rw-r-- 1 grid grid 2689741 Feb 19 2010 hadoop-0.20.2-core.jar
-rw-rw-r-- 1 grid grid 142466 Feb 19 2010 hadoop-0.20.2-examples.jar
-rw-rw-r-- 1 grid grid 1563859 Feb 19 2010 hadoop-0.20.2-test.jar
-rw-rw-r-- 1 grid grid 69940 Feb 19 2010 hadoop-0.20.2-tools.jar

These five jar files each serve a different purpose. Below I use hadoop-0.20.2-examples.jar as the example (a trick I picked up from instructor tigerfish).

With a ready-made jar in hand, how do we run it? Just pass the jar option to the hadoop command, for example:

[grid@hdnode1 ~]$ hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
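The invocation pattern is hadoop jar <jar file> <program name> [program arguments]. If you leave out the program name, the driver prints the list above. The small Python sketch below shows how that command line is put together and runs it only if a hadoop executable is on the PATH. The helper names build_example_cmd and run_example are hypothetical and were made up for this illustration. The jar path is the one used in this article.

```python
import shutil
import subprocess
from typing import List, Sequence

# Jar path as used in this article; adjust for your own installation.
EXAMPLES_JAR = "/usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar"


def build_example_cmd(program: str, args: Sequence[str],
                      jar: str = EXAMPLES_JAR) -> List[str]:
    """Build argv for `hadoop jar <jar> <program> [args...]`.

    The first argument after the jar selects the example program
    (e.g. "wordcount"); the remaining ones are passed to that program.
    """
    return ["hadoop", "jar", jar, program, *args]


def run_example(program: str, args: Sequence[str]) -> int:
    """Run the example if a `hadoop` executable is on PATH; return its exit code."""
    if shutil.which("hadoop") is None:
        raise FileNotFoundError("hadoop command not found on PATH")
    return subprocess.run(build_example_cmd(program, args)).returncode


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_example_cmd("wordcount", ["jss", "jsscount"])))
```

The sketch passes subprocess.run a list instead of a shell string. That way, paths containing spaces need no extra quoting.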

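The output above shows that hadoop-0.20.2-examples.jar offers quite a few example programs. Let's start with the last one, wordcount, which counts the occurrences of each word in the given input files. The input is already in place: the two files we just uploaded to the jss folder in HDFS.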
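Before running the real job, it may help to see the data flow on a small scale. The pure-Python simulation below imitates what wordcount does. Each map task splits its lines on whitespace and emits (word, 1). A combiner then sums the counts locally for each map task, and a single reducer sums them again.

The sketch makes a few assumptions:
- There is one map task per input file.
- The combiner runs exactly once per map task. That should hold for tiny inputs like ours, but in general Hadoop may run the combiner zero or several times.
- Python's str.split() only approximates the Java StringTokenizer that the Hadoop example uses for tokenizing.

The file names and contents in the usage part are hypothetical. They are not the files in jss.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_line(line: str) -> List[Pair]:
    """Mapper: emit (word, 1) for each whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def sum_by_key(pairs: Iterable[Pair]) -> List[Pair]:
    """Sum values per key and return the pairs sorted by key.

    Word count uses the same summing logic as combiner (per map task)
    and as reducer (after the shuffle).
    """
    totals: Counter = Counter()
    for word, n in pairs:
        totals[word] += n
    return sorted(totals.items())


def word_count(files: Dict[str, List[str]]) -> Tuple[List[Pair], Dict[str, int]]:
    """Simulate one map task per file, one combiner pass per map task, one reducer.

    Returns the final (word, count) pairs and counters named like the job log.
    """
    counters: Counter = Counter()
    combined: List[Pair] = []
    for lines in files.values():          # assumption: one map task per file
        map_out: List[Pair] = []
        for line in lines:
            counters["Map input records"] += 1
            map_out.extend(map_line(line))
        counters["Map output records"] += len(map_out)
        local = sum_by_key(map_out)        # combiner output of this map task
        counters["Combine output records"] += len(local)
        combined.extend(local)
    result = sum_by_key(combined)          # shuffle/sort + single reduce task
    counters["Reduce input records"] = len(combined)
    counters["Reduce input groups"] = len(result)
    counters["Reduce output records"] = len(result)
    return result, dict(counters)


if __name__ == "__main__":
    # Hypothetical input, for illustration only.
    files = {"a.txt": ["the cat sat"], "b.txt": ["the dog sat on the mat"]}
    result, counters = word_count(files)
    for word, n in result:
        print(f"{word}\t{n}")
    print(counters)
```

With these two made-up files, Combine output records (8) is lower than Map output records (9). The combiner merges the repeated "the" in b.txt before the shuffle. In the real run below, Combine input and output records are both 6. That suggests no word repeats within a single input file.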

Run the following command, writing the results to the jsscount folder:

[grid@hdnode1 ~]$ hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar wordcount jss jsscount
13/02/17 20:00:48 INFO input.FileInputFormat: Total input paths to process : 2
13/02/17 20:00:48 INFO mapred.JobClient: Running job: job_201302041636_0001
13/02/17 20:00:49 INFO mapred.JobClient: map 0% reduce 0%
13/02/17 20:00:58 INFO mapred.JobClient: map 50% reduce 0%
13/02/17 20:01:01 INFO mapred.JobClient: map 100% reduce 0%
13/02/17 20:01:10 INFO mapred.JobClient: map 100% reduce 100%
13/02/17 20:01:12 INFO mapred.JobClient: Job complete: job_201302041636_0001
13/02/17 20:01:13 INFO mapred.JobClient: Counters: 17
13/02/17 20:01:13 INFO mapred.JobClient: Job Counters
13/02/17 20:01:13 INFO mapred.JobClient: Launched reduce tasks=1
13/02/17 20:01:13 INFO mapred.JobClient: Launched map tasks=2
13/02/17 20:01:13 INFO mapred.JobClient: Data-local map tasks=2
13/02/17 20:01:13 INFO mapred.JobClient: FileSystemCounters
13/02/17 20:01:13 INFO mapred.JobClient: FILE_BYTES_READ=84
13/02/17 20:01:13 INFO mapred.JobClient: HDFS_BYTES_READ=42
13/02/17 20:01:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=238
13/02/17 20:01:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=35
13/02/17 20:01:13 INFO mapred.JobClient: Map-Reduce Framework
13/02/17 20:01:13 INFO mapred.JobClient: Reduce input groups=4
13/02/17 20:01:13 INFO mapred.JobClient: Combine output records=6
13/02/17 20:01:13 INFO mapred.JobClient: Map input records=2
13/02/17 20:01:13 INFO mapred.JobClient: Reduce shuffle bytes=90
13/02/17 20:01:13 INFO mapred.JobClient: Reduce output records=4
13/02/17 20:01:13 INFO mapred.JobClient: Spilled Records=12
13/02/17 20:01:13 INFO mapred.JobClient: Map output bytes=66
13/02/17 20:01:13 INFO mapred.JobClient: Combine input records=6
13/02/17 20:01:13 INFO mapred.JobClient: Map output records=6
13/02/17 20:01:13 INFO mapred.JobClient: Reduce input records=6
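In the command above, jss and jsscount are relative paths. HDFS resolves relative paths against the user's home directory, /user/<username>. That is why the directory listing further down shows both of them under /user/grid.

The output directory must also not exist when the job starts, because FileOutputFormat rejects an existing output directory. Rerunning the same command therefore needs either a new output name or the old directory removed first.

The Python sketch below models both rules. The function names are hypothetical, and the in-memory set of existing paths stands in for a real HDFS client. Fully qualified hdfs:// URIs are not handled.

```python
import posixpath
from typing import Set


def resolve_hdfs_path(path: str, user: str) -> str:
    """Resolve a path the way the hadoop shell does for a user whose working
    directory is the default home directory /user/<user>.

    Fully qualified URIs such as hdfs://host:port/... are not handled here.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join("/user", user, path))


def check_output_dir(existing: Set[str], out: str, user: str) -> str:
    """Mimic the pre-submission rule that the job's output directory must be new.

    The error message is this sketch's own, not Hadoop's exact text.
    """
    resolved = resolve_hdfs_path(out, user)
    if resolved in existing:
        raise FileExistsError(f"Output directory {resolved} already exists")
    return resolved


if __name__ == "__main__":
    # Hypothetical state before the job runs: only the input directory exists.
    existing = {"/user/grid/jss"}
    print(resolve_hdfs_path("jss", "grid"))
    print(check_output_dir(existing, "jsscount", "grid"))
```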

Let's set the execution output aside for now and look at the result first:

[grid@hdnode1 ~]$ hadoop dfs -ls
Found 2 items
drwxr-xr-x - grid supergroup 0 2013-02-17 16:58 /user/grid/jss
drwxr-xr-x - grid supergroup 0 2013-02-17 20:01 /user/grid/jsscount

Sure enough, a jsscount directory has appeared. Let's see what it contains:

[grid@hdnode1 ~]$ hadoop dfs -ls jsscount
Found 2 items
drwxr-xr-x - grid supergroup 0 2013-02-17 20:00 /user/grid/jsscount/_logs
-rw-r--r-- 3 grid supergroup 35 2013-02-17 20:01 /user/grid/jsscount/part-r-00000
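The file name part-r-00000 tells you where the file came from. This example uses the newer org.apache.hadoop.mapreduce API. Under that API, each reduce task writes one file named part-r- followed by its zero-padded, five-digit task number. Map-only jobs write part-m- files instead, and jobs on the older mapred API write plain part-00000 files.

The log reports Launched reduce tasks=1, so there is exactly one output file. Below is a tiny sketch of the new-API naming scheme; the helper name is hypothetical.

```python
from typing import List


def part_file_names(num_tasks: int, map_only: bool = False) -> List[str]:
    """Names of the output files written by a new-API (mapreduce) job.

    One file per reduce task (part-r-NNNNN), or one per map task
    (part-m-NNNNN) when the job has no reducers; numbers are zero-padded.
    """
    kind = "m" if map_only else "r"
    return [f"part-{kind}-{i:05d}" for i in range(num_tasks)]


if __name__ == "__main__":
    # One reduce task, as in the job above.
    print(part_file_names(1))
```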

One directory and one file. Never mind the directory for now; let's look at the file first:

[grid@hdnode1 ~]$ hadoop dfs -cat jsscount/part-r-00000
Hello 2
Junsansi 2
says: 1
world 1

Eyeballing it, the result looks correct.
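Eyeballing works for four lines, but for larger outputs the same check can be scripted. For word count, two things should hold:
- The counts should add up to Map output records, since each mapper emits one record per word.
- The number of output lines should equal Reduce output records.

The sketch below parses the part-r-00000 content printed above and compares it with those two counter values from the job log. It assumes the default tab separator between word and count, although splitting on the last run of whitespace also copes with spaces. The function names are hypothetical.

```python
from typing import Dict


def parse_wordcount_output(text: str) -> Dict[str, int]:
    """Parse 'word<TAB>count' lines, splitting on the last run of whitespace."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, n = line.rsplit(maxsplit=1)
        counts[word] = int(n)
    return counts


def consistent_with_counters(counts: Dict[str, int], map_output_records: int,
                             reduce_output_records: int) -> bool:
    """Check word-count output against the job's counters."""
    return (sum(counts.values()) == map_output_records
            and len(counts) == reduce_output_records)


if __name__ == "__main__":
    # Content of jsscount/part-r-00000 as printed above (tab-separated assumed).
    part_r_00000 = "Hello\t2\nJunsansi\t2\nsays:\t1\nworld\t1\n"
    counts = parse_wordcount_output(part_r_00000)
    # From the job log: Map output records=6, Reduce output records=4.
    print(consistent_with_counters(counts, 6, 4))
```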

From the ITPUB blog, link: http://blog.itpub.net/7607759/viewspace-757181/. Please credit the source when reposting; otherwise legal liability will be pursued.
