Simple Sort Benchmark on Hadoop

cuma2369

Original link: https://www.systutorials.com/simple-sort-benchmark-on-hadoop/

After installing Hadoop, we usually run some benchmark programs to test whether the system works well. In the Hadoop installation tutorial, we showed a very simple example that greps strings from a small set of files. In this post, we introduce the Sort program for testing and benchmarking Hadoop. Sort is also included in the Hadoop distribution package, which additionally ships an input data generator that produces 10 GB * (number of slave nodes) of input data to sort. Sort processes a much larger dataset than the grep example, which puts more stress on Hadoop, including the execution engine and HDFS.

The Sort example program simply uses the MapReduce framework to sort the input directory into the output directory. The mapper is the predefined IdentityMapper and the reducer is the predefined IdentityReducer, both of which just pass their inputs directly to the output. The inputs and outputs must be Sequence files where the keys and values are BytesWritable.
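
Because both the mapper and the reducer are identity functions, the ordering has to come from the framework's shuffle step between them. The following is a minimal single-process sketch of that data flow in plain Python. It does not use Hadoop, and the function names `identity_map`, `identity_reduce`, and `sort_records` are hypothetical names chosen for illustration. Python `bytes` objects stand in for BytesWritable keys and values.

```python
import random
from itertools import groupby
from operator import itemgetter


def identity_map(key, value):
    """Pass one record through unchanged, like an identity mapper."""
    yield key, value


def identity_reduce(key, values):
    """Emit every value of a key unchanged, like an identity reducer."""
    for value in values:
        yield key, value


def sort_records(records):
    """Toy model of the Sort job: identity map, sort by key, identity reduce."""
    mapped = [out for k, v in records for out in identity_map(k, v)]
    # The shuffle is modelled as a sort on the key bytes. This is the only
    # step that reorders anything; map and reduce just forward records.
    shuffled = sorted(mapped, key=itemgetter(0))
    output = []
    for key, group in groupby(shuffled, key=itemgetter(0)):
        output.extend(identity_reduce(key, (v for _, v in group)))
    return output


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs print the same data
    records = [
        (bytes(rng.randrange(256) for _ in range(4)),
         bytes(rng.randrange(256) for _ in range(8)))
        for _ in range(5)
    ]
    for key, value in sort_records(records):
        print(key.hex(), value.hex())
```

This sketch models a single reducer only. How a real cluster partitions keys across several reducers is outside what it shows.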

The RandomWriter example program writes 10 GB (by default) of random data per host to HDFS using MapReduce. Each map takes a single file name as input and writes random BytesWritable keys and values to the DFS sequence file. The maps do not emit any output and the reduce phase is not used.
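
To get a feel for the data volume and for what each map produces, here is a rough sketch in plain Python (not Hadoop code). The helper names `total_input_bytes` and `random_records` are hypothetical, and so are the key and value length limits. Treating 1 GB as 1024^3 bytes is also an assumption made for the arithmetic.

```python
import random

GIB = 1024 ** 3                # assumption: 1 GB is counted as 1024^3 bytes
BYTES_PER_HOST = 10 * GIB      # default per-host volume described above


def total_input_bytes(num_hosts, bytes_per_host=BYTES_PER_HOST):
    """Total amount of random data generated across the whole cluster."""
    return num_hosts * bytes_per_host


def random_records(byte_budget, rng, max_key_len=16, max_value_len=64):
    """Yield random (key, value) byte pairs until the byte budget is reached.

    The length limits are illustrative and are not RandomWriter's defaults.
    """
    written = 0
    while written < byte_budget:
        key = bytes(rng.randrange(256)
                    for _ in range(rng.randint(1, max_key_len)))
        value = bytes(rng.randrange(256)
                      for _ in range(rng.randint(0, max_value_len)))
        written += len(key) + len(value)
        yield key, value


if __name__ == "__main__":
    print("4 hosts:", total_input_bytes(4) // GIB, "GiB of input to sort")
    sample = list(random_records(200, random.Random(42)))
    print(len(sample), "sample records generated")
```

For a four-node cluster this gives 40 GiB of input, which is worth checking against the free HDFS space before starting the generator.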

For a quick test of the Sort benchmark, just execute these two commands after setting up and starting Hadoop (here we are in the Hadoop directory; if you run the commands outside the Hadoop directory, simply use the full or relative path to the jar file):

hadoop jar hadoop-*-examples.jar randomwriter rand
hadoop jar hadoop-*-examples.jar sort rand rand-sort

The first command generates the random data into rand. The second command sorts the generated data in rand and puts the result into rand-sort.
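
Since the point of a benchmark is to measure something, it can help to record how long each of the two steps takes. Below is a small sketch of a wrapper that runs both steps and prints the wall-clock time of each. The function names `sort_benchmark_commands` and `run_timed` are hypothetical. The script assumes that `hadoop` is on the PATH and that it is started from the directory containing the examples jar.

```python
import glob
import subprocess
import time


def sort_benchmark_commands(examples_jar, data_dir="rand", sorted_dir="rand-sort"):
    """Build the two benchmark command lines: generate data, then sort it."""
    base = ["hadoop", "jar", examples_jar]
    return [
        base + ["randomwriter", data_dir],
        base + ["sort", data_dir, sorted_dir],
    ]


def run_timed(command):
    """Run a command, raise if it fails, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Resolve the jar the same way the shell glob in the commands would.
    jars = glob.glob("hadoop-*-examples.jar")
    if not jars:
        raise SystemExit("no hadoop-*-examples.jar in the current directory")
    for command in sort_benchmark_commands(jars[0]):
        elapsed = run_timed(command)
        print(f"{' '.join(command)}: {elapsed:.1f} s")
```

The times measured this way include job submission and startup overhead, so they are best used to compare runs on the same cluster rather than as absolute figures.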

For more details and more options of the Sort and RandomWriter example programs, please refer to the Hadoop Wiki: Sort and RandomWriter.
