The RandomWriter example uses Map/Reduce to write random data to DFS. Each map takes a single file name as input and writes random BytesWritable keys and values to a DFS sequence file. The maps produce no map output, so no reduce phase runs. The amount of data generated is configurable through the following variables:
| Name | Default Value | Description |
| --- | --- | --- |
| test.randomwriter.maps_per_host | 10 | Number of maps/host |
| test.randomwrite.bytes_per_map | 1073741824 | Number of bytes written/map |
| test.randomwrite.min_key | 10 | Minimum size of the key in bytes |
| test.randomwrite.max_key | 1000 | Maximum size of the key in bytes |
| test.randomwrite.min_value | 0 | Minimum size of the value |
| test.randomwrite.max_value | 20000 | Maximum size of the value |
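For these settings to take effect, they must be present in the job configuration at submission time. The imports in the listing below show that RandomWriter extends Configured and implements Tool, so ToolRunner's GenericOptionsParser turns `-D key=value` arguments into configuration overrides. Here is a minimal, hypothetical driver sketch (the class name RunRandomWriter and the output path /rand-out are my own, not from the source):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.RandomWriter;
import org.apache.hadoop.util.ToolRunner;

public class RunRandomWriter {
  public static void main(String[] argv) throws Exception {
    // Hypothetical driver: the -D overrides are parsed by
    // GenericOptionsParser (via ToolRunner) and injected into the
    // job configuration before RandomWriter submits the job.
    String[] args = {
        "-D", "test.randomwrite.bytes_per_map=1048576",  // 1*1024*1024
        "-D", "test.randomwriter.maps_per_host=1",
        "/rand-out"                                      // hypothetical output dir
    };
    System.exit(ToolRunner.run(new Configuration(), new RandomWriter(), args));
  }
}
```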
test.randomwriter.maps_per_host is the number of map tasks launched on each slave node. By default, on a cluster with a single data node, 10 maps run, each writing 1 GB, so 10 GB of data are written into HDFS in total. My test environment, however, has only 2 slave nodes (with maps_per_host set to 1, as configured below), so two maps run.
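As a sketch of where that count comes from, the job driver multiplies the per-host map count by the number of TaskTrackers reported by the cluster. The snippet below is reconstructed from the old mapred API that the imports in the listing reference; the exact code may differ across Hadoop versions:

```java
// Sketch (not verbatim source) of the map-count logic inside a method
// like RandomWriter.run(); property names follow the table above.
JobConf job = new JobConf(RandomWriter.class);
JobClient client = new JobClient(job);
ClusterStatus cluster = client.getClusterStatus();
int numMapsPerHost = job.getInt("test.randomwriter.maps_per_host", 10);
long numBytesToWritePerMap =
    job.getLong("test.randomwrite.bytes_per_map", 1 * 1024 * 1024 * 1024);
// 10 maps/host * 1 GB/map * N TaskTrackers => 10 GB on a single-node cluster
long totalBytesToWrite =
    numMapsPerHost * numBytesToWritePerMap * cluster.getTaskTrackers();
int numMaps = (int) (totalBytesToWrite / numBytesToWritePerMap);
job.setNumMapTasks(numMaps); // with maps_per_host=1 and 2 slaves: 2 maps
```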
I originally assumed test.randomwrite.bytes_per_map is the size of each output test file, defaulting to 1 GB = 1*1024*1024*1024. But after I changed it to 1*1024*1024, the output file was still 1 GB, which I have not been able to explain. (unresolved)
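For reference, here is a sketch of how the mapper presumably consumes this value (again reconstructed, not the verbatim source). The byte budget is read once from the JobConf at task start, so if the override never reaches the submitted configuration, e.g. it was set under a slightly different key (note the inconsistent randomwriter/randomwrite prefixes in the table) or changed in source without redeploying the jar, the mapper would silently fall back to the 1 GB default, which could explain the behavior above:

```java
// Sketch of the mapper's byte budget (names follow the table above):
private long numBytesToWrite;
private int minKeySize, keySizeRange, minValueSize, valueSizeRange;

public void configure(JobConf job) {
  // Read once when the task starts; falls back to 1 GB if the key is absent.
  numBytesToWrite = job.getLong("test.randomwrite.bytes_per_map",
                                1 * 1024 * 1024 * 1024);
  minKeySize = job.getInt("test.randomwrite.min_key", 10);
  keySizeRange = job.getInt("test.randomwrite.max_key", 1000) - minKeySize;
  minValueSize = job.getInt("test.randomwrite.min_value", 0);
  valueSizeRange = job.getInt("test.randomwrite.max_value", 20000) - minValueSize;
}
// map() then emits random key/value pairs, subtracting each pair's size
// from numBytesToWrite until the budget is exhausted.
```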
Code example

Here, test.randomwrite.bytes_per_map=1*1024*1024 and test.randomwriter.maps_per_host=1.
```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.Date;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * This program uses map/reduce to just run a distributed job where there is
 * no interaction between the tasks and each task write a large unsorted
 * random binary sequence file of BytesWritable.
```