Writing MapReduce Programs in C++ with Hadoop Streaming

黄晓萌

Article link: https://blog.csdn.net/huangmeng1214/article/details/11731531

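Hadoop Streaming is a utility that ships with Hadoop and lets users write MapReduce programs in languages other than Java. You only need to supply a Mapper and a Reducer as executables that read from standard input and write to standard output, and Hadoop runs them as a Map/Reduce job.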

For background, see the official Hadoop Streaming documentation.

1. The following uses WordCount as an example, with the Mapper and Reducer written in C++.

Mapper.cpp:

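#include <iostream>
#include <string>
using namespace std;

int main()
{
	string key;
	const int value = 1;

	// Emit one "word<TAB>1" record per whitespace-separated token.
	// Tab is the default key/value separator in Hadoop Streaming.
	while (cin >> key)
	{
		// '\n' instead of endl avoids flushing the stream on every record.
		cout << key << '\t' << value << '\n';
	}

	return 0;
}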
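
The separator the Mapper prints matters because Hadoop Streaming, by default, treats everything up to the first tab on a line as the key and the rest as the value; a line without a tab is taken as a key with an empty value. The following is a minimal sketch of that rule, not Hadoop's own code. The function name `split_record` is made up for this illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split one streaming record the way the default
// convention describes it (key = text before the first tab, value = rest).
std::pair<std::string, std::string> split_record(const std::string& line)
{
	std::string::size_type tab = line.find('\t');
	if (tab == std::string::npos)
	{
		// No tab: the whole line is the key and the value is empty.
		return std::make_pair(line, std::string());
	}
	return std::make_pair(line.substr(0, tab), line.substr(tab + 1));
}
```

Under this rule a record such as `hello 1` (space separated) has the key `hello 1` and no value. Identical words still end up next to each other after sorting, but a tab keeps the word and its count cleanly apart.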

Reducer.cpp:

#include <iostream>
#include <string>
#include <map>
using namespace std;

int main()
{
	string key;
	int value;
	map<string, int> result;
	map<string, int>::iterator it;

	// Read "word count" records and accumulate the total for each word.
	while (cin >> key >> value)
	{
		// Add the incoming value rather than incrementing by one, so the
		// totals stay correct even if a record carries a count above 1.
		result[key] += value;
	}

	// Write one "word<TAB>total" record per distinct word.
	for (it = result.begin(); it != result.end(); ++it)
	{
		cout << it->first << '\t' << it->second << '\n';
	}

	return 0;
}

2. Compile the sources to produce the executables Mapper and Reducer:

g++ Mapper.cpp -o Mapper
g++ Reducer.cpp -o Reducer

3. Create a script runJob.sh with the following content:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.1.2.jar \
-mapper Mapper \
-reducer Reducer \
-input /test/input/a.txt \
-output /test/output/test3 \
-file Mapper \
-file Reducer

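-input is the location of the job's input file in HDFS.

-output is the HDFS directory where the job writes its results.

-file gives the local path of the Mapper and Reducer executables so that they are shipped to the cluster nodes with the job. Without -file, the commands named by -mapper and -reducer may not be found on the task nodes, and the job can fail.

You can also use -jobconf to set MapReduce job parameters such as the number of map tasks and reduce tasks; see the official Hadoop Streaming documentation for details.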
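
Job parameters are also visible from inside a C++ Mapper or Reducer. The streaming documentation describes them as being exported to the task's environment with the dots in their names replaced by underscores, for example mapred.job.id becomes mapred_job_id. The sketch below reads a parameter that way. The helper names `conf_env_name` and `job_conf` are made up for this illustration, and which parameters are actually present depends on the Hadoop version.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: turn a job parameter name such as
// "mapred.reduce.tasks" into its environment form "mapred_reduce_tasks".
std::string conf_env_name(std::string name)
{
	for (std::string::size_type i = 0; i < name.size(); ++i)
	{
		if (name[i] == '.')
		{
			name[i] = '_';
		}
	}
	return name;
}

// Hypothetical helper: look the parameter up in the environment and
// return the fallback when it is not set.
std::string job_conf(const std::string& name, const std::string& fallback)
{
	const char* value = std::getenv(conf_env_name(name).c_str());
	return value ? std::string(value) : fallback;
}
```

For instance, `job_conf("mapred.reduce.tasks", "1")` would return the configured number of reduce tasks when the variable is set, and "1" otherwise.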

4. Run sh runJob.sh; the MapReduce job should complete normally.

The result matches what the wordcount example in hadoop-examples-1.1.2.jar produces.
