Running C++ with Hadoop Streaming

Latest recommended article published on 2020-06-15 18:33:11

吾名

Category: Cloud Computing / Big Data

Article link: https://blog.csdn.net/qq_29985391/article/details/80069754

Reference article:

https://blog.csdn.net/huangmeng1214/article/details/11731531

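The full text follows.

Hadoop Streaming is a utility that ships with Hadoop. It lets users write MapReduce programs in languages other than Java. The user only needs to supply a Mapper and a Reducer, each an ordinary executable that reads lines from standard input and writes lines to standard output, and Hadoop can then run them as a Map/Reduce job.

For background, see the official Hadoop Streaming documentation.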
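
To make that contract concrete, here is a minimal local sketch of the data flow (map, then sort, then reduce), assuming C++11 or later. The names `Stage` and `run_local` are hypothetical and introduced only for illustration; they are not part of Hadoop. The sketch sorts whole lines in memory, whereas the real framework partitions and sorts by key across machines.

```cpp
#include <algorithm>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// A stage reads lines from `in` and writes lines to `out`,
// just as a streaming executable does with stdin and stdout.
using Stage = std::function<void(std::istream& in, std::ostream& out)>;

// Hypothetical local stand-in for the framework: map, sort, reduce.
std::string run_local(const Stage& mapper, const Stage& reducer,
                      const std::string& input) {
    // 1. Feed the raw input to the mapper.
    std::istringstream map_in(input);
    std::ostringstream map_out;
    mapper(map_in, map_out);

    // 2. Sort the mapper's output lines so that equal keys become adjacent.
    std::vector<std::string> lines;
    std::istringstream sort_in(map_out.str());
    for (std::string line; std::getline(sort_in, line);) {
        lines.push_back(line);
    }
    std::sort(lines.begin(), lines.end());

    // 3. Feed the sorted lines to the reducer and collect its output.
    std::string sorted;
    for (const std::string& line : lines) {
        sorted += line + "\n";
    }
    std::istringstream reduce_in(sorted);
    std::ostringstream reduce_out;
    reducer(reduce_in, reduce_out);
    return reduce_out.str();
}
```

Wrapping the logic of a mapper and a reducer as two `Stage` functions gives a quick way to check them on a small input before submitting a real job.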

1. The following uses WordCount as an example, with the Mapper and the Reducer written in C++.

The code for Mapper.cpp is as follows:

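#include <iostream>
#include <string>
using namespace std;

int main()
{
    string key;
    const int value = 1;

    // Emit one "word<TAB>1" line per whitespace-separated token on stdin.
    // Tab is the default key/value separator in Hadoop Streaming.
    while (cin >> key)
    {
        cout << key << '\t' << value << '\n';
    }

    return 0;
}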
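
By default, Hadoop Streaming splits each line of mapper output at the first tab character: the text before it is the key and the rest is the value, and a line with no tab is treated entirely as the key with an empty value. The sketch below only illustrates that rule; `split_key_value` is a hypothetical helper, not a Hadoop API.

```cpp
#include <string>
#include <utility>

// Split one line the way Hadoop Streaming does by default:
// text before the first tab is the key, the remainder is the value.
// A line without a tab is all key, with an empty value.
std::pair<std::string, std::string> split_key_value(const std::string& line) {
    const std::string::size_type tab = line.find('\t');
    if (tab == std::string::npos) {
        return std::make_pair(line, std::string());
    }
    return std::make_pair(line.substr(0, tab), line.substr(tab + 1));
}
```

Under this rule a line such as `hello 1` (separated by a space) becomes the key `hello 1` with an empty value, while `hello<TAB>1` yields the key `hello` and the value `1`.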

The code for Reducer.cpp is as follows:

#include <iostream>
#include <string>
#include <map>
using namespace std;

int main()
{
    string key;
    int value;
    map<string, int> result;

    // Read "word count" pairs and accumulate the counts per word.
    // Adding value (not just 1) stays correct if counts were pre-aggregated.
    while (cin >> key >> value)
    {
        result[key] += value;
    }

    // Write one "word<TAB>total" line per distinct word.
    for (map<string, int>::iterator it = result.begin(); it != result.end(); ++it)
    {
        cout << it->first << '\t' << it->second << '\n';
    }

    return 0;
}

2. Compile the sources to produce the executables Mapper and Reducer with the following commands:

g++ Mapper.cpp -o Mapper
g++ Reducer.cpp -o Reducer

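3. Create a script runJob.sh with the following content. Each backslash must be the last character on its line, with no trailing spaces, or the shell will not treat it as a line continuation.

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.1.2.jar \
-mapper Mapper \
-reducer Reducer \
-input /test/input/a.txt \
-output /test/output/test3 \
-file Mapper \
-file Reducer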