Hadoop C++ Pipes: a working setup

It took ages to squash every bug, but I finally got the Hadoop Pipes C++ example running. Writing this post as a small memorial to the hours that perished along the way. Amen.


Prerequisites:

1. Follow this tutorial to install Hadoop and confirm that jps reports six processes (a sample listing follows this list): http://blog.sina.com.cn/s/blog_9c43254d0101ngug.html

2. Be comfortable with the terminal.
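
For reference, on a pseudo-distributed Hadoop 1.x setup the six processes reported by jps are typically the ones below (the PIDs here are made up and will differ on your machine):

jps

2481 NameNode
2602 DataNode
2725 SecondaryNameNode
2810 JobTracker
2934 TaskTracker
3001 Jps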


Steps:

1. First, write the C++ program. I copied this one from somewhere I no longer remember, but it works, and that is what matters. Mine is named main.cpp:

#include <algorithm>
#include <limits>
#include <string>
#include <vector>   // <--- needed for vector<string> in the mapper
 
#include  "stdint.h"  // <--- to prevent uint64_t errors!
 
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
 
using namespace std;

class WordCountMapper : public HadoopPipes::Mapper {
public:
  // constructor: does nothing
  WordCountMapper( HadoopPipes::TaskContext& context ) {
  }

  // map function: receives a line, outputs (word,"1")
  // to reducer.
  void map( HadoopPipes::MapContext& context ) {
    //--- get line of text ---
    string line = context.getInputValue();

    //--- split it into words ---
    vector< string > words =
      HadoopUtils::splitString( line, " " );

    //--- emit each word tuple (word, "1" ) ---
    for ( unsigned int i=0; i < words.size(); i++ ) {
      context.emit( words[i], HadoopUtils::toString( 1 ) );
    }
  }
};
 
class WordCountReducer : public HadoopPipes::Reducer {
public:
  // constructor: does nothing
  WordCountReducer(HadoopPipes::TaskContext& context) {
  }

  // reduce function
  void reduce( HadoopPipes::ReduceContext& context ) {
    int count = 0;

    //--- sum the counts of all values that share this key ---
    while ( context.nextValue() ) {
      count += HadoopUtils::toInt( context.getInputValue() );
    }

    //--- emit (word, count) ---
    context.emit(context.getInputKey(), HadoopUtils::toString( count ));
  }
};
 
int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<
                              WordCountMapper,
                              WordCountReducer >() );
}

2. In the same directory as main.cpp, create a makefile with the following contents:

CC = g++
HADOOP_INSTALL = /usr/local/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include -I$(HADOOP_INSTALL)/src/c++/install/include -L$(HADOOP_INSTALL)/src/c++/install/lib -lhadooputils -lhadooppipes -lcrypto -lssl -lpthread

wordcount: main.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lcrypto -lpthread -g -O2 -o $@

Note that your HADOOP_INSTALL directory may differ from mine, and so may PLATFORM; check both before running make (one way to do so is shown below).
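
A quick way to check both values, assuming the standard Hadoop 1.x tarball layout where the prebuilt Pipes libraries live under c++/:

ls /usr/local/hadoop/c++/

uname -m

The first command lists the platform directories your install actually ships (typically Linux-i386-32 and Linux-amd64-64); the second prints your machine architecture (i686 means 32-bit, x86_64 means 64-bit, in which case use the amd64 directory and drop the -m32 flag).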


Then run make to produce the executable wordcount.


Also prepare a small test input file, hello.txt, containing for example:

hello world


3. Set up the remote (HDFS) side:

 A. Add /usr/local/hadoop/bin (the bin folder under my install directory) to the PATH in /etc/environment, so that the later commands can be run without full paths (see the example line below).
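
  For example, after the edit the PATH line in my /etc/environment looks roughly like the following; the earlier entries are whatever your system already had, and only the trailing /usr/local/hadoop/bin is new:

  PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/bin"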

 B. Run ./start-all.sh to bring up the Hadoop daemons.

 C. Run the following commands:

  hadoop dfs -mkdir /home

  hadoop dfs -mkdir /bin

  This creates two directories under the HDFS root.


  Then upload the executable wordcount and the input file hello.txt to HDFS:

  hadoop dfs -copyFromLocal ./wordcount /bin/

  hadoop dfs -copyFromLocal ./hello.txt /home

  You can then list the uploaded files with:

  hadoop dfs -ls /home

  hadoop dfs -ls /bin

 D. Everything is ready; launch the job with:

hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input /home/hello.txt -output /home/result -program /bin/wordcount

When the job finishes, the output lands in /home/result on HDFS; you can list it, cat the part files, or copy them back to the local machine (see below).
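
For example (the part file name below assumes a single reduce task and may differ on your run):

hadoop dfs -ls /home/result

hadoop dfs -cat /home/result/part-00000

hadoop dfs -copyToLocal /home/result/part-00000 ./result.txt

For the hello.txt above, the cat should print each word with its count, one per line:

hello	1
world	1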


