Running a C++ MapReduce job (Hadoop streaming)


mapper.cpp:

#include <iostream>
#include <string>

using namespace std;

// Hadoop streaming feeds the input split to the mapper on stdin.
// Read whitespace-separated words and emit one "word\t1" line each.
int main()
{
    string word;

    while (cin >> word)
    {
        cout << word << "\t" << 1 << endl;
    }
    return 0;
}

reduce.cpp:

#include <iostream>
#include <map>
#include <string>

using namespace std;

// Streaming delivers the mapper output, sorted by key, as "word\t1"
// lines on stdin. Tally each key; the value is always "1" here, so it
// is read and discarded rather than parsed.
int main()
{
    string key;
    string value;
    map<string, int> word_count;

    while (cin >> key >> value)
    {
        // operator[] default-initializes a missing entry to 0,
        // so no explicit find/insert dance is needed.
        ++word_count[key];
    }

    for (map<string, int>::iterator it = word_count.begin(); it != word_count.end(); ++it)
    {
        cout << it->first << "\t" << it->second << endl;
    }

    return 0;
}

Input file file_1:

hello hadoop helloworld

Input file file_2:

fu zu xian hadoop

The runjob.sh script:

#!/bin/bash
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
-file map -file reduce \
-input /data/demon/wordcount_c  -output /data/demon/wordcount_c/output \
-mapper /home/hadoop/project/wordcount_c/wordcount_c++/map  \
-reducer /home/hadoop/project/wordcount_c/wordcount_c++/reduce

Make the script executable before running it:

chmod +x runjob.sh
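The run log further down warns that -file is deprecated in favor of the generic -files option. An equivalent script using -files (same paths as above; this variant is a sketch, not something tested on this cluster) might look like:

```shell
#!/bin/bash
# Variant using the generic -files option instead of the deprecated -file.
# Generic options must come before the streaming command options.
# -files ships the binaries into each task's working directory, so the
# mapper and reducer can be referenced by relative name.
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
  -files /home/hadoop/project/wordcount_c/wordcount_c++/map,/home/hadoop/project/wordcount_c/wordcount_c++/reduce \
  -input /data/demon/wordcount_c \
  -output /data/demon/wordcount_c/output \
  -mapper ./map \
  -reducer ./reduce
```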

The job ran successfully; the results:

hadoop@Master:~$ hadoop fs  -ls /data/demon/wordcount_c/output
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-03-22 16:47 /data/demon/wordcount_c/output/_SUCCESS
-rw-r--r--   2 hadoop supergroup         47 2018-03-22 16:47 /data/demon/wordcount_c/output/part-00000
hadoop@Master:~$ hadoop fs  -cat /data/demon/wordcount_c/output/part-00000
fu	1
hadoop	2
hello	1
helloworld	1
xian	1
zu	1


Links:

hadoop streaming parameter configuration

Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop



Errors encountered along the way:

First, the job failed to launch:

hadoop@Master:~/project/wordcount_c/wordcount_c++$ ./runjob.sh 
18/03/22 16:12:58 ERROR streaming.StreamJob: Unrecognized option: -reduce
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]
Options:
  -input          <path> DFS input file(s) for the Map step.
  -output         <path> DFS output directory for the Reduce step.
  -mapper         <cmd|JavaClassName> Optional. Command to be run as mapper.
  -combiner       <cmd|JavaClassName> Optional. Command to be run as combiner.
  -reducer        <cmd|JavaClassName> Optional. Command to be run as reducer.
  -file           <file> Optional. File/dir to be shipped in the Job jar file.
                  Deprecated. Use generic option "-files" instead.
  -inputformat    <TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName>
                  Optional. The input format class.
  -outputformat   <TextOutputFormat(default)|JavaClassName>
                  Optional. The output format class.
  -partitioner    <JavaClassName>  Optional. The partitioner class.
  -numReduceTasks <num> Optional. Number of reduce tasks.
  -inputreader    <spec> Optional. Input recordreader spec.
  -cmdenv         <n>=<v> Optional. Pass env.var to streaming commands.
  -mapdebug       <cmd> Optional. To run this script when a map task fails.
  -reducedebug    <cmd> Optional. To run this script when a reduce task fails.
  -io             <identifier> Optional. Format to use for input to and output
                  from mapper/reducer commands
  -lazyOutput     Optional. Lazily create Output.
  -background     Optional. Submit the job and don't wait till it completes.
  -verbose        Optional. Print verbose output.
  -info           Optional. Print detailed usage.
  -help           Optional. Print help message.

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


For more details about these options:
Use $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar -info

Try -help for more information
Streaming Command Failed!
hadoop@Master:~/project/wordcount_c/wordcount_c++$ 

That error was exasperating: the option is -reducer, not -reduce.

After fixing that, it failed again:

hadoop@Master:~/project/wordcount_c/wordcount_c++$ ./runjob.sh 
18/03/22 16:16:19 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [map, reduce, /tmp/hadoop-unjar2530400161100689843/] [] /tmp/streamjob7826754138722798286.jar tmpDir=null
18/03/22 16:16:20 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.1.110:8032
18/03/22 16:16:20 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.1.110:8032
18/03/22 16:16:20 ERROR streaming.StreamJob: Error Launching job : Output directory hdfs://Master:9000/data/wordcout already exists
Streaming Command Failed!
hadoop@Master:~/project/wordcount_c/wordcount_c++$ 
I deleted the old directory, created a fresh one, and uploaded the input files (the paths in the script had to change accordingly):
hadoop@Master:~$ hadoop fs  -mkdir /data/demon/wordcount_c
hadoop@Master:~$ hadoop fs  -put  /home/hadoop/project/wordcount_c/wordcount_c++/file_2  /data/demon/wordcount_c
hadoop@Master:~$ hadoop fs  -put  /home/hadoop/project/wordcount_c/wordcount_c++/file_1  /data/demon/wordcount_c


An unnecessary step :)

hadoop@Master:~$ hadoop fs  -mkdir /data/demon/wordcount_c/output
This directory should not exist beforehand; there was no need to create it....
hadoop@Master:~/project/wordcount_c/wordcount_c++$ ./runjob.sh 
18/03/22 16:42:43 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [map, reduce, /tmp/hadoop-unjar217718645459140512/] [] /tmp/streamjob6276961858876481540.jar tmpDir=null
18/03/22 16:42:44 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.1.110:8032
18/03/22 16:42:44 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.1.110:8032
18/03/22 16:42:44 ERROR streaming.StreamJob: Error Launching job : Output directory hdfs://Master:9000/data/demon/wordcount_c/output already exists
Streaming Command Failed!
hadoop@Master:~/project/wordcount_c/wordcount_c++$ 
Deleting the /data/demon/wordcount_c/output directory resolved it.
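Streaming refuses to launch when the output directory already exists, so it has to be removed before every rerun. With this cluster's paths, that would be:

```shell
# Recursively remove the stale output directory from HDFS;
# the streaming job will recreate it on launch.
hadoop fs -rm -r /data/demon/wordcount_c/output
```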


