Writing MapReduce in Perl with the Hadoop Streaming API

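MapReduce jobs for Hadoop are usually written in Java, but Hadoop also supports other languages and scripts through Streaming. The idea is similar to a pipe: records and intermediate results are piped to an executable or script, which then acts as the Map or the Reduce step.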

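Take a WordCount MapReduce job implemented in Perl as an example. The code is as follows (the result can be compared with that of the Java version):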

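Mapper.pl

#!/usr/bin/perl
use strict;
use warnings;

# Emit "word<TAB>1" for every word read from standard input.
while (my $line = <STDIN>) {
	chomp $line;
	# split ' ' skips leading whitespace and treats a run of whitespace as one separator,
	# so no empty words are emitted (split /\s/ would produce them).
	for my $word (split ' ', $line) {
		print "$word\t1\n";
	}
}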

Reducer.pl

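#!/usr/bin/perl
use strict;
use warnings;

# The input arrives sorted by key, so all lines for one word are adjacent.
my $last_key;    # word currently being accumulated (undef before the first line)
my $count = 0;   # running total for $last_key

while (my $line = <STDIN>) {
	chomp $line;
	my ($key, $value) = split /\t/, $line;
	next unless defined $key && defined $value;   # skip blank or malformed lines
	if (defined $last_key && $key ne $last_key) {
		# The key changed: emit the finished word and start a new total.
		print "$last_key\t$count\n";
		$count = 0;
	}
	$last_key = $key;
	$count += $value;   # add the emitted value instead of assuming it is always 1
}
# Emit the last word, unless there was no input at all.
print "$last_key\t$count\n" if defined $last_key;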

Run:

E:\Code\hadoop-2.4.1\bin\hadoop.cmd jar "E:\Code\hadoop-2.4.1\share\hadoop\tools\lib\hadoop-streaming-2.4.1.jar" -input test.txt -output output -mapper "perl mapper.pl" -reducer "perl reducer.pl"

Result: (output screenshot not preserved)
