MapReduce jobs for Hadoop are usually written in Java, but other languages and scripts are supported as well, through a mechanism much like a Unix pipe: the intermediate records are piped over standard input/output to an arbitrary executable or script, which then acts as the mapper or reducer. This is Hadoop Streaming.
As an example, here is the MapReduce WordCount implemented in Perl (its output can be compared against the Java version's):
mapper.pl
#!/usr/bin/perl
# mapper.pl: emit "word<TAB>1" for every word on standard input
use strict;
use warnings;

while (<>) {
    chomp;
    # split ' ' splits on runs of whitespace and skips leading blanks,
    # so multiple spaces do not produce empty "words" (split /\s/ would)
    for my $word (split ' ') {
        print "$word\t1\n";
    }
}
reducer.pl
my $last_key = "";
my $key = "";
$n = 0;
$firstLine = 1;
while(<>){
chomp;
@arr = split(/\t/,$_);
$key = $arr[0];
$value = $arr[1];
if($firstLine == 1){
$last_key = $arr[0];
$firstLine = 0;
}
if($key ne $last_key){
print "$last_key\t$n\n";
$last_key = $key;
$n = 1;
}
else{
$n++;
}
}
print "$last_key\t$n\n";
Run (here on Windows):
E:\Code\hadoop-2.4.1\bin\hadoop.cmd jar "E:\Code\hadoop-2.4.1\share\hadoop\tools\lib\hadoop-streaming-2.4.1.jar" -input test.txt -output output -mapper "perl mapper.pl" -reducer "perl reducer.pl"
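On a multi-node cluster the scripts also have to be shipped to the worker machines; Hadoop Streaming's -file option does that. A Linux-flavoured sketch of the same invocation, not from the original (paths are illustrative and $HADOOP_HOME is assumed to be set):

```shell
# -file copies mapper.pl and reducer.pl into each task's working
# directory, so "perl mapper.pl" can find them on every node.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.4.1.jar \
    -input test.txt \
    -output output \
    -mapper "perl mapper.pl" \
    -reducer "perl reducer.pl" \
    -file mapper.pl \
    -file reducer.pl
```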
Result: