wc_mapper.php
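#!/usr/bin/php
<?php
// Mapper: read text from stdin and emit "word<TAB>count" for each distinct word.
error_reporting(0); // keep PHP notices out of the streamed output
$in = fopen("php://stdin", "r");
$results = array();
// Compare against false so that a final line containing only "0" is not skipped.
while (($line = fgets($in, 4096)) !== false)
{
    // Split on runs of non-word characters and drop empty tokens.
    $words = preg_split('/\W+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word)
    {
        // Initialize the counter explicitly instead of relying on an undefined index.
        if (!isset($results[$word]))
            $results[$word] = 0;
        $results[$word] += 1;
    }
}
fclose($in);
foreach ($results as $key => $value)
    print "$key\t$value\n";
?>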
wc_reducer.php
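#!/usr/bin/php
<?php
// Reducer: sum the counts for each word read from stdin and emit "word<TAB>total".
error_reporting(0); // keep PHP notices out of the streamed output
$in = fopen("php://stdin", "r");
$results = array();
while (($line = fgets($in, 4096)) !== false)
{
    // Each input line is "key<TAB>value"; skip lines that lack either field.
    $parts = explode("\t", rtrim($line, "\r\n"), 2);
    if (count($parts) < 2)
        continue;
    list($key, $value) = $parts;
    if (!isset($results[$key]))
        $results[$key] = 0;
    $results[$key] += (int) $value;
}
fclose($in);
ksort($results);
foreach ($results as $key => $value)
    print "$key\t$value\n";
?>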
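Hadoop Streaming pipes input lines to the mapper's stdin, sorts the mapper output by key, and pipes the sorted lines to the reducer's stdin. That data flow can be rehearsed locally before a job is submitted. The following is a minimal sketch that uses standard Unix tools as stand-ins for the two PHP scripts; the function names map_words, reduce_counts and wc_local are hypothetical and not part of the original scripts.

```shell
#!/bin/sh
# Stand-in mapper (hypothetical): emit "word<TAB>1" for every word on stdin.
map_words() {
    tr -cs '[:alnum:]_' '\n' | awk 'NF { print $0 "\t" 1 }'
}

# Stand-in reducer (hypothetical): sum the counts per key, then sort by key.
reduce_counts() {
    awk -F '\t' '{ sum[$1] += $2 } END { for (k in sum) print k "\t" sum[k] }' \
        | LC_ALL=C sort
}

# Same shape as a streaming job: input | mapper | sort | reducer.
wc_local() {
    map_words | LC_ALL=C sort | reduce_counts
}

# Example: printf 'foo bar foo\n' | wc_local
```

Assuming php is installed at /usr/bin/php and both scripts are executable, replacing map_words and reduce_counts with ./wc_mapper.php and ./wc_reducer.php should exercise the real scripts in the same pipeline.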
Running the Map/Reduce job
cd /usr/local/hadoop
bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.0.jar -file mapper/wc_mapper.php -mapper mapper/wc_mapper.php -file reducer/wc_reducer.php -reducer reducer/wc_reducer.php -input /data/input -output /data/output
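Note: when running on a Hadoop cluster, use the -file option to give the path of each script so that it is copied to the worker nodes.
Alternatively, use absolute paths: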
bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.0.jar -file /usr/local/hadoop/mapper/wc_mapper.php -mapper /usr/local/hadoop/mapper/wc_mapper.php -file /usr/local/hadoop/reducer/wc_reducer.php -reducer /usr/local/hadoop/reducer/wc_reducer.php -input /data/input -output /data/output
Further reading: http://www.koopman.me/2009/04/hadoop-streaming-with-php/
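Because the commands above pass each script path directly to -mapper and -reducer, the scripts are run as executables and rely on their `#!/usr/bin/php` first line. A quick local check before shipping them can catch the most common mistakes. This is only a sketch: check_stream_script is a hypothetical helper, and whether the executable bit is strictly required on the worker nodes is an assumption that may depend on the Hadoop setup.

```shell
#!/bin/sh
# Hypothetical helper: report whether a streaming script looks ready to ship.
check_stream_script() {
    script=$1
    [ -f "$script" ] || { echo "missing: $script"; return 1; }
    [ -x "$script" ] || { echo "not executable: $script (try chmod +x)"; return 1; }
    # The first line should be a shebang so the file can be executed directly.
    first_line=$(head -n 1 "$script")
    case $first_line in
        '#!'*) echo "ok: $script" ;;
        *) echo "no shebang line: $script"; return 1 ;;
    esac
}

# Example: check_stream_script /usr/local/hadoop/mapper/wc_mapper.php
```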
alias stream='/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.0.jar'
stream \
-mapper /usr/local/hadoop/mapper/wc_mapper.php \
-reducer /usr/local/hadoop/reducer/wc_reducer.php \
-file /usr/local/hadoop/mapper/wc_mapper.php \
-file /usr/local/hadoop/reducer/wc_reducer.php \
-input /data/input \
-output /data/output/wc
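The alias above is convenient in an interactive shell, but bash does not expand aliases in non-interactive scripts by default. A shell function is a possible substitute when the job is launched from a script. The sketch below assumes the install path and jar version used in the commands above; the variable names HADOOP_HOME and STREAMING_JAR are introduced here for illustration.

```shell
#!/bin/sh
# Assumed install location, taken from the commands above; override as needed.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
STREAMING_JAR=${STREAMING_JAR:-$HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.0.jar}

# Function equivalent of the alias; all arguments are passed through unchanged.
stream() {
    "$HADOOP_HOME/bin/hadoop" jar "$STREAMING_JAR" "$@"
}

# Example:
# stream -mapper /usr/local/hadoop/mapper/wc_mapper.php \
#        -reducer /usr/local/hadoop/reducer/wc_reducer.php \
#        -file /usr/local/hadoop/mapper/wc_mapper.php \
#        -file /usr/local/hadoop/reducer/wc_reducer.php \
#        -input /data/input -output /data/output/wc
```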
For other languages (C / Python), see:
http://www.hadoopor.com/thread-256-1-1.html
References:
http://rdc.taobao.com/team/top/tag/hadoop-php-stdin/
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/