Writing MapReduce Jobs with Hadoop Streaming

Map and Reduce code written in PHP:

wc_mapper.php

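#!/usr/bin/php
<?php
  // Mapper: read lines from stdin and emit "word<TAB>count" pairs.
  // Keep PHP diagnostics out of the output stream that Hadoop reads.
  error_reporting(0);
  $in = fopen("php://stdin", "r");
  $results = array();
  // Compare against false so that a line containing only "0" is not treated as end of input.
  while ( ($line = fgets($in)) !== false )
  {
    // Split on runs of non-word characters and drop empty tokens.
    $words = preg_split('/\W+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word)
    {
      // Initialize the counter the first time a word is seen.
      if (!isset($results[$word]))
        $results[$word] = 0;
      $results[$word] += 1;
    }
  }
  fclose($in);
  // Emit the partial counts collected by this map task.
  foreach ($results as $key => $value)
    print "$key\t$value\n";
?>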

wc_reducer.php

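#!/usr/bin/php
<?php
  // Reducer: read "word<TAB>count" lines from stdin and sum the counts per word.
  error_reporting(0);
  $in = fopen("php://stdin", "r");
  $results = array();
  while ( ($line = fgets($in)) !== false )
  {
    $parts = explode("\t", trim($line), 2);
    // Skip lines that do not contain a key/value pair.
    if (count($parts) < 2)
      continue;
    list($key, $value) = $parts;
    if (!isset($results[$key]))
      $results[$key] = 0;
    $results[$key] += (int) $value;
  }
  fclose($in);
  // Sort by word so the output is stable.
  ksort($results);
  foreach ($results as $key => $value)
    print "$key\t$value\n";
?>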
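
The two scripts only talk to Hadoop through stdin and stdout: the mapper prints tab-separated key/value lines, the framework sorts them by key, and the reducer reads the sorted lines. The sketch below imitates that data flow locally with standard Unix tools, which can help when reasoning about what each stage should print. The functions map_words, reduce_counts and wordcount are hypothetical stand-ins and not part of the original scripts. Unlike wc_mapper.php, this simplified mapper emits one pair per occurrence instead of pre-aggregating, which the reducer handles the same way.

```shell
# Stand-in for the mapper: emit one "word<TAB>1" line per word on stdin.
map_words() {
  # Turn every run of non-word characters into a newline, then drop empty lines.
  tr -cs '[:alnum:]_' '\n' | awk 'NF { print $0 "\t1" }'
}

# Stand-in for the reducer: sum the counts for each key.
reduce_counts() {
  awk -F '\t' '{ sum[$1] += $2 } END { for (k in sum) print k "\t" sum[k] }' | sort
}

# map -> sort (plays the role of Hadoop's shuffle/sort) -> reduce
wordcount() {
  map_words | sort | reduce_counts
}

printf 'hello world\nhello hadoop\n' | wordcount
```

The last line should print hadoop with a count of 1, hello with 2 and world with 1, separated by tabs. Assuming PHP is installed locally, the same pipeline shape should apply with the PHP scripts in place of the two stand-in functions.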

Running the Map/Reduce job

cd /usr/local/hadoop

bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.0.jar -file mapper/wc_mapper.php -mapper mapper/wc_mapper.php -file reducer/wc_reducer.php  -reducer reducer/wc_reducer.php -input /data/input -output /data/output

Note: when running on a Hadoop cluster, pass each script with the -file option so that it is copied to the worker nodes.

Alternatively, use absolute paths:

bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.0.jar -file /usr/local/hadoop/mapper/wc_mapper.php -mapper /usr/local/hadoop/mapper/wc_mapper.php -file /usr/local/hadoop/reducer/wc_reducer.php  -reducer /usr/local/hadoop/reducer/wc_reducer.php -input /data/input -output /data/output

Further reading: http://www.koopman.me/2009/04/hadoop-streaming-with-php/

alias stream='/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.0.jar'
stream \
-mapper /usr/local/hadoop/mapper/wc_mapper.php \
-reducer /usr/local/hadoop/reducer/wc_reducer.php \
-file /usr/local/hadoop/mapper/wc_mapper.php \
-file /usr/local/hadoop/reducer/wc_reducer.php \
-input /data/input \
-output /data/output/wc  

For other languages (C / Python), see:

http://www.hadoopor.com/thread-256-1-1.html

References:

http://rdc.taobao.com/team/top/tag/hadoop-php-stdin/

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
