Computing the Maximum Temperature from Weather Data with Hadoop

The Files

You need 3 files to run the maxTemperature example:

  • a C++ file containing the map and reduce functions,
  • a data file with temperature records in the format published by the National Climatic Data Center (NCDC), and
  • a Makefile to compile the C++ file.

max_temperature.cpp

#include <algorithm>
#include <limits>
#include <string>
 
#include <stdint.h>  // <-- this is missing from the book
 
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
 
using namespace std;
 
class MaxTemperatureMapper : public HadoopPipes::Mapper {
public:
  MaxTemperatureMapper(HadoopPipes::TaskContext& context) {
  }
  void map(HadoopPipes::MapContext& context) {
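    string line = context.getInputValue();
    string year = line.substr(15, 4);            // observation year
    string airTemperature = line.substr(87, 5);  // signed, in tenths of a degree Celsius
    string q = line.substr(92, 1);               // quality code for the temperature
    // Skip missing readings (+9999) and readings with a suspect quality code.
    if (airTemperature != "+9999" &&
        (q == "0" || q == "1" || q == "4" || q == "5" || q == "9")) {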
      context.emit(year, airTemperature);
    }
  }
};
 
class MaxTemperatureReducer : public HadoopPipes::Reducer {
public:
  MaxTemperatureReducer(HadoopPipes::TaskContext& context) {
  }
  void reduce(HadoopPipes::ReduceContext& context) {
    // Start from the smallest int so that any real reading replaces it.
    int maxValue = numeric_limits<int>::min();
    while (context.nextValue()) {
      maxValue = max(maxValue, HadoopUtils::toInt(context.getInputValue()));
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(maxValue));
  }
};
 
int main(int argc, char *argv[]) {
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<MaxTemperatureMapper,
                              MaxTemperatureReducer>());
}
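The mapper's filtering rule can be checked on a single record without a cluster. The following is a minimal sketch that pulls the same logic out of MaxTemperatureMapper::map into a plain function; the names parseRecord and Reading are hypothetical, and the length check is an addition the original mapper does not have (there, a short line would make substr throw). The offsets are the ones the mapper uses: year at 15, signed temperature at 87, quality code at 92.

```cpp
#include <string>

// Hypothetical struct holding what the mapper would emit for one record.
struct Reading {
  std::string year;         // the key, e.g. "1950"
  std::string temperature;  // the value, e.g. "+0022"
};

// Hypothetical helper mirroring MaxTemperatureMapper::map without Hadoop.
// Returns true and fills `out` when the mapper would emit the record.
bool parseRecord(const std::string& line, Reading& out) {
  if (line.size() < 93) {
    return false;  // too short to contain the quality code at offset 92
  }
  std::string year = line.substr(15, 4);
  std::string airTemperature = line.substr(87, 5);
  std::string q = line.substr(92, 1);
  bool goodQuality = q == "0" || q == "1" || q == "4" || q == "5" || q == "9";
  if (airTemperature == "+9999" || !goodQuality) {
    return false;  // missing reading or suspect quality code
  }
  out.year = year;
  out.temperature = airTemperature;
  return true;
}
```

Fed the second sample record from the data file, parseRecord yields year "1950" and temperature "+0022"; the same line with "+9999" in the temperature field is rejected.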


Makefile

Create a Makefile with the following entries. Note that you need to find out whether your machine is 32-bit or 64-bit so that you can pick the matching Pipes library. To check, run:

  uname -a

On a 32-bit machine the output looks like this:

  Linux hadoop6 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux

The i686 in this output indicates a 32-bit machine, which needs the Linux-i386-32 library. If the machine name contains 64 (for example x86_64), the machine is 64-bit: use the Linux-amd64-64 library instead and change -m32 to -m64 in CPPFLAGS.

CC = g++
HADOOP_INSTALL = /home/hadoop/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include


max_temperature: max_temperature.cpp 
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lpthread -g -O2 -o $@
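The platform rule described above (i686 means Linux-i386-32, a machine name containing 64 means Linux-amd64-64) can be written down as a small C++ sketch. Both function names are hypothetical; currentMachine uses the POSIX uname() call, which fills in the same machine field that uname -a prints. The rule is applied literally and is only meant for the x86 machines the text discusses.

```cpp
#include <string>
#include <sys/utsname.h>

// Hypothetical helper: returns the machine field printed by `uname -a`
// (e.g. "i686" or "x86_64"), or an empty string if uname() fails.
std::string currentMachine() {
  struct utsname info;
  if (uname(&info) != 0) {
    return "";
  }
  return std::string(info.machine);
}

// Hypothetical helper applying the rule from the text: a machine name
// containing "64" is 64-bit, anything else (such as i686) is 32-bit.
// The result is the value for PLATFORM in the Makefile; the matching
// CPPFLAGS switch is -m64 for Linux-amd64-64 and -m32 for Linux-i386-32.
std::string pipesPlatform(const std::string& machine) {
  if (machine.find("64") != std::string::npos) {
    return "Linux-amd64-64";
  }
  return "Linux-i386-32";
}
```

For example, pipesPlatform(currentMachine()) on the machine shown above returns "Linux-i386-32".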

Data File

  • Create a file called sample.txt containing the following sample NCDC temperature records, one record per line:
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
  • Put the data file in HDFS:
 hadoop dfs -mkdir ncdc  
 hadoop dfs -put sample.txt ncdc

Compiling and Running

  • You need a C++ compiler; GNU g++ is the usual choice. Check whether it is installed by running g++ --version at the prompt, and install it if it is missing:
  sudo apt-get install g++
  • Compile the code, and fix any errors the compiler reports:
  make max_temperature
  • Copy the executable file (max_temperature) to a bin directory in HDFS:
  hadoop dfs -mkdir bin
  hadoop dfs -put max_temperature bin/max_temperature

  • Run the program!
  hadoop pipes -D hadoop.pipes.java.recordreader=true \
                   -D hadoop.pipes.java.recordwriter=true \
                   -input ncdc/sample.txt -output ncdc-out \
                   -program bin/max_temperature
  • Verify that you got the expected output:
  hadoop dfs -text ncdc-out/part-00000

  1949	111
  1950	22
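These numbers can be reproduced by hand. The sketch below, which does not use Hadoop, takes the (year, temperature) pairs that the mapper emits for the five sample records, groups them by key the way the shuffle does (a sorted std::map, so 1949 comes before 1950), and applies the reducer's maximum to each group. The function name maxByYear is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical local stand-in for the shuffle plus the reducer.
// Input: the (key, value) pairs emitted by the mapper.
// Output: year -> maximum temperature, in tenths of a degree Celsius.
std::map<std::string, int> maxByYear(
    const std::vector<std::pair<std::string, std::string>>& mapOutput) {
  // Shuffle: group values by key; std::map keeps keys sorted.
  std::map<std::string, std::vector<int>> groups;
  for (const auto& kv : mapOutput) {
    groups[kv.first].push_back(std::stoi(kv.second));  // stoi accepts "+0022"
  }
  // Reduce: one maximum per key, starting from the smallest int.
  std::map<std::string, int> result;
  for (const auto& group : groups) {
    assert(!group.second.empty());  // every key has at least one value
    int maxValue = std::numeric_limits<int>::min();
    for (int v : group.second) {
      maxValue = std::max(maxValue, v);
    }
    result[group.first] = maxValue;
  }
  return result;
}
```

Calling it with {"1950", "+0000"}, {"1950", "+0022"}, {"1950", "-0011"}, {"1949", "+0111"}, {"1949", "+0078"} gives 1949 → 111 and 1950 → 22, matching the job output. Because NCDC stores air temperature in tenths of a degree Celsius, 111 corresponds to 11.1 °C.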
