Mapper for the maximum temperature example

Format of a National Climatic Data Center (NCDC) record:

(The line has been split into multiple lines to show each field; in the real file,
fields are packed into one line with no delimiters.)

0057
332130   # USAF weather station identifier
99999    # WBAN weather station identifier
19500101 # observation date
0300     # observation time
4
+51317   # latitude (degrees x 1000)
+028783  # longitude (degrees x 1000)
FM-12
+0171    # elevation (meters)
99999
V020
320      # wind direction (degrees)
1        # quality code
N
0072
1
00450    # sky ceiling height (meters)
1        # quality code
C
N
010000   # visibility distance (meters)
1        # quality code
N
9
-0128    # air temperature (degrees Celsius x 10)
1        # quality code
-0139    # dew point temperature (degrees Celsius x 10)
1        # quality code
10268    # atmospheric pressure (hectopascals x 10)
1        # quality code
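
As a rough illustration of the fixed-width layout, the sketch below rebuilds the packed line by concatenating the field values listed above, then reads a few fields back by character offset. The class name NcdcRecordSketch and its helper methods are hypothetical, and the offsets are derived only from the field widths in this sample record, not from the NCDC format specification.

```java
// Hypothetical helper for illustration; not part of Hadoop or any NCDC tooling.
class NcdcRecordSketch {

    // Field values copied from the listing above, in order.
    static final String[] FIELDS = {
        "0057", "332130", "99999", "19500101", "0300", "4",
        "+51317", "+028783", "FM-12", "+0171", "99999", "V020",
        "320", "1", "N", "0072", "1", "00450", "1", "C", "N",
        "010000", "1", "N", "9", "-0128", "1", "-0139", "1", "10268", "1"
    };

    // Packs the fields into one line with no delimiters, as in the real file.
    static String packedLine() {
        return String.join("", FIELDS);
    }

    // Offsets follow from the widths of the sample fields (an assumption).
    static String usafStation(String line) {
        return line.substring(4, 10);
    }

    static String observationDate(String line) {
        return line.substring(15, 23);
    }

    static String observationTime(String line) {
        return line.substring(23, 27);
    }

    public static void main(String[] args) {
        String line = packedLine();
        System.out.println("length  = " + line.length());
        System.out.println("station = " + usafStation(line));
        System.out.println("date    = " + observationDate(line));
        System.out.println("time    = " + observationTime(line));
    }
}
```

Because the fields have no delimiters, every reader of the file has to agree on these offsets; a single missing character shifts every field after it.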


Mapper:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
  extends Mapper<LongWritable, Text, Text, IntWritable> {

  // Value the dataset uses to mark a missing temperature reading
  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    // The year is the first four characters of the observation date
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // skip a leading plus sign, which parseInt rejects before Java 7
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    // Emit (year, temperature) only for present readings with an acceptable quality code
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}

The four type parameters of the Mapper class:

the input key is a long integer offset,

the input value is a line of text,

the output key is a year,

the output value is an air temperature (an integer).

Hadoop basic types used in this method:

LongWritable (like Java Long)

Text (like Java String)

IntWritable (like Java Integer)

The map() method is passed a key and a value. We convert the Text value containing
the line of input into a Java String, then use its substring() method to extract the
columns we are interested in.
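
The extraction step can be tried without a Hadoop cluster. The following is a minimal sketch in plain Java, assuming the sample line is the concatenation of the field values from the record listing; the class name FieldExtractionSketch and its methods are hypothetical, while the substring offsets are the ones used in the mapper.

```java
// Hypothetical stand-alone version of the mapper's field extraction.
class FieldExtractionSketch {

    // Sample record: the fields from the listing packed into one line.
    static final String SAMPLE =
        "0057332130999991950010103004"
        + "+51317+028783FM-12+017199999V020"
        + "3201N0072100450"
        + "1CN0100001N9"
        + "-01281-01391102681";

    // Year: first four characters of the observation date.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Air temperature in tenths of a degree Celsius.
    static int airTemperature(String line) {
        if (line.charAt(87) == '+') {
            // Same branch as the mapper: skip the explicit plus sign.
            return Integer.parseInt(line.substring(88, 92));
        }
        return Integer.parseInt(line.substring(87, 92));
    }

    // One-character quality code that follows the temperature.
    static String quality(String line) {
        return line.substring(92, 93);
    }

    public static void main(String[] args) {
        System.out.println("year        = " + year(SAMPLE));
        System.out.println("temperature = " + airTemperature(SAMPLE));
        System.out.println("quality     = " + quality(SAMPLE));
    }
}
```

For the sample record the temperature field is -0128, which is -12.8 degrees Celsius once the factor of 10 is taken into account.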

The map() method also provides an instance of Context to write the output to. In this
case, we write the year as a Text object (since we are just using it as a key), and the
temperature is wrapped in an IntWritable. We write an output record only if the
temperature is present (not the missing-value marker 9999) and the quality code
(one of 0, 1, 4, 5, or 9) indicates that the temperature reading is OK.
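
The filtering rule can be sketched on its own in plain Java. Here a List of strings stands in for the Hadoop Context, so the class ReadingFilterSketch, its methods, and the tab-separated output format are assumptions for illustration; only the MISSING constant and the quality-code pattern come from the mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the mapper's output filter; no Hadoop required.
class ReadingFilterSketch {

    // Marker for a missing temperature, as in the mapper.
    static final int MISSING = 9999;

    // True when the reading is present and its quality code is acceptable.
    static boolean isValid(int airTemperature, String quality) {
        return airTemperature != MISSING && quality.matches("[01459]");
    }

    // Stand-in for context.write: records a "year<TAB>temperature" pair.
    static void emitIfValid(String year, int airTemperature, String quality,
                            List<String> out) {
        if (isValid(airTemperature, quality)) {
            out.add(year + "\t" + airTemperature);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        emitIfValid("1950", -128, "1", out); // kept
        emitIfValid("1950", 9999, "1", out); // dropped: missing reading
        emitIfValid("1950", 22, "2", out);   // dropped: quality code not accepted
        System.out.println(out);
    }
}
```

Filtering in the mapper keeps bad readings out of the data passed on to later stages, so anything that consumes the (year, temperature) pairs does not need to know about the missing-value marker.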
