MapReduce programming model environment setup details: implement the WordCount program with the MapReduce programming model and run it on the YARN cluster built earlier.
Data types and format:
This time the map method's output value is no longer a single primitive type but a FlowBean wrapper class, which contains the fields upflow, downflow, phone, etc.
Convert the Text value that was read in to a String and split it on "\t" into an array. Each input line becomes one array; according to the data format, the second element is the phone number, and the second- and third-from-last elements are downflow and upflow respectively. Write the result to the context.
FlowCountMapper.java
package mapreduce_flowcount;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class FlowCountMapper extends Mapper<LongWritable, Text, Text, FlowBean> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One tab-separated input line from the traffic log
        String line = value.toString();
        String[] fields = line.split("\t");
        // Second field is the phone number; the flow counters sit near the end of the line
        String phone = fields[1];
        int upflow = Integer.parseInt(fields[fields.length - 3]);
        int downflow = Integer.parseInt(fields[fields.length - 2]);
        // Emit phone as the key so the reducer aggregates per phone number
        context.write(new Text(phone), new FlowBean(upflow, downflow, phone));
    }
}
FlowBean.java
A custom data type implementing Hadoop's serialization interface (Writable).
It encapsulates upflow, downflow, phone, and acount (computed from the other fields). Note: the class must have a no-argument constructor, because Hadoop instantiates the bean by reflection during deserialization.
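The FlowBean code itself is not included above, so here is a minimal sketch of what the notes describe, assuming the field names upflow, downflow, phone, and acount from the text. In the real class, write() and readFields() are the two methods required by org.apache.hadoop.io.Writable; this sketch declares them against the plain java.io DataOutput/DataInput interfaces (which Writable also uses) so it compiles without Hadoop on the classpath.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch of the custom value type; field names follow the notes.
public class FlowBean {
    private int upflow;
    private int downflow;
    private String phone;
    private int acount; // total traffic, computed from upflow + downflow

    // Mandatory: Hadoop creates the bean by reflection during
    // deserialization, so a public no-argument constructor must exist.
    public FlowBean() {}

    public FlowBean(int upflow, int downflow, String phone) {
        this.upflow = upflow;
        this.downflow = downflow;
        this.phone = phone;
        this.acount = upflow + downflow;
    }

    // Serialization: write the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(upflow);
        out.writeInt(downflow);
        out.writeUTF(phone);
        out.writeInt(acount);
    }

    // Deserialization: read the fields back in exactly the order written.
    public void readFields(DataInput in) throws IOException {
        upflow = in.readInt();
        downflow = in.readInt();
        phone = in.readUTF();
        acount = in.readInt();
    }

    public int getUpflow() { return upflow; }
    public int getDownflow() { return downflow; }
    public int getAcount() { return acount; }

    // toString controls how the bean is rendered in the job output files.
    @Override
    public String toString() {
        return upflow + "\t" + downflow + "\t" + acount;
    }
}
```

The write/readFields field order must match exactly, since the byte stream carries no field names.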