KeyValueTextInputFormat
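》 What is KeyValueTextInputFormat?
Each line is treated as one record and is split by a separator (a tab, \t, by default) into a key (Text) and a value (Text).
The separator can be set through the mapreduce.input.keyvaluelinerecordreader.key.value.separator property (key.value.separator.in.input.line in the old API). Its default value is a tab character.
The following example shows an input split containing 4 records, where ——> stands for a (horizontal) tab character.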
line1 ——>Rich learning form
line2 ——>Intelligent learning engine
line3 ——>Learning more convenient
line4 ——>From the real demand for more close to the enterprise
Each record is represented as the following key/value pair:
(line1,Rich learning form)
(line2,Intelligent learning engine)
(line3,Learning more convenient)
(line4,From the real demand for more close to the enterprise)
Here the key is the Text sequence that precedes the tab on each line.
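The splitting rule described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration, not the library's implementation: the class name KeyValueSplitSketch and the method split are hypothetical, and the sketch assumes that a line containing no separator becomes a key with an empty value.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class KeyValueSplitSketch {

    // Hypothetical helper: splits a line at the FIRST occurrence of the separator.
    // Everything before the separator is the key, everything after it is the value.
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            // Assumption: no separator means the whole line is the key and the value is empty.
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = split("line1\tRich learning form", '\t');
        // prints (line1,Rich learning form)
        System.out.println("(" + kv.getKey() + "," + kv.getValue() + ")");
    }
}
```

Because the split happens only at the first separator, any further separators stay inside the value.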
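Case study
1. Requirement
Count, for each distinct first word, the number of lines in the input file that begin with that word.
(1) Input data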
banzhang ni hao
xihuan hadoop banzhang
banzhang ni hao
xihuan hadoop banzhang
(2) Expected output
banzhang 2
xihuan 2
(3) Map stage
banzhang ni hao
Set the key and value
<banzhang,1>
Write out
(4) Reduce stage
<banzhang,1>
<banzhang,1>
Sum up
<banzhang,2>
Write out
(5) Driver
// Set the separator
conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR, " ");
// Set the input format
job.setInputFormatClass(KeyValueTextInputFormat.class);
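The Map and Reduce stages of this requirement can be simulated locally in plain Java to check the expected result before running a real job. This is a minimal sketch, not a MapReduce program: the class FirstWordCountSketch and its count method are hypothetical names, and a TreeMap stands in for the sorted, grouped keys that the framework would hand to the reducer.

```java
import java.util.Map;
import java.util.TreeMap;

public class FirstWordCountSketch {

    // Hypothetical helper: counts lines by the text before the first space.
    static Map<String, Long> count(String[] lines) {
        // TreeMap keeps keys sorted, loosely mirroring the sorted reducer input.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: the key is everything before the first space.
            int pos = line.indexOf(' ');
            String key = (pos == -1) ? line : line.substring(0, pos);
            // "Reduce" step: add 1 for every line that shares the same key.
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] input = {
            "banzhang ni hao",
            "xihuan hadoop banzhang",
            "banzhang ni hao",
            "xihuan hadoop banzhang"
        };
        // prints "banzhang 2" and "xihuan 2", one pair per line
        count(input).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```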
Code implementation
(1) Write the Mapper class
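package com.dev1.keyvalue;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class KVTextMapper extends Mapper<Text, Text, Text, LongWritable> {
    // Reused output value: every line counts as 1.
    // (A LongWritable created without an argument holds 0, which would make every sum 0.)
    private final LongWritable v = new LongWritable(1);

    @Override
    protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
        // Debug output: the key and value produced by KeyValueTextInputFormat
        System.out.println(key.toString() + "->" + value.toString());
        // Input line "banzhang ni hao" arrives as key = banzhang, value = "ni hao"
        // Emit <banzhang,1>
        context.write(key, v);
    }
}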
(2) Write the Reducer class
package com.dev1.keyvalue;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class KVTextReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    // Reused output value
    private final LongWritable v = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Input for one key: <banzhang,1> <banzhang,1>
        long sum = 0;
        for (LongWritable value : values) {
            sum += value.get();
        }
        // Emit <banzhang,2>
        v.set(sum);
        context.write(key, v);
    }
}
(3) Write the Driver class
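package com.dev1.keyvalue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KVTextDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set the separator (must be done before the Job is created from conf)
        conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR, " ");
        // 1 Get the job object
        Job job = Job.getInstance(conf);
        // 2 Set the jar location and associate the mapper and reducer
        job.setJarByClass(KVTextDriver.class);
        job.setMapperClass(KVTextMapper.class);
        job.setReducerClass(KVTextReducer.class);
        // 3 Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // 4 Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // 5 Set the input path
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Set the input format
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        // 6 Set the output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 7 Submit the job and exit with a status that reflects success or failure
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}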