KeyValueTextInputFormat input data format:
1,Linux mont bb xx zz dd fff
2,Linux windows linux shell xhell
3,yy vv nn mm
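With KeyValueTextInputFormat, each input line is split at the first occurrence of the separator: the text before it becomes the key and the rest becomes the value; if the separator is absent, the whole line is the key and the value is empty. A plain-Java sketch of that rule (no Hadoop dependencies; the class and method names are just for illustration):

```java
public class KvSplitSketch {
    // Split a line into {key, value} at the FIRST occurrence of the
    // separator; with no separator, the whole line is the key.
    static String[] split(String line, String sep) {
        int i = line.indexOf(sep);
        if (i < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, i), line.substring(i + sep.length()) };
    }

    public static void main(String[] args) {
        String[] kv = split("1,Linux mont bb xx zz dd fff", ",");
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
        // prints: key=1 value=Linux mont bb xx zz dd fff
    }
}
```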
Driver class code:
Configuration conf = new Configuration();
conf.set(KeyValueTextInputFormat.INPUT_DIR_RECURSIVE, ","); // INPUT_DIR_RECURSIVE is not the separator property (it controls recursive input-directory scanning), so the reader still falls back to the default "\t" separator.
// Use the KeyValueTextInputFormat input format
job.setInputFormatClass(KeyValueTextInputFormat.class);
// Instantiate the configuration
Configuration conf = new Configuration();
conf.set(KeyValueTextInputFormat.INPUT_DIR_RECURSIVE, ",");
// Create a job
Job job = Job.getInstance(conf);
// Configure the job
job.setJarByClass(WCDriver.class);
// Register the custom mapper and its output key/value types
job.setMapperClass(WCMap.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// Register the custom reducer and the job's final output key/value types
job.setReducerClass(WCReduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// Use the KeyValueTextInputFormat input format
job.setInputFormatClass(KeyValueTextInputFormat.class);
// Set the input path
FileInputFormat.setInputPaths(job, new Path("D:\\input\\test\\plus\\aa3.txt"));
// Set the output path
FileOutputFormat.setOutputPath(job, new Path("D:\\input\\test\\plus\\2"));
// Submit the job and wait for it to finish
job.waitForCompletion(true);
Map class (note: there is a pitfall here):
public class WCMap extends Mapper<LongWritable, Text, Text, IntWritable> {
// IntelliJ shortcut Alt+Insert generates overrides of the parent class
// Ctrl+O also overrides parent-class methods directly
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
Run it and test the result: it throws an error.
The cause: a type mismatch. KeyValueTextInputFormat produces Text keys, so the mapper's input key type must no longer be LongWritable; changing it to Text fixes the problem.
java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
at com.itstar.mr.wc0908.WCMap.map(WCMap.java:21)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local376701184_0001 running in uber mode : false
[main] INFO org.apache.hadoop.mapreduce.Job - map 0% reduce 0%
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local376701184_0001 failed with state FAILED due to: NA
[main] INFO org.apache.hadoop.mapreduce.Job - Counters: 0
Now fix the Map class so that both the input key and value are Text:
@Override
protected void map(Text key, Text value, Context context) throws IOException, InterruptedException
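Only the corrected method header is shown above; for word count, the map body typically splits the Text value on whitespace and emits a (word, 1) pair per token via context.write. Here is a runnable plain-Java sketch of that per-line logic (no Hadoop on the classpath; `mapValue` is a hypothetical stand-in for the mapper body):

```java
import java.util.ArrayList;
import java.util.List;

public class MapLogicSketch {
    // Stand-in for the corrected map() body: split the Text value on
    // whitespace and "emit" one (word, 1) pair per token, returned here
    // as "word\t1" strings instead of context.write(Text, IntWritable).
    static List<String> mapValue(String value) {
        List<String> emitted = new ArrayList<>();
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emitted.add(word + "\t1");
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Value part of input line 2 after the "," key/value split
        System.out.println(mapValue("Linux windows linux shell xhell"));
    }
}
```

Note that "Linux" and "linux" are emitted as different keys; the reducer would count them separately unless the mapper lowercases tokens first.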
It now runs successfully:
Reduce shuffle bytes=27
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=514850816
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=80
File Output Format Counters
Bytes Written=15
Process finished with exit code 0
Inspect the generated output:
Next, configure the actual key/value separator for KeyValueTextInputFormat:
Driver main class:
conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR, ","); // This is the property that actually controls the key/value split
// Instantiate the configuration
Configuration conf = new Configuration();
conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR,",");
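For reference, `KeyValueLineRecordReader.KEY_VALUE_SEPERATOR` (the "SEPERATOR" misspelling is in Hadoop's own API) is just a String constant; assuming Hadoop 2.x it resolves to the property name below, so the same separator could also be set in an XML configuration file. The property name is an assumption here; verify it against your Hadoop version's docs.

```xml
<!-- Assumed property name (Hadoop 2.x); check your version before relying on it. -->
<property>
  <name>mapreduce.input.keyvaluelinerecordreader.key.value.separator</name>
  <value>,</value>
</property>
```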
Check the result: the key/value split now works.