KeyValueTextInputFormat: how each line is split into key and value

Sample input data for KeyValueTextInputFormat:

1,Linux mont bb xx zz dd fff 
2,Linux windows linux shell xhell
3,yy vv nn mm 

Driver class code:

        Configuration conf = new Configuration();
        conf.set(KeyValueTextInputFormat.INPUT_DIR_RECURSIVE,",");// INPUT_DIR_RECURSIVE is not the separator setting; with this line the default "\t" is still used.

        // use KeyValueTextInputFormat as the input format
        job.setInputFormatClass(KeyValueTextInputFormat.class);
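
Why the comma has no effect here can be sketched in plain Java, with a `Map` standing in for Hadoop's `Configuration`. The two property-name strings below are what these constants are assumed to resolve to in Hadoop 2.x, and `separatorFor` is a hypothetical helper that mimics a lookup with `"\t"` as the fallback. Treat all of it as an illustration, not as Hadoop's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class SeparatorLookupSketch {
    // Assumed values of the Hadoop constants; verify against your Hadoop version.
    static final String INPUT_DIR_RECURSIVE =
            "mapreduce.input.fileinputformat.input.dir.recursive";
    static final String KEY_VALUE_SEPERATOR =
            "mapreduce.input.keyvaluelinerecordreader.key.value.separator";

    // Hypothetical helper: reads the separator property, falling back to a tab.
    static String separatorFor(Map<String, String> conf) {
        return conf.getOrDefault(KEY_VALUE_SEPERATOR, "\t");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Setting the "recursive input dir" property leaves the separator untouched.
        conf.put(INPUT_DIR_RECURSIVE, ",");
        System.out.println("\t".equals(separatorFor(conf))); // prints true

        // Setting the separator property is what changes the split character.
        conf.put(KEY_VALUE_SEPERATOR, ",");
        System.out.println(",".equals(separatorFor(conf))); // prints true
    }
}
```

The two constants name two unrelated properties, so writing "," into the first one never reaches the code that looks up the separator.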

 

        // create the configuration
        Configuration conf = new Configuration();
        conf.set(KeyValueTextInputFormat.INPUT_DIR_RECURSIVE,",");
        // define the job
        Job job = Job.getInstance(conf);

        // set the jar by the driver class
        job.setJarByClass(WCDriver.class);

        // register the custom mapper and its output key/value types
        job.setMapperClass(WCMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // register the custom reducer and the final output key/value types
        job.setReducerClass(WCReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // use KeyValueTextInputFormat as the input format
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        // input path
        FileInputFormat.setInputPaths(job, new Path("D:\\input\\test\\plus\\aa3.txt"));
        // output path
        FileOutputFormat.setOutputPath(job, new Path("D:\\input\\test\\plus\\2"));
        // submit the job and wait for it to finish
        job.waitForCompletion(true);

 

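Mapper class: watch out, there is a pitfall here

public class WCMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    // IDE shortcut to generate parent-class methods: Alt+Insert
    // Ctrl+O overrides a parent-class method directly


    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException 

Running the job with this mapper fails.

The cause is a type mismatch: KeyValueTextInputFormat hands the mapper keys of type Text, so the mapper's input key type can no longer be LongWritable. Changing it to Text fixes the error.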

java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
	at com.itstar.mr.wc0908.WCMap.map(WCMap.java:21)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local376701184_0001 running in uber mode : false
[main] INFO org.apache.hadoop.mapreduce.Job -  map 0% reduce 0%
[main] INFO org.apache.hadoop.mapreduce.Job - Job job_local376701184_0001 failed with state FAILED due to: NA
[main] INFO org.apache.hadoop.mapreduce.Job - Counters: 0
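
The stack trace shows the failure at run time rather than at compile time. A plausible explanation is Java's generic type erasure: the framework passes keys as plain objects, and the cast to the declared key type only happens when the mapper's own `map` is entered. The sketch below reproduces that effect in plain Java. `MiniMapper`, `LongKeyMapper` and `StringKeyMapper` are hypothetical stand-ins, with `Long` and `String` playing the roles of `LongWritable` and `Text`; none of this is Hadoop code.

```java
public class ErasureCastSketch {
    // Hypothetical stand-in for a framework base class: it only ever sees Objects.
    abstract static class MiniMapper<K, V> {
        abstract String map(K key, V value);

        @SuppressWarnings("unchecked")
        String run(Object key, Object value) {
            // These casts are erased at compile time; the real type check
            // happens inside the subclass when map(...) is entered.
            return map((K) key, (V) value);
        }
    }

    // Declares a Long key, analogous to WCMap declaring LongWritable.
    static class LongKeyMapper extends MiniMapper<Long, String> {
        @Override
        String map(Long key, String value) {
            return key + "=" + value;
        }
    }

    // Declares a String key, analogous to the mapper after switching to Text.
    static class StringKeyMapper extends MiniMapper<String, String> {
        @Override
        String map(String key, String value) {
            return key + "=" + value;
        }
    }

    public static void main(String[] args) {
        // The supplied key is a String, so the String-keyed mapper works.
        System.out.println(new StringKeyMapper().run("1", "Linux")); // prints 1=Linux

        // The Long-keyed mapper compiles fine but fails once a String key arrives.
        try {
            new LongKeyMapper().run("1", "Linux");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at run time");
        }
    }
}
```

Both mappers compile, which is why the mismatch only surfaces once the first record is read.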

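Now change the Mapper so that the input key and value are both Text (the class declaration becomes Mapper<Text, Text, Text, IntWritable> to match):

 @Override
    protected void map(Text key, Text value, Context context) throws IOException, InterruptedException 

The job now succeeds: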

		Reduce shuffle bytes=27
		Reduce input records=3
		Reduce output records=1
		Spilled Records=6
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=514850816
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=80
	File Output Format Counters 
		Bytes Written=15

Process finished with exit code 0

Check the generated output:

 

 

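Next, set the key/value separator that KeyValueTextInputFormat actually uses:

 

Driver main class:

conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR,",");// this is the setting that really controls the split

        // create the configuration
        Configuration conf = new Configuration();
        conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR,",");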

 

Check the result: the lines are now split at the comma.
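
The effect of the separator on one of the sample lines can be sketched without Hadoop. The `split` helper below is a hypothetical plain-Java imitation of the behaviour described in this post: cut at the first occurrence of the separator, and if the separator is absent, treat the whole line as the key with an empty value.

```java
public class KeyValueSplitSketch {
    // Hypothetical helper: splits a line at the FIRST occurrence of the separator.
    // If the separator does not occur, the whole line is the key and the value is empty.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String line = "2,Linux windows linux shell xhell";

        // Default tab separator: no tab in the line, so everything ends up in the key.
        String[] byTab = split(line, '\t');
        System.out.println("[" + byTab[0] + "] [" + byTab[1] + "]");
        // prints [2,Linux windows linux shell xhell] []

        // Comma separator: the key is the leading number, the value is the rest.
        String[] byComma = split(line, ',');
        System.out.println("[" + byComma[0] + "] [" + byComma[1] + "]");
        // prints [2] [Linux windows linux shell xhell]
    }
}
```

With the tab default, the first run above delivered each whole line as the key; with the comma configured, the key is the leading number and the value is the rest of the line.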

 

 

 

 

 

 

 

 

 

 

 

 

 
