Hadoop: How to Use Two Mappers to Do Different Things

At work I ran into a situation where I wanted mapper A to read a file with two fields (questionId, questionTags) and emit records like key: questionId, value: questionTags, while mapper B reads a directory containing many files, each named by a questionId and holding that question's content, and emits records like key: questionId (the file name), value: questionContent. A single reducer then does some string operations on the joined values.
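Both mappers have to emit the same key and value types so that one reducer can consume them. The custom value class QuestionTagsWritable used later in this post is not shown; below is a minimal sketch of what it might look like, inferred only from the isTags flag, content field, and (boolean, String) constructor that appear in the snippets further down:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class QuestionTagsWritable implements Writable {
    public boolean isTags;   // true: content holds questionTags; false: content holds questionContent
    public String content;

    public QuestionTagsWritable() {
        // no-arg constructor required by Hadoop's serialization
    }

    public QuestionTagsWritable(boolean isTags, String content) {
        this.isTags = isTags;
        this.content = content;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeBoolean(isTags);
        out.writeUTF(content);   // note: writeUTF caps strings at ~64 KB; Text would be safer for long question content
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        isTags = in.readBoolean();
        content = in.readUTF();
    }

    @Override
    public String toString() {
        return (isTags ? "tags:" : "content:") + content;
    }
}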

 

The framework looks like this:

A mapper \
           >  reducer
B mapper /

This can't be solved with ChainMapper, which runs mappers one after another over the same input rather than running different mappers over different inputs.

I found that the two mappers' output formats are the same, so the alternative is to have one job handle both inputs: the questions directory and the tags file.
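The TaggedInputSplit error described below suggests the job was wired up with MultipleInputs, which lets each input path have its own mapper (or share one). Here is a minimal driver sketch under that assumption; TagsMapper, QuestionMapper, JoinReducer, and the argument paths are placeholders, not code from this project:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class QuestionTagsJoinJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "question-tags join");
        job.setJarByClass(QuestionTagsJoinJob.class);

        // Each input path gets its own mapper; both emit (questionId, QuestionTagsWritable).
        MultipleInputs.addInputPath(job, new Path(args[0]),      // tags file
                TextInputFormat.class, TagsMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),      // questions dir, one file per questionId
                TextInputFormat.class, QuestionMapper.class);
        // (TextInputFormat used for brevity; a whole-file input format may suit
        //  the per-file question content better)

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(QuestionTagsWritable.class);

        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}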

 

 

 

Two problems came up:

a. The reducer's values iterator reuses the same Writable object

QuestionTagsWritable e1 = null, e2 = null;

for (QuestionTagsWritable e : values) {
    System.out.println("xx = " + e.toString());
    if (e.isTags) {
        e1 = e;   // BUG: keeps a reference to an object Hadoop will overwrite
    } else {
        e2 = e;   // BUG: same problem
    }
}

Solution: Hadoop reuses a single Writable instance across the values iterator and only overwrites its fields, so keeping the reference leaves e1 and e2 both pointing at whatever was deserialized last. Copy the value instead of the reference, e.g. e1 = new QuestionTagsWritable(true, e.content);   // pass the value, not the address
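For reference, a minimal sketch of the corrected loop, assuming QuestionTagsWritable has the isTags flag, content field, and (boolean, String) constructor used above:

QuestionTagsWritable tags = null, question = null;

for (QuestionTagsWritable e : values) {
    // Hadoop reuses one QuestionTagsWritable instance for the whole iterator
    // and only overwrites its fields, so copy the data out instead of
    // keeping the reference.
    if (e.isTags) {
        tags = new QuestionTagsWritable(true, e.content);
    } else {
        question = new QuestionTagsWritable(false, e.content);
    }
}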

b. Casting the input split to FileSplit fails under MultipleInputs

In the mapper, I used the input split to get the current file name (the questionId):

FileSplit fileSplit = (FileSplit) context.getInputSplit();

With MultipleInputs, this cast fails:

 

java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit

Solution: MultipleInputs wraps each split in a TaggedInputSplit, which is package-private, so the real FileSplit has to be unwrapped via reflection:

// Needed imports: java.io.IOException, java.lang.reflect.Method,
// org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.lib.input.FileSplit

InputSplit split = context.getInputSplit();
Class<? extends InputSplit> splitClass = split.getClass();

FileSplit fileSplit = null;
if (splitClass.equals(FileSplit.class)) {
    // Single-input job: the split already is a FileSplit.
    fileSplit = (FileSplit) split;
} else if (splitClass.getName().equals(
        "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
    // begin reflection hackery...
    // TaggedInputSplit is package-private, so call its getInputSplit()
    // accessor reflectively to reach the wrapped FileSplit.
    try {
        Method getInputSplitMethod = splitClass
                .getDeclaredMethod("getInputSplit");
        getInputSplitMethod.setAccessible(true);
        fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
    } catch (Exception e) {
        // wrap and re-throw as IOException
        throw new IOException(e);
    }
    // end reflection hackery
}

See: http://stackoverflow.com/questions/11130145/hadoop-multipleinputs-fails-with-classcastexception
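Once the FileSplit is recovered, mapper B can use the file name as the questionId. A small sketch of how that might look inside map(), assuming the reflection code above is wrapped in a hypothetical getFileSplit(context) helper and that the input value carries the question content:

FileSplit fileSplit = getFileSplit(context);           // hypothetical helper wrapping the reflection code above
String questionId = fileSplit.getPath().getName();     // the file name is the questionId
context.write(new Text(questionId),
        new QuestionTagsWritable(false, value.toString()));   // value assumed to hold the question content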

 

 

 

 
