java.lang.NullPointerException at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init

Today, while using HBase's MultiTableInputFormat for multi-table input, I hit the following error:


Unable to initialize MapOutputCollector org.apache.hadoop.mapred.MapTask$MapOutputBuffer
java.lang.NullPointerException
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1008)
	at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:401)
	at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2017-12-29 16:02:09 [INFO]-[org.apache.hadoop.mapred.LocalJobRunner] map task executor complete.
2017-12-29 16:02:09 [WARN]-[org.apache.hadoop.mapred.LocalJobRunner] job_local1469422870_0001
java.lang.Exception: java.io.IOException: Unable to initialize any output collector
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Unable to initialize any output collector
	at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:412)
	at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


The source code (Scala version) is as follows:

import org.apache.hadoop.hbase.client.{Put, Result, Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableMapReduceUtil, TableMapper, TableOutputFormat}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.{Job, Mapper}

/**
  * Description: Use MultiTableInputFormat to read data from several tables.
  *
  * Author : Adore Chen
  * Created: 2017-12-29
  */
object MultiTableInputUse {

  def main(args: Array[String]): Unit = {

    // one Scan per source table; the table name is carried as a Scan attribute
    val scans = new java.util.ArrayList[Scan]()
    var scan = new Scan()
    scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("user_info"))
    scans.add(scan)
    scan = new Scan()
    scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("user_education"))
    scans.add(scan)

    val job = Job.getInstance()
    // the multi-scan overload wires up MultiTableInputFormat under the hood
    TableMapReduceUtil.initTableMapperJob(scans, classOf[UnionMapper], classOf[ImmutableBytesWritable], classOf[Put], job)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
    job.getConfiguration.set(TableOutputFormat.OUTPUT_TABLE, "user_union")

    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }

}

class UnionMapper extends TableMapper[ImmutableBytesWritable, Put] {

  override protected def map(key: ImmutableBytesWritable, result: Result, context: Mapper[ImmutableBytesWritable, Result, ImmutableBytesWritable, Put]#Context): Unit = {
    // user_info pf --> user_union pf
    // user_education c --> user_union ed

    val unionFamilyPF = Bytes.toBytes("pf")
    val unionFamilyED = Bytes.toBytes("ed")
    val pfFamily = Bytes.toBytes("pf")
    val cFamily = Bytes.toBytes("c")

    val put = new Put(key.copyBytes())
    // note: byte arrays compare by reference, and a lowercase name in a match
    // pattern only binds a fresh variable, so compare families with Bytes.equals
    result.getNoVersionMap.forEach((family, columns) => {
      if (Bytes.equals(family, pfFamily)) {
        columns.forEach((column, value) => put.addImmutable(unionFamilyPF, column, value))
      } else if (Bytes.equals(family, cFamily)) {
        columns.forEach((column, value) => put.addImmutable(unionFamilyED, column, value))
      }
    })
    context.write(key, put)
  }
}

After searching the web for many similar errors without finding a good solution, I had no choice but to read the source. The exception occurs at MapTask line 1008:
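Paraphrased from the Hadoop 2.x source (exact line numbers vary by version), the relevant part of MapOutputBuffer.init is roughly:

// k/v serialization setup in MapTask$MapOutputBuffer.init (paraphrased)
keyClass = (Class<K>) job.getMapOutputKeyClass();
valClass = (Class<V>) job.getMapOutputValueClass();           // here: HBase's Put
serializationFactory = new SerializationFactory(job);
keySerializer = serializationFactory.getSerializer(keyClass); // ImmutableBytesWritable: found
keySerializer.open(bb);
valSerializer = serializationFactory.getSerializer(valClass); // nothing accepts Put, returns null
valSerializer.open(bb);                                       // ~line 1008: NullPointerException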


It turns out valSerializer is null: line 1007 could not find a Serializer for the MapOutputValueClass, which here is HBase's Put class. In other words, no serialization class is registered for Put.
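You can confirm this outside of a MapReduce job: SerializationFactory walks the classes listed under the "io.serializations" key (by default only WritableSerialization plus the Avro serializations) and asks each one to accept() the class; nothing accepts Put, so getSerializer returns null. A minimal diagnostic sketch (CheckSerializations is a hypothetical name, assuming Hadoop 2.x and the HBase client on the classpath):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.io.serializer.SerializationFactory

object CheckSerializations {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // prints the default list: WritableSerialization and the Avro serializations
    println(conf.get("io.serializations"))
    val factory = new SerializationFactory(conf)
    // prints null: no registered Serialization accepts Put
    println(factory.getSerializer(classOf[Put]))
  }
}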


The Hadoop Serialization interface is as follows:

package org.apache.hadoop.io.serializer;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/**
 * <p>
 * Encapsulates a {@link Serializer}/{@link Deserializer} pair.
 * </p>
 * @param <T>
 */
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
@InterfaceStability.Evolving
public interface Serialization<T> {
  
  /**
   * Allows clients to test whether this {@link Serialization}
   * supports the given class.
   */
  boolean accept(Class<?> c);
  
  /**
   * @return a {@link Serializer} for the given class.
   */
  Serializer<T> getSerializer(Class<T> c);

  /**
   * @return a {@link Deserializer} for the given class.
   */
  Deserializer<T> getDeserializer(Class<T> c);
}



The HBase jar provides three implementations: KeyValueSerialization, MutationSerialization, and ResultSerialization. They make it easy to serialize HBase objects so that, during the shuffle phase of the map stage, they can be sent over the wire to the reduce nodes.


So we must register MutationSerialization to handle the Put class (Put extends Mutation, so MutationSerialization's accept() matches it), and likewise ResultSerialization for Result.


Solution:

Add this before the job is submitted:

// register serializers for Result, Put, and KeyValue
// (import org.apache.hadoop.hbase.mapreduce.{MutationSerialization, ResultSerialization})
job.getConfiguration.setStrings("io.serializations",
  job.getConfiguration.get("io.serializations"),
  classOf[MutationSerialization].getName,
  classOf[ResultSerialization].getName)


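Equivalently, if you prefer not to reference the classes directly, they can be registered by fully-qualified name (a sketch with the same effect as the classOf form above; KeyValueSerialization can be appended the same way if you shuffle KeyValue objects):

val conf = job.getConfiguration
conf.setStrings("io.serializations",
  conf.get("io.serializations"),
  "org.apache.hadoop.hbase.mapreduce.MutationSerialization",
  "org.apache.hadoop.hbase.mapreduce.ResultSerialization")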
OK, rerun the job, and it passes.


Postscript: switching to the single-table TableInputFormat works without any problem. Digging into its source:

public static void initTableMapperJob(String table, Scan scan,
      Class<? extends TableMapper> mapper,
      Class<?> outputKeyClass,
      Class<?> outputValueClass, Job job,
      boolean addDependencyJars, boolean initCredentials,
      Class<? extends InputFormat> inputFormatClass)
  throws IOException {
    job.setInputFormatClass(inputFormatClass);
    if (outputValueClass != null) job.setMapOutputValueClass(outputValueClass);
    if (outputKeyClass != null) job.setMapOutputKeyClass(outputKeyClass);
    job.setMapperClass(mapper);
    if (Put.class.equals(outputValueClass)) {
      job.setCombinerClass(PutCombiner.class);
    }
    Configuration conf = job.getConfiguration();
    HBaseConfiguration.merge(conf, HBaseConfiguration.create(conf));
    conf.set(TableInputFormat.INPUT_TABLE, table);
    conf.set(TableInputFormat.SCAN, convertScanToString(scan));
    conf.setStrings("io.serializations", conf.get("io.serializations"),
        MutationSerialization.class.getName(), ResultSerialization.class.getName(),
        KeyValueSerialization.class.getName());
    if (addDependencyJars) {
      addDependencyJars(job);
    }
    if (initCredentials) {
      initCredentials(job);
    }
  }

It turns out that when TableMapReduceUtil sets up a single-table input job, it automatically registers the serializers for the core HBase classes:

conf.setStrings("io.serializations", conf.get("io.serializations"),
        MutationSerialization.class.getName(), ResultSerialization.class.getName(),
        KeyValueSerialization.class.getName());




But the multi-table MultiTableInputFormat overloads do not add them. What a trap! This is a bug in HBase's TableMapReduceUtil. (My HBase is the CDH build of 1.2.0; I checked hbase-server from 1.0.0 through 1.4.0 and they all have this problem; other versions untested.)


Summary: after looking at many reported solutions to this error, they boil down to two causes:

1) The classes involved do not implement Hadoop's native Writable interface (keys must implement WritableComparable); types from outside Hadoop need their own implementation of the Serialization interface, or the implementation exists but was never registered under "io.serializations" (see the sketch after this list);

2) Incompatible imports were mixed from MapReduce v1 (org.apache.hadoop.mapred) and v2 (org.apache.hadoop.mapreduce).
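To illustrate cause 1), here is a minimal custom Serialization sketch (StringSerialization is a hypothetical illustrative class, not part of Hadoop) showing what a non-Writable type needs before it can cross the shuffle:

import java.io.{DataInputStream, DataOutputStream, InputStream, OutputStream}
import org.apache.hadoop.io.serializer.{Deserializer, Serialization, Serializer}

// a minimal Serialization for plain String values (illustration only)
class StringSerialization extends Serialization[String] {

  override def accept(c: Class[_]): Boolean = classOf[String].isAssignableFrom(c)

  override def getSerializer(c: Class[String]): Serializer[String] = new Serializer[String] {
    private var out: DataOutputStream = _
    override def open(os: OutputStream): Unit = { out = new DataOutputStream(os) }
    override def serialize(s: String): Unit = out.writeUTF(s)
    override def close(): Unit = if (out != null) out.close()
  }

  override def getDeserializer(c: Class[String]): Deserializer[String] = new Deserializer[String] {
    private var in: DataInputStream = _
    override def open(is: InputStream): Unit = { in = new DataInputStream(is) }
    override def deserialize(t: String): String = in.readUTF()
    override def close(): Unit = if (in != null) in.close()
  }
}

// registered exactly like the HBase serializations above:
// conf.setStrings("io.serializations", conf.get("io.serializations"), classOf[StringSerialization].getName)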


希望能对你的问题有帮助。



