Exception in thread "main" org.apache.spark.SparkException: Task not serializable (Part 2)

jackliu2013

Category: spark. Tags: exception, Scala, spark, log

Original link: https://blog.csdn.net/u010770919/article/details/44239353

Note: the project is written in Scala.

Requirement: add logging to the project.

Problem: the job fails with a serialization exception:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable

Solution 1:

Using the usual log = LoggerFactory.getLogger() directly does not work: it fails with the same serialization error.
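To see why a plain logger field fails, here is a minimal sketch outside Spark. Spark ships task closures to executors with Java serialization, so the sketch calls ObjectOutputStream directly. java.util.logging.Logger, which does not implement Serializable, stands in for the logger returned by LoggerFactory.getLogger. This is an assumption, because the post does not say which SLF4J binding it used. PlainLoggerJob and JavaSerialization are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.util.logging.Logger
import scala.util.Try

// Hypothetical job class. It is Serializable itself, but it keeps a plain
// (non-transient) logger field; java.util.logging.Logger stands in for SLF4J
// here and is not Serializable.
final class PlainLoggerJob extends Serializable {
  private val log: Logger = Logger.getLogger(classOf[PlainLoggerJob].getName)

  def parse(line: String): Int = {
    log.fine("parsing " + line)
    line.split(",").length
  }
}

object JavaSerialization {
  // Write a value with plain Java serialization and return the bytes.
  def serialize(value: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.toByteArray
  }
}

// The job object itself is fine; the logger field it drags along is not.
val plainAttempt = Try(JavaSerialization.serialize(new PlainLoggerJob))
val failedOnLogger: Boolean =
  plainAttempt.failed.toOption.exists(_.isInstanceOf[NotSerializableException])
```

The exception message names the offending class (java.util.logging.Logger), which matches the pattern seen in Spark's "Task not serializable" stack traces.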

Solution 2:

Make the Logger serializable through inheritance:

class LocalLogger(name: String) extends Logger(name) with Serializable { }
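The post does not say which Logger class LocalLogger extends or how well this worked. To illustrate one pitfall of the idea, the sketch below applies it to java.util.logging.Logger, chosen because it ships with the JDK; this is an assumption. Marking the subclass Serializable lets it be written out. Reading it back fails, however, because Java deserialization needs a no-arg constructor on the first non-serializable superclass, and java.util.logging.Logger has none. Even if such a constructor existed, none of that superclass's own state (level, handlers) would be carried over. JavaRoundTrip is a hypothetical helper.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InvalidClassException, ObjectInputStream, ObjectOutputStream}
import java.util.logging.Logger
import scala.util.Try

// Hypothetical counterpart of LocalLogger on top of java.util.logging:
// calls Logger's protected (name, resourceBundleName) constructor, no resource bundle.
final class LocalLogger(name: String) extends Logger(name, null) with Serializable

object JavaRoundTrip {
  // Serialize a value and read it straight back, as a task would on an executor.
  def apply[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}

// Writing succeeds because LocalLogger is marked Serializable. Reading fails with
// InvalidClassException ("no valid constructor"), since java.util.logging.Logger,
// the first non-serializable superclass, has no no-arg constructor.
val localAttempt = Try(JavaRoundTrip(new LocalLogger("lewa")))
val failedOnRead: Boolean =
  localAttempt.failed.toOption.exists(_.isInstanceOf[InvalidClassException])
```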

Solution 3:

Define a trait and mix it into each class:

import org.slf4j.{Logger, LoggerFactory}

/**
 * Mix this trait into a class to give it a lazily created, transient logger.
 */
trait DefLogger {

  // Make the log field transient so that objects mixing in this trait can
  // be serialized and used on another machine
  @transient private var log_ : Logger = null

  // Method to get or create the logger for this object
  protected def log: Logger = {
    if (log_ == null) {
      var className = this.getClass.getName
      // Ignore the trailing $ in the class names of Scala objects
      if (className.endsWith("$")) {
        className = className.substring(0, className.length - 1)
      }
      log_ = LoggerFactory.getLogger(className)
    }
    log_
  }
}
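Here is a self-contained sketch of the trait. It uses java.util.logging.Logger as a stand-in for the SLF4J logger so that it runs without extra dependencies; this is an assumption. UpgradeJob and JavaRoundTrip are hypothetical. The sketch serializes an object whose logger has already been created. It then checks that the copy rebuilds its logger on first use, because the transient field arrives as null.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.logging.Logger

// The DefLogger pattern, with java.util.logging.Logger standing in for SLF4J.
trait DefLogger {
  // Transient: Java serialization skips this field, so it arrives as null
  // instead of failing on the non-serializable Logger.
  @transient private var log_ : Logger = null

  // Create the logger on first use, and again after deserialization.
  protected def log: Logger = {
    if (log_ == null) {
      // Scala objects compile to class names ending in '$'; drop it.
      log_ = Logger.getLogger(getClass.getName.stripSuffix("$"))
    }
    log_
  }
}

// Hypothetical job class mixing in the trait.
final class UpgradeJob extends DefLogger with Serializable {
  def parse(line: String): Int = {
    log.fine("parsing " + line)
    line.split(",").length
  }

  def loggerName: String = log.getName
}

object JavaRoundTrip {
  // Serialize a value and read it straight back.
  def apply[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Usage: create a job, touch its logger, round-trip it, and use the copy. Serialization succeeds even though log_ was already set.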

Solution 4:

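Solution 3 actually came out of reading Spark's Logging source code. Scala and Spark classes end up with slightly different names once compiled to .class files, which shows most clearly in the difference between Java static methods and Scala objects.

A Scala object compiles to a class whose name ends with $, hence the string check above.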
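A small sketch of that naming difference, using hypothetical names: a class and its companion object compile to two JVM classes, and only the object's class name ends in $. Stripping one trailing $ (what the substring check does; stripSuffix is an equivalent one-liner) maps both to the same logger name.

```scala
// Hypothetical companion pair: two JVM classes, the object's name ends in '$'.
class LewaRegister
object LewaRegister

object ClassNames {
  // The same normalisation DefLogger applies before asking for a logger.
  def loggerNameFor(value: AnyRef): String = value.getClass.getName.stripSuffix("$")

  val objectClassName: String = LewaRegister.getClass.getName
  val classClassName: String = classOf[LewaRegister].getName
}
```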

After switching to solution 4, logging mostly worked, but I still got the serialization error during a transformation and an action on an RDD. The code is as follows.

The log is defined in the class as follows:

  @transient private var log_ : Logger = null

  // Method to get or create the logger for this object
  protected def log: Logger = {
    if (log_ == null) {
      var className = this.getClass.getName
      // Ignore trailing $'s in the class names for Scala objects
      if (className.endsWith("$")) {
        className = className.substring(0, className.length - 1)
      }
      log_ = LoggerFactory.getLogger(className)
    }
    log_
  }

Inside the RDD's map operation, exceptions are logged when they occur:

case e: ArrayIndexOutOfBoundsException => {
  log.error("lewa_device_upgrade ArrayIndexOutOfBoundsException: " + line)
  Row("NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", 0, 0, "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO",
    "NO", 0, "NO", 0, "NO", "NO", 0, "NO", "NO", "NO", "NO", "NO", "NO", 0, 0, 0, 0)
}
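The post does not spell out why the error persists here. One likely explanation, sketched below under assumptions, starts from the fact that log is a method of the enclosing class. A closure that calls it therefore captures this, and Spark has to serialize the whole enclosing object. If that class is not Serializable, making log_ transient does not help, because the object itself is the problem. The sketch uses plain Java serialization (which Spark uses for task closures) and java.util.logging.Logger instead of SLF4J. LewaDeviceUpgrade is a hypothetical class, since the post's class and its fields are not shown.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.util.logging.Logger
import scala.util.Try

// Hypothetical stand-in for the post's job class. It is NOT Serializable; its
// logger sits in a transient var as in the post (JUL instead of SLF4J).
final class LewaDeviceUpgrade(val srcPath: String) {
  @transient private var log_ : Logger = null

  protected def log: Logger = {
    if (log_ == null) log_ = Logger.getLogger(getClass.getName.stripSuffix("$"))
    log_
  }

  // The function handed to map. `log` here means `this.log`, so the lambda
  // captures `this`, and serializing it means serializing the whole job object.
  def parseLine: String => Int = line => {
    try line.split(",")(3).toInt
    catch {
      case _: ArrayIndexOutOfBoundsException =>
        log.severe("lewa_device_upgrade ArrayIndexOutOfBoundsException: " + line)
        0
    }
  }
}

object ClosureSerialization {
  // Java-serialize a value, as Spark does with task closures.
  def serialize(value: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.toByteArray
  }
}

val parse = new LewaDeviceUpgrade("/tmp/input").parseLine
// Fails on LewaDeviceUpgrade itself, even though log_ is transient.
val closureAttempt = Try(ClosureSerialization.serialize(parse))
val failedOnJob: Boolean =
  closureAttempt.failed.toOption.exists(_.isInstanceOf[NotSerializableException])
```

The function works fine when called locally; only shipping it fails, which matches an error that appears only once the code runs inside map.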

Then I suddenly remembered a passage in the documentation on the Spark website that I had never fully understood:

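Reading the documentation more closely, there are two recommended ways to pass functions to Spark:

(1) Anonymous functions

(2) Static methods in a global singleton object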
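For reference, the two styles look like this. A Seq stands in for an RDD, an assumption made to keep the sketch free of a Spark dependency; its map takes a function the same way. MyFunctions and func1 follow the naming in the Spark programming guide's example, but the body of func1 is made up.

```scala
// Hypothetical helper object in the style of the Spark guide's example.
object MyFunctions {
  def func1(s: String): String = s.trim.toUpperCase
}

object PassingFunctions {
  // A plain Seq stands in for an RDD here.
  val lines: Seq[String] = Seq(" a ", "b ")

  // (1) Anonymous function syntax.
  val lengths: Seq[Int] = lines.map(line => line.trim.length)

  // (2) A method of a global singleton object, passed by name.
  val upper: Seq[String] = lines.map(MyFunctions.func1)
}
```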

I went with the singleton-object approach:

Define log in the class's companion object, as follows:

/**
 * Companion object of Lewa_register: factory method plus a transient, lazily created logger
 */
object Lewa_register extends Serializable {

  def apply(master: String, savepath: String, srcfilepath: String) = new Lewa_register(master, savepath, srcfilepath)

  // Make the log field transient so that the object can
  // be serialized and used on another machine
  @transient private var log_ : Logger = null

  // Method to get or create the logger for this object
  protected def log: Logger = {
    if (log_ == null) {
      var className = this.getClass.getName
      // Ignore the trailing $ in the class names of Scala objects
      if (className.endsWith("$")) {
        className = className.substring(0, className.length - 1)
      }
      log_ = LoggerFactory.getLogger(className)
    }
    log_
  }
}

Then call log from the corresponding places.
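A minimal sketch of the companion-object approach. As before, java.util.logging.Logger stands in for SLF4J and a Seq stands in for an RDD; both are assumptions. The class, log message, and helper names are hypothetical. The function passed to map lives in the singleton object and refers only to that object. In this sketch, serializing the function therefore succeeds even though the LewaRegister class itself is not Serializable.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util.logging.Logger
import scala.util.Try

// Hypothetical job class; like the post's class it is not Serializable itself.
final class LewaRegister(val srcPath: String) {
  // map receives the companion's function value, so no reference to `this` travels.
  def run(lines: Seq[String]): Seq[Int] = lines.map(LewaRegister.parseLine)
}

// Companion object holding the logger, following the post (JUL instead of SLF4J).
object LewaRegister extends Serializable {
  @transient private var log_ : Logger = null

  protected def log: Logger = {
    if (log_ == null) log_ = Logger.getLogger(getClass.getName.stripSuffix("$"))
    log_
  }

  // The function passed to map: it refers only to this singleton object.
  val parseLine: String => Int = line => {
    try line.split(",")(3).toInt
    catch {
      case _: ArrayIndexOutOfBoundsException =>
        log.severe("ArrayIndexOutOfBoundsException: " + line)
        0
    }
  }
}

object SerializeCheck {
  // Java-serialize a value, as Spark does with task closures.
  def serialize(value: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.toByteArray
  }
}

val results: Seq[Int] = new LewaRegister("/tmp/input").run(Seq("a,b,c,7", "a,b"))
val serializedOk: Boolean = Try(SerializeCheck.serialize(LewaRegister.parseLine)).isSuccess
```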