Problem: Spark UDF fails with "Task not serializable"

The following Spark UDF implementation:

val randomNew = (arra: Seq[String], n: Int) => {
  if (arra.size < n) {
    return arra.toSeq
  }
  var arr = ArrayBuffer[String]()
  arr ++= arra
  var outList: List[String] = Nil
  var border = arr.length // range of the random index
  for (i <- 0 to n - 1) { // draw n elements
    val index = (new Random).nextInt(border)
    outList = outList ::: List(arr(index))
    arr(index) = arr.last // move the last element into the slot just taken
    arr = arr.dropRight(1) // drop the last element
    border -= 1
  }
  outList.toSeq
}
sqlContext.udf.register("randomNew", randomNew)

fails at execution time with the following error:

Caused by: org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:2067)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706)
	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
	at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
	... 28 more
Caused by: java.io.NotSerializableException: java.lang.Object
Serialization stack:

The culprit is the `return arra.toSeq` statement. In Scala, a `return` inside an anonymous function is a non-local return: the compiler implements it by allocating a plain `Object` as a key in the enclosing method and capturing that key in the closure, which is exactly why the stack trace bottoms out at `NotSerializableException: java.lang.Object`. To keep the early-exit behavior, drop the `return` and write the branch as an expression, for example with pattern matching; otherwise the error above occurs.
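The non-local return can be observed outside Spark entirely. In this minimal sketch (names are illustrative, not from the original post), the `return` inside the lambda targets the enclosing method; calling the closure after that method has exited lets the compiler-generated `NonLocalReturnControl` escape:

```scala
object ReturnInLambda {
  // Builds and returns a closure whose body contains `return`.
  // That `return` targets makeF, not the lambda itself.
  def makeF(): Int => Int = {
    val f = (x: Int) => {
      if (x < 0) return (_: Int) => 0 // non-local return from makeF
      x * 2
    }
    f
  }

  // True if invoking the closure after makeF has already exited
  // throws the compiler-generated NonLocalReturnControl.
  def escapes(): Boolean =
    try { makeF()(-1); false }
    catch { case _: scala.runtime.NonLocalReturnControl[_] => true }

  def main(args: Array[String]): Unit =
    println(s"non-local return escaped: ${escapes()}")
}
```

The key object backing that control-flow exception is what the closure captures, and it is an ordinary non-serializable `java.lang.Object`.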

The corrected code:

val randomNew = (arra: Seq[String], n: Int) => {
  val routeKey = arra.size <= n
  routeKey match {
    case true => arra
    case _ => {
      var arr = ArrayBuffer[String]()
      arr ++= arra
      var outList: List[String] = Nil
      var border = arr.length // range of the random index
      for (i <- 0 to n - 1) { // draw n elements
        val index = (new Random).nextInt(border)
        outList = outList ::: List(arr(index))
        arr(index) = arr.last // move the last element into the slot just taken
        arr = arr.dropRight(1) // drop the last element
        border -= 1
      }
      outList
    }
  }
}
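Pattern matching is one way to remove the `return`; since `if`/`else` is an expression in Scala, the same fix can also be written without a `match`. A sketch equivalent to the corrected code (the name `randomNewIf` is illustrative, not from the original post):

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Same sampling logic, with the early exit as an if/else expression
// instead of `return`, so no non-local return is compiled into the closure.
val randomNewIf = (arra: Seq[String], n: Int) => {
  if (arra.size <= n) arra
  else {
    var arr = ArrayBuffer[String]()
    arr ++= arra
    var outList: List[String] = Nil
    var border = arr.length // range of the random index
    for (_ <- 0 until n) { // draw n elements
      val index = Random.nextInt(border)
      outList = outList ::: List(arr(index))
      arr(index) = arr.last // move the last element into the freed slot
      arr = arr.dropRight(1) // drop the last element
      border -= 1
    }
    outList
  }
}

// Quick local check of the contract, outside Spark:
assert(randomNewIf(Seq("a", "b"), 5) == Seq("a", "b")) // short input passes through
val sample = randomNewIf((1 to 10).map(_.toString), 3)
assert(sample.size == 3 && sample.distinct.size == 3)  // n distinct elements
```

Either form keeps the closure free of the compiler-generated return key, so it serializes cleanly when registered as a UDF.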