While implementing a Spark UDF like the one below:
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

val randomNew = (arra: Seq[String], n: Int) => {
  if (arra.size < n) {
    return arra.toSeq
  }
  var arr = ArrayBuffer[String]()
  arr ++= arra
  var outList: List[String] = Nil
  var border = arr.length // upper bound for the random index
  for (i <- 0 to n - 1) { // pick n elements
    val index = (new Random).nextInt(border)
    outList = outList ::: List(arr(index))
    arr(index) = arr.last // move the last element into the slot just taken
    arr = arr.dropRight(1) // drop the last element
    border -= 1
  }
  outList.toSeq
}
sqlContext.udf.register("randomNew", randomNew)
Running it fails with the following error:
Caused by: org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2067)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
... 28 more
Caused by: java.io.NotSerializableException: java.lang.Object
Serialization stack:
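The culprit is the return arra.toSeq line. Inside an anonymous function, return does not return from the function itself; it is a non-local return from the enclosing method. The Scala compiler implements it by throwing a control exception tagged with a plain java.lang.Object key, and the closure captures that key. The key is not serializable, which matches the NotSerializableException: java.lang.Object above. To exit early, drop return and make the early result one branch of an expression, for example with pattern matching (an if/else expression works as well); otherwise the error above appears.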
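The captured key can be observed without Spark. The following is a minimal sketch, not the author's code: NonLocalReturnDemo, isSerializable, withReturn and withoutReturn are hypothetical names, and plain Java serialization stands in for the check that Spark's ClosureCleaner performs. On the Scala 2 compilers I am aware of, the closure built with return should be rejected and the one without it accepted; newer Scala versions also warn about non-local returns.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object NonLocalReturnDemo {
  // Hypothetical helper: true if Java serialization accepts the object.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // `return` inside the function literal is a non-local return from
  // withReturn, so the closure has to capture the compiler-generated key.
  def withReturn(): Any = {
    val f = (xs: Seq[String], n: Int) => {
      if (xs.size < n) return xs
      xs.take(n)
    }
    f
  }

  // Same logic written as an if/else expression: nothing extra is captured.
  def withoutReturn(): (Seq[String], Int) => Seq[String] =
    (xs, n) => if (xs.size < n) xs else xs.take(n)
}
```

Calling NonLocalReturnDemo.isSerializable on the result of each method shows the difference between the two closures.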
The revised code:
val randomNew = (arra: Seq[String], n: Int) => {
  val routeKey = arra.size <= n
  routeKey match {
    case true => arra
    case _ => {
      var arr = ArrayBuffer[String]()
      arr ++= arra
      var outList: List[String] = Nil
      var border = arr.length // upper bound for the random index
      for (i <- 0 to n - 1) {
        // pick n elements
        val index = (new Random).nextInt(border)
        outList = outList ::: List(arr(index))
        arr(index) = arr.last // move the last element into the slot just taken
        arr = arr.dropRight(1) // drop the last element
        border -= 1
      }
      outList
    }
  }
}
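For comparison, the same early exit can be written as a plain if/else expression. The sketch below is an alternative, not the code from this post: RandomSample and randomPick are hypothetical names, and the manual swap-and-drop loop is replaced by scala.util.Random.shuffle followed by take, which also picks n distinct positions at random.

```scala
import scala.util.Random

object RandomSample {
  // Hypothetical name. Returns the input unchanged when it has at most n
  // elements, otherwise n elements drawn at random without replacement.
  val randomPick: (Seq[String], Int) => Seq[String] = (arra, n) =>
    if (arra.size <= n) arra
    else Random.shuffle(arra).take(n)
}
```

Because the function contains no return, it should be registrable in the same way as before, for example sqlContext.udf.register("randomNew", RandomSample.randomPick).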