tdf4.select("id").rdd.map(x=>Array(x(0),cmls)).toDF("id","cmls").show()
I was trying to build the DataFrame above, where cmls is an Array[Any].
This fails with the following error:
value toDF is not a member of org.apache.spark.rdd.RDD[Array[Any]]
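I tried the fix commonly suggested online, import sqlContext.implicits._, but it did not help.
It turned out that the implicit toDF conversion only applies to RDDs with the following element types: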
RDD[Int]
RDD[Long]
RDD[String]
RDD[T <: scala.Product]
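
The last entry in the list is the one that matters here: tuples and case classes extend scala.Product, while Array does not, which is why an RDD[Array[Any]] gets no toDF. A minimal plain-Scala sketch of that difference, with no Spark dependency (the object name, helper name and sample values are hypothetical, chosen only for illustration):

```scala
// Hypothetical names and sample values, for illustration only.
object ProductCheck {
  // True when the value is a scala.Product (tuples and case classes are).
  def isProduct(value: Any): Boolean = value.isInstanceOf[Product]

  def main(args: Array[String]): Unit = {
    val asArray: Any = Array[Any]("id1", Array("a", "b"))
    val asTuple: Any = ("id1", Array("a", "b"))
    println(isProduct(asArray)) // false
    println(isProduct(asTuple)) // true
  }
}
```
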
So I changed the code to convert each Array into a tuple first:
tdf4.select("id").rdd.map(x=>Array(x(0),cmls)).map({
case Array(val1: String, val2: Array) => (val1, val2)
}).toDF("id","cmls")
This fails with: Array takes type parameters
The reason is that Array must be given a type argument in the pattern, so I changed it to:
tdf4.select("id").rdd.map(x=>Array(x(0),cmls)).map({
case Array(val1: String, val2: Array[String]) => (val1, val2)
}).toDF("id","cmls")
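
The pattern-matching step can be tried without Spark. The sketch below is plain Scala on assumed sample data (the object name, function name and values are hypothetical). It also illustrates a caveat: the pattern val2: Array[String] is checked against the array's runtime element type, so a cmls that really is an Array[Any] would not match and would raise a scala.MatchError.

```scala
// Hypothetical names and sample values, for illustration only.
object PairSketch {
  // Converts one Array(id, values) element into an (id, values) tuple,
  // mirroring the case clause used before toDF.
  def toPair(element: Array[Any]): (String, Array[String]) = element match {
    case Array(id: String, values: Array[String]) => (id, values)
  }

  def main(args: Array[String]): Unit = {
    val cmls: Array[String] = Array("a", "b")
    val pair = toPair(Array[Any]("id1", cmls))
    println(pair._1)               // id1
    println(pair._2.mkString(",")) // a,b

    // An Array[Any] holding strings is not an Array[String] at runtime,
    // so the same case clause throws a MatchError for it.
    val untyped: Array[Any] = Array("a", "b")
    val result = scala.util.Try(toPair(Array[Any]("id1", untyped)))
    println(result.isFailure)      // true
  }
}
```

If the id column is already a string, mapping each Row straight to a tuple (for example with Row.getString(0)) may avoid the intermediate Array altogether; that variant is not tested here.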