When following the Spark example:
case class model_instance (features: Vector)
//and
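val df = rawData.map { line =>
  // keep only numeric fields; the pattern rejects "" and ".", which would make toDouble throw
  val values = line.split(",").filter(_.matches("\\d+(\\.\\d+)?")).map(_.toDouble)
  model_instance(Vectors.dense(values))
}.toDF()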
errors like the following may occur:
Column features must be of type org.apache.spark.ml.linalg.VectorUDT
Error with RDD[Vector] in function parameter
type Vector takes type parameters
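These errors arise because ml and mllib are two separate Spark machine learning APIs, each with its own Vector type, and the two cannot be mixed. The DataFrame-based spark.ml algorithms expect org.apache.spark.ml.linalg vectors, while the RDD-based spark.mllib algorithms expect org.apache.spark.mllib.linalg vectors. (With neither import, Vector resolves to Scala's generic collection Vector, hence "type Vector takes type parameters".) Use exactly one of these imports: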
import org.apache.spark.ml.linalg.{Vector, Vectors}
//or
import org.apache.spark.mllib.linalg.{Vector, Vectors}
// Do not combine the two imports above; pick the one that matches the API you are calling.
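As a minimal sketch of the ml variant, assuming Spark 2.x or later with spark-sql and spark-mllib on the classpath: the object name VectorImportSketch, the helper buildFrame, the case class name ModelInstance, and the two sample lines standing in for rawData are all hypothetical. The numeric pattern is also an assumption, chosen so that empty fields are never passed to toDouble.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
// Only the DataFrame-based (spark.ml) vector types are imported; mllib.linalg is not.
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Defined at top level so that Spark can derive an encoder for it.
case class ModelInstance(features: Vector)

object VectorImportSketch {
  // Hypothetical helper: parses comma-separated lines into a one-column DataFrame.
  def buildFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical stand-in for rawData (for example, the result of sc.textFile).
    val rawData = spark.sparkContext.parallelize(Seq("1.0,2.5,abc", "3,4.25,xyz"))

    rawData.map { line =>
      val values = line.split(",")
        .filter(_.matches("\\d+(\\.\\d+)?")) // keep numeric fields only
        .map(_.toDouble)
      ModelInstance(Vectors.dense(values))
    }.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vector-import-sketch")
      .master("local[*]")
      .getOrCreate()

    val df = buildFrame(spark)
    df.printSchema() // the features column should be reported as a vector type
    spark.stop()
  }
}
```

Because the features column is built from org.apache.spark.ml.linalg.Vectors, it can be passed to spark.ml estimators without the VectorUDT type error. For the RDD-based API, the same sketch should work with the import swapped to org.apache.spark.mllib.linalg.{Vector, Vectors}.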