1. Two ways to create a DataFrame
In Spark, a DataFrame can be built from a Seq of tuples with toDF. In this example even the numeric values are created as Strings (a common convention, though numeric types work too):
import spark.implicits._  // required for the toDF method on Seq

val data = Seq(
  ("0.1", "0"),
  ("0.15", "0"),
  ("0.8", "1"),
  ("1.0", "1")
).toDF("predict", "label")
+-------+-----+
|predict|label|
+-------+-----+
| 0.1| 0|
| 0.15| 0|
| 0.8| 1|
| 1.0| 1|
+-------+-----+
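As an aside, the String quoting above is a choice, not a requirement: toDF infers native column types from the tuple element types. A minimal sketch, assuming the same spark-shell session with `spark.implicits._` imported:

```scala
import spark.implicits._

// toDF infers DoubleType for predict and IntegerType for label
// from the (Double, Int) tuple element types.
val typed = Seq(
  (0.1, 0),
  (0.15, 0),
  (0.8, 1),
  (1.0, 1)
).toDF("predict", "label")

typed.printSchema()
// predict: double, label: integer
```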
Or:
val data = Array(("1", "2", "3", "4", "5"), ("6", "7", "8", "9", "10"))
val df = spark.createDataFrame(data).toDF("col1", "col2", "col3", "col4", "col5")
+----+----+----+----+----+
|col1|col2|col3|col4|col5|
+----+----+----+----+----+
| 1| 2| 3| 4| 5|
| 6| 7| 8| 9| 10|
+----+----+----+----+----+
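When full control over column types is needed, a schema can also be supplied explicitly to createDataFrame. A sketch, assuming a local `spark` session (the variable names `schema`, `rows`, and `typedDf` are illustrative):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}

// Declare each column's name and native type up front.
val schema = StructType(Seq(
  StructField("predict", DoubleType, nullable = false),
  StructField("label", IntegerType, nullable = false)
))

// Each Row must match the schema positionally.
val rows = Seq(Row(0.1, 0), Row(0.15, 0), Row(0.8, 1), Row(1.0, 1))

val typedDf = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
typedDf.printSchema()
```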
2. Converting a Map to a DataFrame
Using the DataFrame-creation approach from section 1, we can turn a Map (here, a map from column names to column types) into a DataFrame:
// Convert a Map to a DataFrame
import scala.collection.mutable.ArrayBuffer

val dt = Array(("1", "2", "3", "4", "5"), ("6", "7", "8", "9", "10"))
val data = spark.createDataFrame(dt).toDF("col1", "col2", "col3", "col4", "col5")

// Map from column name to column type, e.g. Map(col1 -> StringType, ...)
val Col2Type = data.dtypes.toMap
val colName_list = data.columns.toList

// Collect the (name, type) pairs into a buffer of tuples
val arrbuf = ArrayBuffer[(String, String)]()
for (col <- colName_list) {
  arrbuf += ((col, Col2Type(col)))
}
println(arrbuf)

val df = spark.createDataFrame(arrbuf).toDF("colname", "type")
df.show()
Result:
Col2Type: scala.collection.immutable.Map[String,String] = Map(col3 -> StringType, col2 -> StringType, col5 -> StringType, col1 -> StringType, col4 -> StringType)
colName_list: List[String] = List(col1, col2, col3, col4, col5)
arrbuf: scala.collection.mutable.ArrayBuffer[(String, String)] = ArrayBuffer()
ArrayBuffer((col1,StringType), (col2,StringType), (col3,StringType), (col4,StringType), (col5,StringType))
df: org.apache.spark.sql.DataFrame = [colname: string, type: string]
+-------+----------+
|colname| type|
+-------+----------+
| col1|StringType|
| col2|StringType|
| col3|StringType|
| col4|StringType|
| col5|StringType|
+-------+----------+
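The loop above works, but a Scala Map is already a collection of (key, value) pairs, so the detour through ArrayBuffer can be skipped. A minimal sketch, assuming the same session (the names `m` and `mapDf` are illustrative):

```scala
import spark.implicits._

// toSeq turns the Map into Seq[(String, String)], which toDF accepts directly.
val m = Map("col1" -> "StringType", "col2" -> "StringType")
val mapDf = m.toSeq.toDF("colname", "type")
mapDf.show()

// Equivalently, data.dtypes is already an Array[(String, String)], so:
// val df = data.dtypes.toSeq.toDF("colname", "type")
```

Note that Map iteration order is not guaranteed, so the row order in the result may differ from the insertion order.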