@羲凡 - Just to live a better life
Converting Between Spark RDD and DataFrame
Q: Which operator converts an RDD to a DataFrame in Spark?
A: .toDF
Q: Which operator converts a DataFrame to an RDD in Spark?
A: .rdd
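In short (a minimal sketch, assuming a SparkSession named spark with spark.implicits._ imported; pairRDD is a placeholder for any RDD of tuples or case classes):
val df = pairRDD.toDF("name", "age")   // RDD -> DataFrame, enabled by spark.implicits._
val rows = df.rdd                      // DataFrame -> RDD[Row]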
1. The code
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, SparkSession}

object Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Demo")
      .master("local[*]")
      .getOrCreate()
    val file = "file:///D:\\data\\test.txt"
    val schema = StructType(Array(
      StructField("name", StringType), StructField("age", IntegerType)
    ))
    // DataFrame to RDD
    println("=========DataFrame to RDD=========")
    // Read the CSV file with the explicit schema defined above
    val dataFrame: DataFrame = spark
      .read
      .option("delimiter", ",")
      .schema(schema)
      .csv(file)
    dataFrame.show()
    // .rdd yields an RDD[Row]; map each Row to a (name, age) tuple
    val rdd: RDD[(String, Int)] = dataFrame
      .rdd
      .map(t => (t.getAs[String](0), t.getAs[Int](1)))
    println("===Result after conversion===")
    rdd.foreach(println)
    // RDD to DataFrame
    println("=========RDD to DataFrame=========")
    // spark.implicits._ provides the .toDF conversion for RDDs of tuples
    import spark.implicits._
    val rdd2: RDD[(String, String)] = spark
      .sparkContext
      .textFile(file)
      .map(t => {
        val arr = t.split(",")
        (arr(0), arr(1))
      })
    rdd2.foreach(println)
    val dataFrame2: DataFrame = rdd2
      .toDF("name", "age")
    println("===Result after conversion===")
    dataFrame2.show()
    spark.stop()
  }
}
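Note that .toDF only becomes available after import spark.implicits._. If you prefer not to rely on the implicits, a rough alternative sketch (the names rowRDD and dataFrame3 are placeholders) is to build an RDD[Row] and pass it to spark.createDataFrame together with the schema defined above:
import org.apache.spark.sql.Row

// Alternative sketch: wrap each tuple in a Row and reuse the explicit schema,
// so no implicits are needed (age is cast to Int to match IntegerType)
val rowRDD: RDD[Row] = rdd2.map { case (name, age) => Row(name, age.toInt) }
val dataFrame3: DataFrame = spark.createDataFrame(rowRDD, schema)
dataFrame3.show()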
2. Output
=========DataFrame to RDD=========
+----+---+
|name|age|
+----+---+
| 扎克|227|
| 赵信|200|
| 魔腾|188|
+----+---+
===Result after conversion===
(扎克,227)
(赵信,200)
(魔腾,188)
=========RDD to DataFrame=========
(魔腾,188)
(扎克,227)
(赵信,200)
===Result after conversion===
+----+---+
|name|age|
+----+---+
| 扎克|227|
| 赵信|200|
| 魔腾|188|
+----+---+
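Note that the tuples printed directly from the RDD can come out in a different order than the rows in the file: foreach runs on the partitions in parallel under local[*], so the print order is not deterministic.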
====================================================================
@羲凡 - Just to live a better life
If you have any questions about this post, feel free to leave a comment.