Spark: Converting an RDD to a DataFrame and Naming Its Columns in Three Ways

Goal: convert an RDD into a DataFrame and give its columns explicit names.

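The following concrete example demonstrates all three methods.
The input file is student_info.txt. It has no header row. Each line holds five comma-separated fields, in this order: student ID, name, gender (男 = male, 女 = female), class, and enrollment date. The first ten lines are:
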
170401011001,施礼义,男,0101,20170901
170401011002,王旭,男,0101,20170901
170401011003,肖桢,女,0101,20170901
170401011004,吴佩东,男,0101,20170901
170401011005,魏会,男,0101,20170901
170401011006,曾美,女,0101,20170901
170401011007,邵亚,女,0101,20170901
170401011008,朱燕菊,女,0101,20170901
170401011009,杨明书,男,0101,20170901
170401011010,张伟,男,0101,20170901
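Each method in the code splits a line with `split(",")` and then reads `fields(0)` through `fields(4)` directly. A blank or truncated line would therefore throw `ArrayIndexOutOfBoundsException`, but only when an action such as `show()` runs, because `map` is lazy. A minimal defensive parser is sketched below. The helper name `parseStudent` and the decision to silently drop malformed lines are illustrative assumptions, not part of the original post.

```scala
// Same shape as the Student case class used by method 3 (field names assumed to match the columns)
case class Student(userID: String, userName: String, sex: String, classID: String, date: String)

object ParseStudent {
  // Hypothetical helper: returns None for lines that do not have exactly five
  // comma-separated fields, instead of throwing ArrayIndexOutOfBoundsException.
  def parseStudent(line: String): Option[Student] = {
    // trim also removes a stray '\r' from Windows line endings;
    // limit = -1 keeps trailing empty fields, so "a,b,c,d," still counts as five fields
    val fields = line.trim.split(",", -1)
    if (fields.length == 5)
      Some(Student(fields(0), fields(1), fields(2), fields(3), fields(4)))
    else
      None
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "170401011001,施礼义,男,0101,20170901", // well-formed line
      "",                                  // blank line
      "170401011002,王旭,男"                // truncated line
    )
    // flatMap keeps only the Some(...) results, dropping malformed lines
    lines.flatMap(line => parseStudent(line)).foreach(println)
  }
}
```

A variant of method 3 could then use `sc.textFile("input/student_info.txt").flatMap(line => ParseStudent.parseStudent(line)).toDF()`, since an `Option` is treated as a collection of zero or one elements.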

Code

package com.zbw.spark

import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.SparkConf

object DataFrameTest {

  // Case class used by method 3; its field names become the column names,
  // so they match the names used in methods 1 and 2
  case class Student(userID: String, userName: String, sex: String, classID: String, date: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .config(new SparkConf())
      .appName("t1")
      .master("local[*]")
      .getOrCreate()

    val sc = spark.sparkContext

    import spark.implicits._

    // Each input line looks like: 170401011001,施礼义,男,0101,20170901

    // Method 1: map each line to a tuple, then name the columns with toDF(...)
    val rdd1 = sc.textFile("input/student_info.txt")
      .map { x =>
        val fields = x.split(",")
        (fields(0), fields(1), fields(2), fields(3), fields(4))
      }

    rdd1.toDF("userID", "userName", "sex", "classID", "date").show()

    // Method 2: build Row objects and specify the schema explicitly with a StructType
    val rdd2 = sc.textFile("input/student_info.txt")
    val rowRDD = rdd2.map(_.split(",")).map(x => Row(x(0), x(1), x(2), x(3), x(4)))

    val structType1: StructType = StructType(Seq(
      StructField("userID", StringType),
      StructField("userName", StringType),
      StructField("sex", StringType),
      StructField("classID", StringType),
      StructField("date", StringType)
    ))
    spark.createDataFrame(rowRDD, structType1).show()


    // Method 3: define a case class and let Spark infer the schema by reflection
    sc.textFile("input/student_info.txt")
      .map { x =>
        val fields = x.split(",")
        Student(fields(0), fields(1), fields(2), fields(3), fields(4))
      }.toDF().show()
  }
}
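Method 2 lists every `StructField` by hand. Because all five columns here are strings, the same schema can also be generated from a list of column names, similar to the programmatic-schema example in the Spark SQL programming guide. The sketch below only builds the `StructType`, so it assumes `spark-sql` is on the classpath but needs no `SparkSession`; the object name `SchemaFromNames` is hypothetical.

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SchemaFromNames {
  // Column names in the same order as the fields in student_info.txt
  val columnNames: Seq[String] = Seq("userID", "userName", "sex", "classID", "date")

  // One nullable StringType field per name; equivalent to structType1 in method 2,
  // because StructField is nullable by default
  val schema: StructType =
    StructType(columnNames.map(name => StructField(name, StringType, nullable = true)))

  def main(args: Array[String]): Unit = {
    // treeString renders the schema in the same layout as DataFrame.printSchema()
    println(schema.treeString)
  }
}
```

Passing it as `spark.createDataFrame(rowRDD, SchemaFromNames.schema)` should then behave like method 2.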

All three methods print the same table (show() displays at most 20 rows by default):

+------------+--------+---+-------+--------+
|      userID|userName|sex|classID|    date|
+------------+--------+---+-------+--------+
|170401011001|     施礼义|  男|   0101|20170901|
|170401011002|      王旭|  男|   0101|20170901|
|170401011003|      肖桢|  女|   0101|20170901|
|170401011004|     吴佩东|  男|   0101|20170901|
|170401011005|      魏会|  男|   0101|20170901|
|170401011006|      曾美|  女|   0101|20170901|
|170401011007|      邵亚|  女|   0101|20170901|
|170401011008|     朱燕菊|  女|   0101|20170901|
|170401011009|     杨明书|  男|   0101|20170901|
|170401011010|      张伟|  男|   0101|20170901|
|170401011011|     张宁波|  男|   0102|20170901|
|170401011012|      石勇|  男|   0102|20170901|
|170401011013|      刘彬|  男|   0102|20170901|
|170401011014|     徐德海|  男|   0102|20170901|
|170401011015|      周涛|  女|   0102|20170901|
|170401011016|     周鹏琼|  女|   0102|20170901|
|170401011017|      刘硕|  男|   0102|20170901|
|170401011018|      黄城|  男|   0102|20170901|
|170401011019|      颜旺|  男|   0102|20170901|
|170401011020|      龙位|  男|   0102|20170901|
+------------+--------+---+-------+--------+
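The tables match because the three DataFrames end up with one schema: five nullable string columns. Assuming the case class fields are named exactly like the columns in methods 1 and 2 (`userID`, not `userId`), this can be checked without running a job. `spark.implicits._` obtains the encoder for a case class from `Encoders.product`, so that encoder's `schema` is what method 3 infers by reflection. The sketch below compares it with a hand-written schema like method 2's; it assumes `spark-sql` on the classpath, and the object name `ReflectedSchemaCheck` is hypothetical.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Same shape as method 3's case class; field names assumed to match methods 1 and 2
case class Student(userID: String, userName: String, sex: String, classID: String, date: String)

object ReflectedSchemaCheck {
  // Schema derived from the case class by reflection (the encoder toDF() relies on in method 3)
  val reflected: StructType = Encoders.product[Student].schema

  // Hand-written schema as in method 2 (StructField is nullable by default)
  val handWritten: StructType = StructType(Seq(
    StructField("userID", StringType),
    StructField("userName", StringType),
    StructField("sex", StringType),
    StructField("classID", StringType),
    StructField("date", StringType)
  ))

  def main(args: Array[String]): Unit = {
    println(reflected.treeString)
    println(s"same schema: ${reflected == handWritten}")
  }
}
```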