07 Spark SQL: Dataset

1. Creating a Dataset

  1. Create a Dataset from a sequence of case class instances

    scala> case class Person(name: String, age: Int)
    defined class Person
    
    scala> val ds = Seq(Person("zs",18), Person("ls",20)).toDS
    ds: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]
    
    scala> ds.show
    +----+---+
    |name|age|
    +----+---+
    |  zs| 18|
    |  ls| 20|
    +----+---+
    
  2. Create a Dataset from a sequence of primitive values

    scala> val ds = Seq(1,2,3,4,5).toDS
    ds: org.apache.spark.sql.Dataset[Int] = [value: int]
    
    scala> ds.show
    +-----+
    |value|
    +-----+
    |    1|
    |    2|
    |    3|
    |    4|
    |    5|
    +-----+
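
The transcripts above run in spark-shell, which already provides `spark` and imports `spark.implicits._`. Outside the shell both have to be set up by hand. The following is a minimal sketch of the same two steps as a standalone application; the object name, app name, and `local[*]` master are assumptions chosen for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Defined at the top level (not inside main) so Spark can derive an Encoder for it.
case class Person(name: String, age: Int)

object CreateDatasetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CreateDatasetExample") // hypothetical app name
      .master("local[*]")              // assumption: local run
      .getOrCreate()

    // Brings toDS and the built-in encoders into scope;
    // spark-shell performs this import automatically.
    import spark.implicits._

    // 1. From a sequence of case class instances
    val people: Dataset[Person] = Seq(Person("zs", 18), Person("ls", 20)).toDS()

    // 2. From a sequence of primitive values (single column named "value")
    val numbers: Dataset[Int] = Seq(1, 2, 3, 4, 5).toDS()

    people.show()
    numbers.show()

    spark.stop()
  }
}
```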
    

2. Converting Between RDD and Dataset

2.1 From RDD to Dataset

// 1. Define the case class
scala> case class Student(name: String, age: Int)
defined class Student

// 2. RDD -> Dataset
scala> val rdd = sc.textFile("./stu.txt").map(line => {val paras = line.split(","); Student(paras(0),paras(1).toInt)})
rdd: org.apache.spark.rdd.RDD[Student] = MapPartitionsRDD[118] at map at <console>:26

scala> val ds = rdd.toDS
ds: org.apache.spark.sql.Dataset[Student] = [name: string, age: int]

scala> ds.show
+----+---+
|name|age|
+----+---+
| zgl| 23|
| zzx| 20|
| zzz| 18|
+----+---+
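
The one-line parser in the transcript throws if a line is not exactly `name,age`. As a hedged variation, the sketch below moves the parsing into a helper that returns `None` for malformed lines and drops them with `flatMap`. The helper `parseStudent`, the object name, and the in-memory sample lines (a stand-in for `./stu.txt`, including one deliberately bad line) are assumptions, not part of the original.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import scala.util.Try

case class Student(name: String, age: Int)

object RddToDatasetExample {
  // Hypothetical helper: returns None for lines that are not "name,age"
  // or whose age is not an integer.
  def parseStudent(line: String): Option[Student] =
    line.split(",").map(_.trim) match {
      case Array(name, age) => Try(Student(name, age.toInt)).toOption
      case _                => None
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddToDatasetExample") // hypothetical app name
      .master("local[*]")             // assumption: local run
      .getOrCreate()
    import spark.implicits._

    // Stand-in for sc.textFile("./stu.txt"); "bad line" is made up to
    // exercise the failure path.
    val lines = spark.sparkContext.parallelize(
      Seq("zgl,23", "zzx,20", "bad line", "zzz,18"))

    // Malformed lines become empty lists and disappear from the result.
    val ds: Dataset[Student] = lines.flatMap(line => parseStudent(line).toList).toDS()
    ds.show()

    spark.stop()
  }
}
```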

2.2 From Dataset to RDD

  1. Simply call the rdd method.

    scala> ds.show
    +----+---+
    |name|age|
    +----+---+
    | zgl| 23|
    | zzx| 20|
    | zzz| 18|
    +----+---+
    
    scala> val rdd = ds.rdd
    rdd: org.apache.spark.rdd.RDD[Student] = MapPartitionsRDD[123] at rdd at <console>:30
    
    scala> rdd.collect
    res41: Array[Student] = Array(Student(zgl,23), Student(zzx,20), Student(zzz,18))
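
Note that the element type of the resulting RDD follows the Dataset: `Dataset[Student].rdd` yields `RDD[Student]`, while calling `rdd` on a DataFrame yields `RDD[Row]`. The sketch below contrasts the two; the object name and the in-memory sample data are assumptions for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

case class Student(name: String, age: Int)

object DatasetToRddExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DatasetToRddExample") // hypothetical app name
      .master("local[*]")             // assumption: local run
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Student("zgl", 23), Student("zzx", 20), Student("zzz", 18)).toDS()

    // Typed: elements are Student objects, so fields are accessed directly.
    val typed: RDD[Student] = ds.rdd
    val names = typed.map(_.name).collect()

    // Untyped: a DataFrame is Dataset[Row], so its RDD holds generic Rows
    // and fields are looked up by name (or position) at runtime.
    val untyped: RDD[Row] = ds.toDF().rdd
    val ages = untyped.map(row => row.getAs[Int]("age")).collect()

    println(names.mkString(", "))
    println(ages.mkString(", "))

    spark.stop()
  }
}
```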
    

3. Converting Between DataFrame and Dataset

3.1 From DataFrame to Dataset

scala> val df = spark.read.json("./stu.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> case class Student(name: String, age: BigInt)
defined class Student

scala> val ds = df.as[Student]
ds: org.apache.spark.sql.Dataset[Student] = [age: bigint, name: string]

scala> ds.show
+---+----+
|age|name|
+---+----+
| 22| zgl|
| 18| zzx|
| 20| zzz|
+---+----+
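
Two details of the transcript are worth spelling out. First, `as[T]` maps columns to case class fields by name, not by position, which is why the `age, name` column order of the JSON-derived DataFrame is fine. Second, Spark's `bigint` column type corresponds to Scala's `Long`, so a `Long` field is a natural alternative to `BigInt`. The following sketch illustrates both under stated assumptions: the case class `StudentRecord`, the object name, and the in-memory DataFrame (a stand-in for `spark.read.json("./stu.json")`) are not from the original.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical case class; age is Long to match Spark's bigint column type.
case class StudentRecord(name: String, age: Long)

object DataFrameToDatasetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameToDatasetExample") // hypothetical app name
      .master("local[*]")                   // assumption: local run
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the JSON file: columns are deliberately ordered (age, name),
    // the reverse of the case class field order.
    val df: DataFrame = Seq((22L, "zgl"), (18L, "zzx"), (20L, "zzz")).toDF("age", "name")

    // Columns are matched to fields by name.
    val ds: Dataset[StudentRecord] = df.as[StudentRecord]

    // Once typed, lambdas can use the fields directly.
    ds.filter(_.age >= 20).show()

    spark.stop()
  }
}
```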

3.2 From Dataset to DataFrame

scala> case class People(name: String, age: Int)
defined class People

scala> val ds = Seq(People("zzz",18)).toDS
ds: org.apache.spark.sql.Dataset[People] = [name: string, age: int]

scala> val df = ds.toDF
df: org.apache.spark.sql.DataFrame = [name: string, age: int]

scala> df.show
+----+---+
|name|age|
+----+---+
| zzz| 18|
+----+---+
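
Besides the no-argument form used above, `toDF` also accepts new column names, which is a convenient way to rename columns during the conversion. A minimal sketch follows; the object name and the replacement column names are chosen purely for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

case class People(name: String, age: Int)

object DatasetToDataFrameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DatasetToDataFrameExample") // hypothetical app name
      .master("local[*]")                   // assumption: local run
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(People("zzz", 18)).toDS()

    // Keeps the field names of the case class as column names.
    val df: DataFrame = ds.toDF()
    df.printSchema()

    // Renames the columns positionally; the count must match the number of fields.
    val renamed: DataFrame = ds.toDF("person_name", "person_age")
    renamed.printSchema()

    spark.stop()
  }
}
```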