Spark DataFrame Data Type Conversion

1. Spark SQL data types

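Numeric types

● ByteType: a 1-byte signed integer. Range: -128 to 127
● ShortType: a 2-byte signed integer. Range: -32768 to 32767
● IntegerType: a 4-byte signed integer. Range: -2147483648 to 2147483647
● LongType: an 8-byte signed integer. Range: -9223372036854775808 to 9223372036854775807
● FloatType: a 4-byte single-precision floating-point number
● DoubleType: an 8-byte double-precision floating-point number
● DecimalType: an arbitrary-precision signed decimal number, backed internally by java.math.BigDecimal. A BigDecimal consists of an arbitrary-precision integer unscaled value and a 32-bit integer scale

String, binary, and boolean types

● StringType: a character string value
● BinaryType: a byte sequence value
● BooleanType: a boolean value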
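
The types listed above can be used to declare a schema explicitly. The following is a minimal sketch, assuming the Spark SQL library is on the classpath; the object name and column names are hypothetical and chosen only for illustration. It needs no running SparkSession because it only builds type objects.

```scala
import org.apache.spark.sql.types._

object PrimitiveTypesSketch {
  // Hypothetical column names, one per primitive type
  val schema: StructType = StructType(Seq(
    StructField("tiny", ByteType),
    StructField("small", ShortType),
    StructField("count", IntegerType),
    StructField("total", LongType),
    StructField("ratio", FloatType),
    StructField("score", DoubleType),
    // DecimalType(precision, scale): at most 10 digits, 2 of them after the decimal point
    StructField("price", DecimalType(10, 2)),
    StructField("label", StringType),
    StructField("payload", BinaryType),
    StructField("active", BooleanType)
  ))

  def main(args: Array[String]): Unit = {
    // Print each field together with the short SQL name of its type
    schema.fields.foreach(f => println(s"${f.name}: ${f.dataType.simpleString}"))
  }
}
```

The short names printed by simpleString (for example int, bigint, decimal(10,2)) are the same names accepted in SQL cast expressions.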

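Datetime types

● TimestampType: a value made up of the fields year, month, day, hour, minute, and second
● DateType: a value made up of the fields year, month, and day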
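
As a rough illustration of the two datetime types, the sketch below casts ISO-formatted strings to DateType and TimestampType. The object name, the helper castDates, and the column names day and moment are assumptions made for this example, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DateType, TimestampType}

object DateTypesSketch {
  // Cast the string column "day" to DateType and "moment" to TimestampType
  def castDates(raw: DataFrame): DataFrame =
    raw
      .withColumn("day", col("day").cast(DateType))
      .withColumn("moment", col("moment").cast(TimestampType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DateTypesSketch").master("local[1]").getOrCreate()
    import spark.implicits._

    // Hypothetical sample row: a date string and a date-time string
    val raw = Seq(("2006-01-01", "2006-01-01 08:30:00")).toDF("day", "moment")

    val typed = castDates(raw)
    typed.printSchema()
    typed.show(false)

    spark.stop()
  }
}
```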

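Complex types

● ArrayType(elementType, containsNull): a sequence of elements of type elementType. containsNull indicates whether the elements of the array may be null
● MapType(keyType, valueType, valueContainsNull): a set of key-value pairs. keyType gives the type of the keys and valueType gives the type of the values. valueContainsNull indicates whether the values of the map may be null
● StructType(fields): a value whose structure is described by a sequence of StructFields (fields)
● StructField(name, dataType, nullable): a single field of a StructType. name is the field's name, dataType is the field's data type, and nullable indicates whether the field's value may be null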
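
The complex types above compose with each other. Here is a small sketch of a nested schema, assuming the Spark SQL library is available; the object name and all field names are hypothetical.

```scala
import org.apache.spark.sql.types._

object ComplexTypesSketch {
  // A nested struct used as the type of the "address" field
  val address: StructType = StructType(Seq(
    StructField("city", StringType, nullable = true),
    StructField("zip", StringType, nullable = true)
  ))

  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    // An array of strings whose elements are declared non-null
    StructField("tags", ArrayType(StringType, containsNull = false), nullable = true),
    // A map from string keys to integer values; the values may be null
    StructField("scores", MapType(StringType, IntegerType, valueContainsNull = true), nullable = true),
    StructField("address", address, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    // treeString renders the schema in the same layout as DataFrame.printSchema()
    println(schema.treeString)
  }
}
```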

2. Spark SQL vs. Scala data types

[Image: comparison of Spark SQL and Scala data types; the image is not available in this text]
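
One way to inspect the correspondence between Scala types and Spark SQL types is to derive a schema from a Scala case class. The sketch below is not a reconstruction of the original image; the case class PersonRecord and its fields are hypothetical, and the comments list the mapping Spark's built-in encoders apply.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Hypothetical record type used only to show the mapping
case class PersonRecord(
  name: String,                  // StringType
  age: Int,                      // IntegerType
  visits: Long,                  // LongType
  height: Double,                // DoubleType
  active: Boolean,               // BooleanType
  balance: BigDecimal,           // DecimalType
  birthday: java.sql.Date,       // DateType
  updatedAt: java.sql.Timestamp, // TimestampType
  tags: Seq[String],             // ArrayType
  attrs: Map[String, Int]        // MapType
)

object TypeMappingSketch {
  // Derive the Spark SQL schema from the Scala field types
  val schema: StructType = Encoders.product[PersonRecord].schema

  def main(args: Array[String]): Unit = {
    println(schema.treeString)
  }
}
```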

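3. Spark SQL data type conversion examples

Three approaches:

● withColumn (recommended): casts a column in place and keeps every other column of the DataFrame
● selectExpr: casts with SQL expressions; only the listed expressions appear in the result
● sql: registers a temporary view and casts inside a SQL query; only the selected columns appear in the result

Code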

package com.lzx.sparktuning.dataType

import org.apache.spark.SparkContext
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{BooleanType, DateType, StringType}
import org.apache.spark.sql.{DataFrame, SparkSession}

object ModifyDataType {
  def main(args: Array[String]): Unit = {
    val session: SparkSession = SparkSession.builder().appName(this.getClass.getSimpleName).master("local[*]").getOrCreate()
    val sc: SparkContext = session.sparkContext

    val list: List[(String, Int, String, String, String, Double)] =
      List(
        ("James", 34, "2006-01-01", "true", "M", 3000.60),
        ("Michael", 33, "1980-01-10", "true", "F", 3300.80),
        ("Robert", 37, "1992-06-01", "false", "M", 5000.50)
      )
    import session.implicits._
    val df: DataFrame = sc.parallelize(list).toDF("name", "age", "birthday", "isGraduated", "sex", "salary")

    df.printSchema()
    df.show(false)

    // 1. withColumn: replace each column with a cast version of itself
    println("1.withColumn---")
    val df2: DataFrame = df.withColumn("age", col("age").cast(StringType))
      .withColumn("isGraduated", col("isGraduated").cast(BooleanType))
      .withColumn("birthday", col("birthday").cast(DateType))
    df2.printSchema()
    df2.show(false)

    // 2. selectExpr: cast with SQL expressions, aliasing back to the original names
    println("2.selectExpr----")
    val df3: DataFrame = df.selectExpr(
      "cast(age as string) age",
      "cast(isGraduated as boolean) isGraduated",
      "cast(birthday as date) birthday"
    )
    df3.printSchema()
    df3.show(false)

    // 3. sql: register a temporary view, then cast with the SQL cast functions
    println("3.sql---")
    df.createOrReplaceTempView("tmp")
    val df4: DataFrame = session.sql(
      """
        |SELECT STRING(age), BOOLEAN(isGraduated), DATE(birthday)
        |FROM tmp
        |""".stripMargin)
    df4.printSchema()
    df4.show(false)

    // Stop the session to release local Spark resources
    session.stop()
  }
}

Output

root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = false)
 |-- birthday: string (nullable = true)
 |-- isGraduated: string (nullable = true)
 |-- sex: string (nullable = true)
 |-- salary: double (nullable = false)

+-------+---+----------+-----------+---+------+
|name   |age|birthday  |isGraduated|sex|salary|
+-------+---+----------+-----------+---+------+
|James  |34 |2006-01-01|true       |M  |3000.6|
|Michael|33 |1980-01-10|true       |F  |3300.8|
|Robert |37 |1992-06-01|false      |M  |5000.5|
+-------+---+----------+-----------+---+------+

1.withColumn---
root
 |-- name: string (nullable = true)
 |-- age: string (nullable = false)
 |-- birthday: date (nullable = true)
 |-- isGraduated: boolean (nullable = true)
 |-- sex: string (nullable = true)
 |-- salary: double (nullable = false)

+-------+---+----------+-----------+---+------+
|name   |age|birthday  |isGraduated|sex|salary|
+-------+---+----------+-----------+---+------+
|James  |34 |2006-01-01|true       |M  |3000.6|
|Michael|33 |1980-01-10|true       |F  |3300.8|
|Robert |37 |1992-06-01|false      |M  |5000.5|
+-------+---+----------+-----------+---+------+

2.selectExpr----
root
 |-- age: string (nullable = false)
 |-- isGraduated: boolean (nullable = true)
 |-- birthday: date (nullable = true)

+---+-----------+----------+
|age|isGraduated|birthday  |
+---+-----------+----------+
|34 |true       |2006-01-01|
|33 |true       |1980-01-10|
|37 |false      |1992-06-01|
+---+-----------+----------+

3.sql---
root
 |-- age: string (nullable = false)
 |-- isGraduated: boolean (nullable = true)
 |-- birthday: date (nullable = true)

+---+-----------+----------+
|age|isGraduated|birthday  |
+---+-----------+----------+
|34 |true       |2006-01-01|
|33 |true       |1980-01-10|
|37 |false      |1992-06-01|
+---+-----------+----------+
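
When many columns need converting, the withColumn approach can be driven by a map of column names to target types. This is a minimal sketch rather than part of the original example; the object name and the helper castColumns are hypothetical, and the sample row reuses values from the data above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{BooleanType, DataType, DateType, StringType}

object CastManySketch {
  // Apply each (column name -> target type) pair in turn.
  // Columns that are not listed keep their original type and position.
  def castColumns(df: DataFrame, targets: Map[String, DataType]): DataFrame =
    targets.foldLeft(df) { case (acc, (name, dataType)) =>
      acc.withColumn(name, col(name).cast(dataType))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CastManySketch").master("local[1]").getOrCreate()
    import spark.implicits._

    val df = Seq(("James", 34, "2006-01-01", "true"))
      .toDF("name", "age", "birthday", "isGraduated")

    val casted = castColumns(df, Map(
      "age" -> StringType,
      "isGraduated" -> BooleanType,
      "birthday" -> DateType
    ))
    casted.printSchema()

    spark.stop()
  }
}
```

Because withColumn replaces a column of the same name in place, the column order of the result matches the input, as in the withColumn output above.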

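This post was written at the 目田 Youth Hostel in Wudaokou, Haidian, Beijing. My home was sealed off because of the pandemic, so I had nowhere else to stay 😮‍💨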
