6. Data types supported by Spark SQL and DataFrames

元元的李树

Article link: https://blog.csdn.net/qq0719/article/details/100742736

==========================================================

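Spark SQL and DataFrames support the following data types:

Numeric types
- ByteType : 1-byte signed integer. Range: -128 to 127.
- ShortType : 2-byte signed integer. Range: -32768 to 32767.
- IntegerType : 4-byte signed integer. Range: -2147483648 to 2147483647.
- LongType : 8-byte signed integer. Range: -9223372036854775808 to 9223372036854775807.
- FloatType : 4-byte single-precision floating-point number.
- DoubleType : 8-byte double-precision floating-point number.
- DecimalType : arbitrary-precision signed decimal number. Backed internally by java.math.BigDecimal. A BigDecimal consists of an arbitrary-precision integer unscaled value and a 32-bit integer scale.
String type
- StringType : character string values.
Binary type
- BinaryType : byte sequence values.
Boolean type
- BooleanType : boolean values.
Datetime types
- TimestampType : values made up of the fields year, month, day, hour, minute, and second.
- DateType : values made up of the fields year, month, and day.
Complex types
- ArrayType(elementType, containsNull) : array type, a sequence of elements of type elementType. containsNull indicates whether the elements of the ArrayType may be null.
- MapType(keyType, valueType, valueContainsNull) : map type, a set of key-value pairs. keyType is the type of the keys and valueType is the type of the values. The keys of a MapType value may not be null. valueContainsNull indicates whether the values of the MapType may be null.
- StructType(fields) : a structure described by a sequence of StructFields.
  - StructField(name, dataType, nullable) : one field of a StructType. name is the field name, dataType is the field's data type, and nullable indicates whether the field's value may be null.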
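
As a rough illustration of how the type constructors listed above fit together, here is a minimal sketch. The schema and its field names (id, price, tags, attributes, createdAt) are hypothetical and chosen only for this example. It assumes the spark-sql library is on the classpath.

```scala
import org.apache.spark.sql.types._

object DataTypesSketch {
  // Hypothetical schema, used only to illustrate the type constructors listed above.
  val exampleSchema: StructType = StructType(Seq(
    // A non-nullable 8-byte integer.
    StructField("id", LongType, nullable = false),
    // A decimal with precision 10 and scale 2.
    StructField("price", DecimalType(10, 2), nullable = true),
    // An array of strings whose elements may not be null.
    StructField("tags", ArrayType(StringType, containsNull = false), nullable = true),
    // A map from string keys to double values; the values may be null.
    StructField("attributes", MapType(StringType, DoubleType, valueContainsNull = true), nullable = true),
    StructField("createdAt", TimestampType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    // treeString renders the nested structure as an indented tree.
    println(exampleSchema.treeString)
  }
}
```

Building the schema needs no SparkSession, so a sketch like this can be tried in a plain Scala REPL as long as the Spark SQL jars are available.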

==========================================================

How to define a schema for nested data

First, import the package: import org.apache.spark.sql.types._

Next, for the definition of StructType, see the Spark source code at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala, which contains the following comment:

// Extract multiple StructFields. Field names are provided in a set.
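
The comment quoted above refers to looking up fields on an existing StructType. A minimal sketch of that lookup follows. The three-field struct (a, b, c) is a hypothetical example, not taken from this article's data.

```scala
import org.apache.spark.sql.types._

object StructTypeApplySketch {
  // Hypothetical three-field struct used only for illustration.
  val struct: StructType = StructType(Seq(
    StructField("a", IntegerType),
    StructField("b", LongType),
    StructField("c", BooleanType)
  ))

  // Passing a single name returns the StructField with that name.
  val singleField: StructField = struct("b")

  // Passing a Set of names returns a new StructType holding only those fields.
  val twoFields: StructType = struct(Set("b", "c"))

  def main(args: Array[String]): Unit = {
    println(singleField)
    println(twoFields.treeString)
  }
}
```

Asking for a name that does not exist in the struct raises an exception, so it is worth checking struct.fieldNames first when the names come from user input.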

https://www.programcreek.com/scala/org.apache.spark.sql.types.ArrayType

Using the same nested JSON data as in the earlier example, define a schema:

{
    "staffList":{
        "total":3,
        "result":[
            {
                "toco":41,
                "id":1,
                "name":"张三",
                "typeJoin":[
                    "22"
                ],
                "type":2
            },
            {
                "toco":46,
                "id":2,
                "name":"李四",
                "typeJoin":[
                    "22"
                ],
                "type":2
            },
            {
                "toco":42,
                "id":3,
                "name":"王五",
                "typeJoin":[
                    "22"
                ],
                "type":2
            }
        ]
    }
}

Here is the schema I defined:

// Requires: import org.apache.spark.sql.types._
val jsSchema = StructType(Seq(
  StructField("staffList", StructType(Seq(
    StructField("total", IntegerType),
    // "result" is an array of staff records
    StructField("result", ArrayType(StructType(Seq(
      StructField("toco", IntegerType),
      // "id" is a number in the sample JSON but is declared as a string here
      StructField("id", StringType),
      StructField("name", StringType),
      StructField("typeJoin", ArrayType(StringType)),
      StructField("type", IntegerType)
    ))))
  )))
))

Or, equivalently, with List in place of Seq:

val jsSchema = StructType(List(
  StructField("staffList", StructType(List(
    StructField("total", IntegerType),
    StructField("result", ArrayType(StructType(List(
      StructField("toco", IntegerType),
      StructField("id", StringType),
      StructField("name", StringType),
      StructField("typeJoin", ArrayType(StringType)),
      StructField("type", IntegerType)
    ))))
  )))
))

I verified on Spark that this schema is correct.
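
The article does not show how the check was done. One possible way, sketched below under my own assumptions, is to read a sample document with the schema applied and inspect the result. The local master setting, the application name, the object name, and the shortened one-record JSON string are all assumptions made for this example, and spark.read.json on a Dataset[String] needs Spark 2.2 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types._

object VerifySchemaSketch {
  // The schema from this article.
  val jsSchema: StructType = StructType(Seq(
    StructField("staffList", StructType(Seq(
      StructField("total", IntegerType),
      StructField("result", ArrayType(StructType(Seq(
        StructField("toco", IntegerType),
        StructField("id", StringType),
        StructField("name", StringType),
        StructField("typeJoin", ArrayType(StringType)),
        StructField("type", IntegerType)
      ))))
    )))
  ))

  // Shortened copy of the sample data: one staff record instead of three.
  val sampleJson: String =
    """{"staffList":{"total":3,"result":[{"toco":41,"id":1,"name":"张三","typeJoin":["22"],"type":2}]}}"""

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for a quick check.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("verify-schema-sketch")
      .getOrCreate()
    import spark.implicits._

    // Apply the explicit schema instead of letting Spark infer one.
    val df = spark.read.schema(jsSchema).json(Seq(sampleJson).toDS())
    df.printSchema()

    // Flatten the nested array to confirm that the inner fields are readable.
    df.select(explode(col("staffList.result")).as("staff"))
      .select("staff.id", "staff.name", "staff.toco", "staff.typeJoin")
      .show(truncate = false)

    spark.stop()
  }
}
```

If a field's declared type did not match the data, the affected values would typically come back as null in the default permissive mode, so null-filled columns in the output are a useful sign that the schema needs another look.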