Error
Null value appeared in non-nullable field
java.lang.NullPointerException: Null value appeared in non-nullable field: top level row object
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
Dataset schema:
root
|-- window: long (nullable = false)
|-- linkId: long (nullable = false)
|-- mapVersion: integer (nullable = false)
|-- passthrough: long (nullable = false)
|-- resident: long (nullable = false)
|-- driverId: string (nullable = true)
|-- inLink: map (nullable = true)
| |-- key: long
| |-- value: integer (valueContainsNull = false)
|-- outLink: map (nullable = true)
| |-- key: long
| |-- value: integer (valueContainsNull = false)
Cause
Some fields that are declared non-nullable were assigned null values.
Solutions
1. Filter out the rows in which these fields are null.
2. Declare the fields as nullable types (e.g. Option[Long] instead of Long).
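Both approaches can be sketched in plain Scala, without Spark, to see the idea (the class Rec, its fields, and the sample values are illustrative, not from the original dataset):

```scala
// Illustrative record type: age is Option[Long], so a null CSV cell maps to None.
case class Rec(name: String, age: Option[Long], stat: String)

val rows = Seq(
  Rec("xyz", None, "s"),       // the age column was null
  Rec("abc", Some(40L), "t")
)

// Solution 1: drop the rows whose would-be non-nullable field is missing.
val withAge = rows.filter(_.age.isDefined)

// Solution 2: keep all rows, but compare through Option so None never NPEs.
val over30 = rows.filter(_.age.exists(_ > 30))
```

In Spark itself, solution 1 corresponds to filtering on the column before converting to a typed Dataset, and solution 2 to changing the case class, as shown in the example below.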
Example
val path: String = ???
val peopleDF = spark.read
.option("inferSchema", "true")
.option("header", "true")
.option("delimiter", ",")
.csv(path)
peopleDF.printSchema
Output:
root
|-- name: string (nullable = true)
|-- age: long (nullable = false)
|-- stat: string (nullable = true)
peopleDF.where($"age".isNull).show
Output:
+----+----+----+
|name| age|stat|
+----+----+----+
| xyz|null| s|
+----+----+----+
Next, convert the Dataset[Row] to a Dataset[Person]. Assume Person was initially declared with a non-nullable age, e.g.:
case class Person(name: String, age: Long, stat: String)
val peopleDS = peopleDF.as[Person]
peopleDS.printSchema
Run the following code:
peopleDS.where($"age" > 30).show
Result:
+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
SQL treats null as a valid value: comparing a null age with 30 evaluates to null, so those rows are quietly dropped from the result instead of raising an error.
Now run the typed filter (filter itself is lazy; the exception surfaces when an action such as show executes):
peopleDS.filter(_.age > 30).show
This throws the error shown at the top.
The reason is that Scala's Long is a primitive type and cannot hold null, so Spark fails while decoding the row whose age is null into a Person.
The fix is to use Option:
case class Person(name: String, age: Option[Long], stat: String)
peopleDS.filter(_.age.map(_ > 30).getOrElse(false)).show
Result:
+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
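The Option-based guard used above can be checked in plain Scala: for a None, map produces None and getOrElse falls back to false, so a missing age simply fails the predicate instead of throwing. Option.exists is an equivalent, slightly tidier form (the sample values here are made up for illustration):

```scala
val missingAge: Option[Long] = None      // age column was null
val knownAge: Option[Long]   = Some(40L)

// map + getOrElse: None short-circuits to the default instead of throwing.
val a = missingAge.map(_ > 30).getOrElse(false)
val b = knownAge.map(_ > 30).getOrElse(false)

// Equivalent form using exists, which defaults to false on None.
val c = missingAge.exists(_ > 30)
```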