Spark: Adding a typed null column to a DataFrame

weixin_34130269

Tags: big data, scala, python

Original article: http://www.cnblogs.com/xuejianbest/p/10285002.html

As we know, a value of type Int in Scala cannot be null,
whereas an IntegerType column in a Dataset, which represents Int values, can contain null.
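
As a small illustration of that difference, here is a minimal sketch that assumes a local SparkSession; the names `dfOpt` and the app name are made up for this example. On the Scala side a missing Int is usually modelled as `Option[Int]`, and Spark maps that to an IntegerType column marked nullable.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

// Assumption: a local session for a standalone run. In spark-shell a session
// named `spark` already exists, and getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nullable-int-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// val n: Int = null   // does not compile: Int is a value type and cannot be null

// Option[Int] stands in for "an Int that may be missing"; None becomes null in the column.
val dfOpt = Seq[(Double, Option[Int])]((1.2, Some(1)), (3.1, None)).toDF("col1", "col2")

dfOpt.printSchema() // col2 is reported as integer with nullable = true
dfOpt.show()
```

This only helps when the data is built from Scala collections; it does not by itself add an all-null column to an existing DataFrame.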

So how do we produce a DataFrame with an IntegerType column whose values are all null?
The following code does it:

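import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Build a small two-column DataFrame; `spark` is the session provided by spark-shell.
val df_json = spark.createDataFrame(List(
  (1.2, 1),
  (3.1, 2))).toDF("col1", "col2")

// A UDF that ignores its input and always returns null.
val udf_null = udf((s: Any) => null)

// Cast the UDF result to IntegerType so that col3 becomes a typed, nullable integer column.
val df_res = df_json.withColumn("col3", udf_null(col("col1")).cast(IntegerType))
df_res.show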

scala> df_res.printSchema
root
 |-- col1: double (nullable = false)
 |-- col2: integer (nullable = false)
 |-- col3: integer (nullable = true)

scala> df_res.show
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1.2|   1|null|
| 3.1|   2|null|
+----+----+----+
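
For comparison, the same kind of column can also be produced without a UDF by casting a null literal. This is a sketch of an alternative technique, not the approach used in the code above; it assumes a local SparkSession, and the names `dfBase` and `dfLit` are introduced only for this example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.IntegerType

// Assumption: a local session for a standalone run. In spark-shell a session
// named `spark` already exists, and getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("typed-null-column") // hypothetical app name
  .getOrCreate()

// Same sample data as in the article.
val dfBase = spark.createDataFrame(List((1.2, 1), (3.1, 2))).toDF("col1", "col2")

// lit(null) is an untyped null literal; the cast gives the column its IntegerType.
val dfLit = dfBase.withColumn("col3", lit(null).cast(IntegerType))

dfLit.printSchema() // col3 is reported as integer with nullable = true
dfLit.show()
```

Because no UDF is involved, Spark does not have to call a Scala function for every row, and the same pattern works for other types by changing the cast target (for example StringType).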