How do I append an element to an array column of a Spark DataFrame?

Suppose I have the following DataFrame:

scala> val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
df1: org.apache.spark.sql.DataFrame = [id: string, nums: array<int>]

scala> df1.show()
+---+----+
| id|nums|
+---+----+
|  a| [1]|
|  b| [1]|
+---+----+

And I want to add elements to the array in the nums column, so that I get something like the following:

+---+-------+
| id|nums   |
+---+-------+
|  a| [1,5] |
|  b| [1,5] |
+---+-------+

Is there a way to do this using the .withColumn() method of the DataFrame? E.g.

val df2 = df1.withColumn("nums", append(col("nums"), lit(5)))

I've looked through the API documentation for Spark, but can't find anything that would allow me to do this. I could probably use split and concat_ws to hack something together, but I would prefer a more elegant solution if one is possible. Thanks.

Solution:

import org.apache.spark.sql.functions.{lit, array, array_union}
import spark.implicits._  // needed for toDF and $"..."; already in scope in spark-shell

val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))

val df2 = df1.withColumn("nums", array_union($"nums", lit(Array(5))))

df2.show

+---+------+
| id|  nums|
+---+------+
|  a|[1, 5]|
|  b|[1, 5]|
+---+------+
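One caveat: array_union returns the union of the two arrays without duplicates, so if nums already contains 5 (or holds repeated values), the result is not a plain append. A minimal sketch of an append that keeps duplicates, assuming Spark 2.4.0 or later (where concat also accepts array columns) and a local SparkSession; the names viaUnion and viaConcat are just for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, array_union, concat, lit}

// Local session for the example; in spark-shell getOrCreate returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("append-demo").getOrCreate()
import spark.implicits._

// Start from [1, 5] so the difference between the two functions is visible
val df = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1), lit(5)))

// array_union removes the duplicate 5: each row stays [1, 5]
val viaUnion = df.withColumn("nums", array_union($"nums", array(lit(5))))

// concat on array columns keeps every element: each row becomes [1, 5, 5]
val viaConcat = df.withColumn("nums", concat($"nums", array(lit(5))))

viaUnion.show()
viaConcat.show()
```

If the arrays never contain the value being added, both give the same result; concat is the one that behaves like a true append.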

array_union() was added in Spark 2.4.0, released on 11/2/2018, 7 months after you asked the question :) See https://spark.apache.org/news/index.html
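On Spark versions older than 2.4.0, neither array_union nor array support in concat is available, so the usual fallback is a user-defined function. A minimal sketch, assuming nums is a non-null array<int> column as in df1; appendInt is a hypothetical helper name, not a Spark built-in:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, lit, udf}

val spark = SparkSession.builder().master("local[*]").appName("append-udf").getOrCreate()
import spark.implicits._

// Hypothetical helper: appends one Int to an array<int> column.
// Spark hands an array<int> value to a Scala UDF as a Seq[Int].
// Assumes the array itself is never null (true for df1 below).
val appendInt = udf((xs: Seq[Int], x: Int) => xs :+ x)

val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
val df2 = df1.withColumn("nums", appendInt($"nums", lit(5)))

df2.show()  // each row's nums becomes [1, 5]
```

A UDF is opaque to the Catalyst optimizer, so on 2.4.0 and later the built-in functions are generally preferable.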
