Spark Scala DataFrame: merging multiple columns into a single column

I have a Spark DataFrame that looks like this:

+---+------+----+
| id|animal|talk|
+---+------+----+
|  1|   bat|done|
|  2| mouse|mone|
|  3| horse| gun|
|  4| horse|some|
+---+------+----+

I want to generate a new column, say merged, that would look something like this:

+---+-----------------------------------------------------------+
| id| merged columns                                            |
+---+-----------------------------------------------------------+
|  1| [{name: animal, value: bat}, {name: talk, value: done}]   |
|  2| [{name: animal, value: mouse}, {name: talk, value: mone}] |
|  3| [{name: animal, value: horse}, {name: talk, value: gun}]  |
|  4| [{name: animal, value: horse}, {name: talk, value: some}] |
+---+-----------------------------------------------------------+

Basically, I want to combine all the columns into an Array of a case class merged(name: String, value: String).

Can anyone help me with how to do this in Scala?

For simplicity I have used only two columns here, but a generic answer that works for N columns would help greatly.

Solution

Your expected output doesn't seem to reflect your requirement of producing a list of name-value structured objects. If I understand it correctly, consider using foldLeft to iteratively convert the wanted columns to StructType name-value columns, and group them into an ArrayType column:

import org.apache.spark.sql.functions._
// Needed for toDF and the $"..." syntax. spark-shell imports this automatically;
// in an application, import it from your SparkSession (assumed here to be named `spark`).
import spark.implicits._

val df = Seq(
  (1, "bat", "done"),
  (2, "mouse", "mone"),
  (3, "horse", "gun"),
  (4, "horse", "some")
).toDF("id", "animal", "talk")

// Every column except the key column is merged.
val cols = df.columns.filter(_ != "id")

val resultDF = cols.
  foldLeft(df)( (accDF, c) =>
    // Replace each column in place with a (name, value) struct.
    accDF.withColumn(c, struct(lit(c).as("name"), col(c).as("value")))
  ).
  // Collect the struct columns into a single array column.
  select($"id", array(cols.map(col): _*).as("merged"))

resultDF.show(false)
// +---+-----------------------------+
// |id |merged                       |
// +---+-----------------------------+
// |1  |[[animal,bat], [talk,done]]  |
// |2  |[[animal,mouse], [talk,mone]]|
// |3  |[[animal,horse], [talk,gun]] |
// |4  |[[animal,horse], [talk,some]]|
// +---+-----------------------------+

resultDF.printSchema
// root
//  |-- id: integer (nullable = false)
//  |-- merged: array (nullable = false)
//  |    |-- element: struct (containsNull = false)
//  |    |    |-- name: string (nullable = false)
//  |    |    |-- value: string (nullable = true)
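As a variation on the approach above, the structs can also be built in a single select without foldLeft. The sketch below is not part of the original answer: the object name MergeColumns, the helper mergeColumns and its parameters keyCol and outCol are hypothetical names chosen for illustration. It also casts every value to string, on the assumption that the merged columns may have different types, because array requires all of its elements to share one type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, lit, struct}

object MergeColumns {

  // Hypothetical helper: merges every column except `keyCol` into one
  // array<struct<name, value>> column named `outCol`.
  def mergeColumns(df: DataFrame, keyCol: String, outCol: String = "merged"): DataFrame = {
    val structs = df.columns.filter(_ != keyCol).map { c =>
      // Cast to string so that all array elements have the same struct type.
      struct(lit(c).as("name"), col(c).cast("string").as("value"))
    }
    df.select(col(keyCol), array(structs: _*).as(outCol))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("merge-columns")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, "bat", "done"),
      (2, "mouse", "mone")
    ).toDF("id", "animal", "talk")

    mergeColumns(df, "id").show(false)
    spark.stop()
  }
}
```

Because the column list is computed from df.columns, the same call works for any number of columns. The foldLeft version keeps each value's original type, so it is the better fit when all merged columns already share a type.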
