Spark array operations: How to convert a column of arrays of strings to strings?

I have a column of type array<string> in Spark tables. I am using SQL to query these Spark tables. I want to convert the array<string> into a string.

When I used the syntax below:

select cast(rate_plan_code as string) as new_rate_plan
from customer_activity_searches
group by rate_plan_code

the rate_plan_code column has the following values:

["AAA","RACK","SMOBIX","SMOBPX"]
["LPCT","RACK"]
["LFTIN","RACK","SMOBIX","SMOBPX"]
["LTGD","RACK"]
["RACK","LEARLI","NHDP","LADV","LADV2"]
the following values are populated in the new_rate_plan column (the string form of Spark's internal array objects rather than the element values):

org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@e4273d9f
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@c1ade2ff
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@4f378397
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d1c81377
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@552f3317

In Spark 2.1+, to concatenate the values in a single array column you can use any of the following:

- concat_ws standard function
- map operator
- a user-defined function (UDF), sketched after the map section below

concat_ws Standard Function

Use the concat_ws function.

concat_ws(sep: String, exprs: Column*): Column Concatenates multiple input string columns together into a single string column, using the given separator.

import org.apache.spark.sql.functions.concat_ws

val solution = words.withColumn("codes", concat_ws(" ", $"words"))

scala> solution.show
+--------------+-----------+
|         words|      codes|
+--------------+-----------+
|[hello, world]|hello world|
+--------------+-----------+
map Operator
Use the map operator to take full control over what gets transformed and how.

map[U](func: (T) ⇒ U): Dataset[U] Returns a new Dataset that contains the result of applying func to each element.

scala> codes.show(false)
+---+---------------------------+
|id |rate_plan_code             |
+---+---------------------------+
|0  |[AAA, RACK, SMOBIX, SMOBPX]|
+---+---------------------------+

val codesAsSingleString = codes.as[(Long, Array[String])]
  .map { case (id, codes) => (id, codes.mkString(", ")) }
  .toDF("id", "codes")

scala> codesAsSingleString.show(false)
+---+-------------------------+
|id |codes                    |
+---+-------------------------+
|0  |AAA, RACK, SMOBIX, SMOBPX|
+---+-------------------------+

scala> codesAsSingleString.printSchema
root
|-- id: long (nullable = false)
|-- codes: string (nullable = true)
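
User-Defined Function (UDF)

The third option from the list above is a UDF that wraps the same mkString logic, which is handy when you want to reuse it across withColumn calls or register it for SQL. A minimal sketch against the codes DataFrame above (joinCodes is a made-up name for illustration):

import org.apache.spark.sql.functions.udf

// Spark hands an array<string> column to a Scala UDF as a Seq[String].
// joinCodes is a name invented for this sketch.
val joinCodes = udf { (xs: Seq[String]) => xs.mkString(", ") }

val solution = codes.withColumn("codes", joinCodes($"rate_plan_code"))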

In Spark 2.1+, you can use concat_ws directly to convert (concatenate with a separator) a string or array<string> into a string.

select concat_ws(',', rate_plan_code) as new_rate_plan
from customer_activity_searches
group by rate_plan_code

This will give you a response like:

AAA,RACK,SMOBIX,SMOBPX
LPCT,RACK
LFTIN,RACK,SMOBIX,SMOBPX
LTGD,RACK
RACK,LEARLI,NHDP,LADV,LADV2
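
Since the question queries Spark tables through SQL, the same statement can also be run from Scala via spark.sql; a minimal sketch (newRatePlans is a made-up name):

// spark is the SparkSession, predefined in spark-shell.
val newRatePlans = spark.sql(
  """select concat_ws(',', rate_plan_code) as new_rate_plan
    |from customer_activity_searches
    |group by rate_plan_code""".stripMargin)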
PS: concat_ws doesn't work with non-string arrays such as array<long>; for those, a UDF or map would be the only option, as Jacek explained above.
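
For that array<long> case, the same UDF approach works if the function accepts Seq[Long]; a minimal sketch, assuming a hypothetical DataFrame df with an array<long> column named ids:

import org.apache.spark.sql.functions.udf

// df and ids are made-up names for this sketch.
val joinLongs = udf { (xs: Seq[Long]) => xs.mkString(",") }

val withIdsString = df.withColumn("ids_str", joinLongs($"ids"))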
