Spark DataFrame: extracting a column and modifying it (updating / replacing a Column)



1.concat(exprs: Column*): Column

function note: Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.

My problem: a column in the DataFrame, "XX_BM", holds values such as 0008151223000316. I want to turn every value in Column("XX_BM") into that value with a suffix appended, e.g. 0008151223000316sfjd, i.e.:

0008151223000316 + sfjd

Solution (in Scala):

import org.apache.spark.sql.functions.{concat, lit}

val tmp = dfval.col("XX_BM")

val result = concat(tmp, lit("sfjd"))

dfval = dfval.withColumn("XX_BM", result) // dfval must be a var: DataFrames are immutable, so we rebind
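At the row level, `concat(col, lit("sfjd"))` behaves like plain string concatenation on each non-null value. A minimal pure-Scala sketch of the per-value effect (the sample value is taken from the text above; no Spark session is needed to see the semantics):

```scala
// Per-row effect of concat(col("XX_BM"), lit("sfjd")):
// the literal suffix is appended to each value of the column.
val original = "0008151223000316" // sample value from the XX_BM column
val suffix   = "sfjd"
val updated  = original + suffix
println(updated) // 0008151223000316sfjd
```

Note that `concat` returns null for a row if any of its inputs is null; `concat_ws` can be used instead when null-safe joining with a separator is wanted.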

2.regexp_replace(e: Column, pattern: String, replacement: String): Column

function note: Replace all substrings of the specified string value that match regexp with rep.

My problem (quoted from a Q&A post): I have a DataFrame with 170 columns. One column holds a "name" string, and that string can sometimes contain special symbols such as "'" that are not acceptable when writing to Postgres. Can I do something like:

Df[$'name'] = Df[$'name'].map(x => x.replaceAll("'","")) ?

But I don't want to reprocess the full DataFrame, because it's very large. Please help.


Solution: You can't mutate DataFrames; you can only transform them into new DataFrames with updated values. In this case, you can use the regexp_replace function to perform the mapping on the name column:

import org.apache.spark.sql.functions._

val updatedDf = Df.withColumn("name", regexp_replace(col("name"), "'", ""))
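Per row, `regexp_replace` applies Java regex semantics, equivalent to `String.replaceAll`. A self-contained sketch of what the call above does to one sample value (the name is illustrative):

```scala
// Per-row effect of regexp_replace(col("name"), "'", ""):
// the pattern is a Java regex, and every match is replaced.
val raw     = "O'Brien's" // illustrative sample containing apostrophes
val cleaned = raw.replaceAll("'", "")
println(cleaned) // OBriens
```

Because the pattern is a regex, characters like `.`, `*`, or `(` must be escaped (e.g. `\\.`) if they are meant literally.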


3.regexp_replace(e: Column, pattern: Column, replacement: Column): Column

function note: Replace all substrings of the specified string value that match regexp with rep. (This overload takes the pattern and replacement as Columns, so they can vary from row to row.)
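With the Column-typed overload, each row can carry its own pattern and replacement. A pure-Scala sketch of the per-row equivalent (the sample rows and patterns are illustrative, not from the original post):

```scala
// Per-row equivalent of regexp_replace(valueCol, patternCol, replacementCol):
// each row's value is rewritten using that row's own pattern and replacement.
val rows = Seq(
  ("foo-123", "\\d+", "#"), // this row strips runs of digits
  ("bar_456", "_",    "-")  // this row swaps underscore for hyphen
)
val result = rows.map { case (value, pattern, replacement) =>
  value.replaceAll(pattern, replacement)
}
println(result) // List(foo-#, bar-456)
```

In Spark itself this would be written as `df.withColumn("value", regexp_replace(col("value"), col("pattern"), col("replacement")))`.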


For full function documentation, see org.apache.spark.sql.functions.
