scala – Adding two columns to an existing DataFrame using withColumn
Now I want to add two more columns to an existing DataFrame.
Currently I am doing this with the DataFrame's withColumn method.
The withColumn() method:
withColumn(colName, col)
Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
The column expression must be an expression over this DataFrame; attempting to add a column from some other dataframe will raise an error.
Parameters
colName – string, name of the new column.
col – a Column expression for the new column.
>>> df.withColumn('age2', df.age + 2).collect()
[Row(age=2, name='Alice', age2=4), Row(age=5, name='Bob', age2=7)]
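The excerpt above is from the PySpark documentation, while the question is about Scala. A rough Scala equivalent might look like the sketch below. The object name WithColumnSketch and the helper addAge2 are hypothetical names chosen for illustration, and the input is assumed to have an integer column called age.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical wrapper object, used only for this illustration.
object WithColumnSketch {

  // Returns a new DataFrame with an extra "age2" column equal to age + 2.
  // Assumes the input has a numeric column named "age".
  def addAge2(df: DataFrame): DataFrame =
    df.withColumn("age2", col("age") + 2)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("withColumn-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the documentation excerpt.
    val df = Seq((2, "Alice"), (5, "Bob")).toDF("age", "name")
    addAge2(df).show()

    spark.stop()
  }
}
```

As in Python, withColumn does not modify df; it returns a new DataFrame.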
For example:
df.withColumn("newColumn1", udf(col("somecolumn")))
.withColumn("newColumn2", udf(col("somecolumn")))
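AFAIK this approach requires two calls (one per new column). However, if your UDF is computationally expensive, you can avoid calling it twice: have the UDF return a case class or a tuple, store that "complex" result in a temporary column, and then "unpack" it into the two new columns.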
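The case class variant could be sketched roughly as follows. This is only an illustration under assumptions: the case class CaseVariants, the UDF caseVariants, the wrapper object, and the string column name are all hypothetical and not part of the original answer.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical result type. Spark encodes it as a struct with fields
// "lower" and "upper". It must be defined at top level, not inside a method.
final case class CaseVariants(lower: String, upper: String)

object CaseClassUdfSketch {

  // Hypothetical UDF that computes both results in one call.
  // Null input is not handled here; the column is assumed to be non-null.
  private val caseVariants =
    udf((s: String) => CaseVariants(s.toLowerCase, s.toUpperCase))

  // Assumes df has a string column called "name".
  def addCaseColumns(df: DataFrame): DataFrame =
    df.withColumn("udfResult", caseVariants(col("name")))     // temporary struct column
      .withColumn("lowercaseColumn", col("udfResult.lower"))  // unpack field 1
      .withColumn("uppercaseColumn", col("udfResult.upper"))  // unpack field 2
      .drop("udfResult")                                      // remove the temporary column
}
```

Compared with a tuple, the case class gives the struct fields readable names instead of _1 and _2.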
Edit:
With a UDF that returns a tuple, the unpacking would look like this:
val newDf = df
.withColumn("udfResult",myUDf(col("name")))
.withColumn("lowercaseColumn", col("udfResult._1"))
.withColumn("uppercaseColumn", col("udfResult._2"))
.drop("udfResult")
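The snippet above does not show how myUDf is defined. A self-contained version might look like the sketch below, assuming that myUDf takes a string and returns a (lowercase, uppercase) tuple. That definition, the wrapper object TupleUdfSketch, and the sample data are assumptions made for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical wrapper object, used only for this illustration.
object TupleUdfSketch {

  // Assumed definition of myUDf: one call yields both results as a tuple,
  // which Spark stores as a struct with fields "_1" and "_2".
  // Null input is not handled here; the column is assumed to be non-null.
  val myUDf = udf((s: String) => (s.toLowerCase, s.toUpperCase))

  // Assumes df has a string column called "name".
  def addCaseColumns(df: DataFrame): DataFrame =
    df.withColumn("udfResult", myUDf(col("name")))
      .withColumn("lowercaseColumn", col("udfResult._1"))
      .withColumn("uppercaseColumn", col("udfResult._2"))
      .drop("udfResult")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tuple-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Alice", "Bob").toDF("name")
    val newDf = addCaseColumns(df)
    newDf.show()
    newDf.explain() // inspect the physical plan to see how often the UDF is invoked

    spark.stop()
  }
}
```

Depending on the Spark version, the optimizer may collapse these projections and evaluate the UDF once per extracted field, so it is worth checking the plan printed by explain(). Marking the UDF with asNondeterministic() (available on UserDefinedFunction since Spark 2.3) is one commonly used way to discourage that.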
Source: https://codeday.me/bug/20180824/228440.html