Modifying DataFrame values that meet a condition, PySpark: change a column's value when another column's value meets a condition

Latest recommended article published on 2023-01-06 19:13:42

宝图2borne


I have a PySpark DataFrame with two columns, Id and Rank:

+---+----+
| Id|Rank|
+---+----+
|  a|   5|
|  b|   7|
|  c|   8|
|  d|   1|
+---+----+

For each row, I'm looking to replace Id with "other" if Rank is larger than 5.

If I use pseudocode to explain:

for row in df:
    if row.Rank > 5:
        replace(row.Id, "other")

The result should look like this:

+-----+----+
|   Id|Rank|
+-----+----+
|    a|   5|
|other|   7|
|other|   8|
|    d|   1|
+-----+----+

Any clue how to achieve this? Thanks!!!

To create this Dataframe:

df = spark.createDataFrame([('a',5),('b',7),('c',8),('d',1)], ["Id","Rank"])

Solution

You can use when and otherwise, like this:

from pyspark.sql.functions import col, when

# Keep Id where Rank <= 5; otherwise use the literal 'other'.
(df
    .withColumn('Id_New', when(df.Rank <= 5, df.Id).otherwise('other'))
    .drop(df.Id)
    .select(col('Id_New').alias('Id'), col('Rank'))
    .show())

This gives the following output:

+-----+----+
|   Id|Rank|
+-----+----+
|    a|   5|
|other|   7|
|other|   8|
|    d|   1|
+-----+----+
