How to drop all columns with null values from a PySpark DataFrame?

I have a large dataset of which I would like to drop columns that contain null values and return a new dataframe. How can I do that?

The following only works for a single, named column, or drops rows that contain null.

df.where(col("dt_mvmt").isNull()) #doesnt work because I do not have all the columns names or for 1000's of columns

df.filter(df.dt_mvmt.isNotNull()) #same reason as above

df.na.drop() #drops rows that contain null, instead of columns that contain null

For example:

a | b | c
--+---+--
1 |   | 0
2 | 2 | 3

In the above case, the whole of column b should be dropped, because one of its values is empty.

Solution

Here is one possible approach for dropping all columns that contain NULL values; the pattern for counting NULL values per column is borrowed from another answer.

import pyspark.sql.functions as F

# Sample data (hypothetical; mirrors the a/b/c table from the question)
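# Everything below is a minimal sketch reconstructed under that assumption:
# a toy DataFrame stands in for the asker's data; this is not the post's
# exact code, only the counting approach it describes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, None, 0), (2, 2, 3)], ["a", "b", "c"])

# Count NULLs per column: when a cell is null, F.when(F.isnull(c), c)
# returns the (non-null) literal column name, so F.count tallies the nulls.
null_counts = (
    df.select([F.count(F.when(F.isnull(c), c)).alias(c) for c in df.columns])
      .collect()[0]
      .asDict()
)

# Keep only the columns whose NULL count is zero
cols_to_keep = [c for c, n in null_counts.items() if n == 0]
df_clean = df.select(*cols_to_keep)
df_clean.show()
# +---+---+
# |  a|  c|
# +---+---+
# |  1|  0|
# |  2|  3|
# +---+---+

On the sample data this drops the whole of column b, since it contains a null, and keeps a and c, which matches the expected result from the question.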
