PySpark data type conversion: how do I convert a string-type column to int in a PySpark DataFrame?

Latest recommended article published 2023-07-28 20:38:28

徐立达

Article tags: pyspark data type conversion

Article link: https://blog.csdn.net/weixin_42498347/article/details/112944579


I have a DataFrame in PySpark. Some of its numerical columns contain 'nan', so when I read the data and check the schema of the DataFrame, those columns have 'string' type. How can I change them to int type? I replaced the 'nan' values with 0 and checked the schema again, but it still shows the string type for those columns. I am using the code below:

data_df = sqlContext.read.format("csv").load('data.csv',header=True, inferSchema="true")

data_df.printSchema()

data_df = data_df.fillna(0)

data_df.printSchema()

My data looks like this:

Here the columns 'Plays' and 'drafts' contain integer values, but because of the 'nan' entries in these columns, they are treated as string type.

Solution:

from pyspark.sql.types import IntegerType

data_df = data_df.withColumn("Plays", data_df["Plays"].cast(IntegerType()))

data_df = data_df.withColumn("drafts", data_df["drafts"].cast(IntegerType()))

You could also loop over the columns, but this is the simplest way to convert a string column into an integer column.
