First, take a look at the columns:
df.printSchema()
root
|-- Id: string (nullable = true)
|-- groupId: string (nullable = true)
|-- matchId: string (nullable = true)
|-- assists: string (nullable = true)
|-- boosts: string (nullable = true)
|-- damageDealt: string (nullable = true)
|-- DBNOs: string (nullable = true)
|-- headshotKills: string (nullable = true)
|-- heals: string (nullable = true)
|-- killPlace: string (nullable = true)
|-- killPoints: string (nullable = true)
|-- kills: string (nullable = true)
|-- killStreaks: string (nullable = true)
|-- longestKill: string (nullable = true)
|-- maxPlace: string (nullable = true)
|-- numGroups: string (nullable = true)
|-- revives: string (nullable = true)
|-- rideDistance: string (nullable = true)
|-- roadKills: string (nullable = true)
|-- swimDistance: string (nullable = true)
|-- teamKills: string (nullable = true)
|-- vehicleDestroys: string (nullable = true)
|-- walkDistance: string (nullable = true)
|-- weaponsAcquired: string (nullable = true)
|-- winPoints: string (nullable = true)
|-- winPlacePerc: string (nullable = true)
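The dtype of kills is string (in fact, every column was read in as string).
Following the official docs, cast it (astype is an alias for cast):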
df.kills.astype("int")
Out[29]: Column<b'CAST(kills AS INT)'>
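Check the column type again, and it has not changed. astype only returns a new Column expression (CAST(kills AS INT)); it does not modify df, because Spark DataFrames are immutable: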
df.select("kills").dtypes
Out[34]: [('kills', 'string')]
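One approach that works: use withColumn to replace the column, then assign the returned DataFrame back to df: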
df = df.withColumn("kills",df.kills.astype("int"))
df.select("kills").dtypes
Out[36]: [('kills', 'int')]
It worked.