DataFrame object has no attribute ‘col’
In Spark: The Definitive Guide it says:
If you need to refer to a specific DataFrame’s column, you can use the col method on the specific DataFrame.
Problem description
For example (in Python/Pyspark):
The key problem is that this statement is not valid in Python; it is the Scala usage:
df.col("count")
However, when I run that code on a DataFrame containing a column named count, I get the error
'DataFrame' object has no attribute 'col'.
If I try column I get a similar error.
Is the book wrong, or how should I go about doing this?
I’m on Spark 2.3.1. The dataframe was created with the following:
df = spark.read.format("json").load("/Users/me/Documents/Books/Spark-The-Definitive
Solution:
The book you're referring to describes the Scala / Java API. In PySpark, use bracket notation:
df["count"]
However, be sure to distinguish this from another case. For example:
df.where(col('ORIGIN_COUNTRY_NAME') != 'United States').show()
If this instead raises:
name 'col' is not defined
it is because the col function has not been imported. You need to add:
from pyspark.sql.functions import col