Two days ago I was able to run basic PySpark operations.
I tried to run a simple command in Jupyter Notebook:
data = sc.textfile('airline.csv')
and I get the following error:
NameError Traceback (most recent call last)
in
----> 1 data = sc.textfile('airline.csv')
NameError: name 'sc' is not defined
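For context: the name `sc` is typically only predefined when the notebook is started through the `pyspark` launcher; a notebook started any other way has to create the context itself. Below is a minimal sketch of doing that explicitly, assuming the `pyspark` package is importable from the notebook's kernel. The helper name `get_or_create_context` and the app name `"airline-demo"` are hypothetical. Note also that the `SparkContext` method is spelled `textFile`, so `sc.textfile` would fail even once `sc` exists.

```python
def get_or_create_context(app_name="airline-demo"):
    """Return the active SparkContext, creating one if none exists.

    The import is deferred into the function so this cell can be defined
    even where pyspark is missing; actually calling it requires pyspark
    and a working Java installation.
    """
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name)
    # getOrCreate reuses an already running context (for example one made
    # by the pyspark launcher) instead of raising an error about multiple
    # SparkContexts; in that case the conf passed here is ignored.
    return SparkContext.getOrCreate(conf=conf)
```

Usage in a notebook cell would then be `sc = get_or_create_context()` followed by `data = sc.textFile('airline.csv')`.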
I have set the following system variables:
HADOOP_HOME = C:\spark-3.0.0-preview-bin-hadoop2.7
PYSPARK_DRIVER_PYTHON = ipython
PYSPARK_DRIVER_PYTHON_OPTS = notebook
SPARK_HOME = C:\spark-3.0.0-preview-bin-hadoop2.7
(The Java and Python system variables are already set.)
path = C:\spark-3.0.0-preview-bin-hadoop2.7\bin (I have placed winutils.exe in this folder.)
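One way to confirm that these variables are actually visible to the notebook's Python process is to inspect `os.environ` from a cell. The following is a minimal sketch using only the standard library. The helper `check_spark_env` is hypothetical; it only checks that `HADOOP_HOME` and `SPARK_HOME` are set and that `SPARK_HOME\bin` appears on the path, and it assumes Windows-style paths separated by `;`.

```python
import ntpath
import os

# Variables from the setup above that should always be set.
REQUIRED_VARS = ("HADOOP_HOME", "SPARK_HOME")


def check_spark_env(env=None):
    """Return a list of human-readable problems found in the environment.

    `env` defaults to os.environ; pass a dict to check a hypothetical setup.
    Windows-style paths are assumed (ntpath rules, ';' as PATH separator).
    """
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    spark_home = env.get("SPARK_HOME")
    if spark_home:
        bin_dir = ntpath.join(spark_home, "bin")
        wanted = ntpath.normcase(bin_dir)
        # Normalise each PATH entry: trim spaces and trailing slashes,
        # lowercase it and unify '/' to '\' so comparisons are fair.
        entries = [
            ntpath.normcase(p.strip().rstrip("\\/"))
            for p in env.get("PATH", "").split(";")
            if p.strip()
        ]
        if wanted not in entries:
            problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_spark_env():
        print(problem)
```

An empty result only means the variables are present in the kernel's environment; it does not prove the paths exist or that `winutils.exe` is in the right place.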
Now, if I remove my PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS variables and run pyspark from the command prompt, I get the following error:
C:\spark-3.0.0-preview-bin-hadoop2.7>pyspark
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)