I've spent a few days trying to get Spark to work with my Jupyter Notebook and Anaconda. Here is what my .bash_profile looks like:
PATH="/my/path/to/anaconda3/bin:$PATH"
export JAVA_HOME="/my/path/to/jdk"
export PYTHON_PATH="/my/path/to/anaconda3/bin/python"
export PYSPARK_PYTHON="/my/path/to/anaconda3/bin/python"
export PATH=$PATH:/my/path/to/spark-2.1.0-bin-hadoop2.7/bin
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export SPARK_HOME=/my/path/to/spark-2.1.0-bin-hadoop2.7
alias pyspark="pyspark --conf spark.local.dir=/home/puifais --num-executors 30 --driver-memory 128g --executor-memory 6g --packages com.databricks:spark-csv_2.11:1.5.0"
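For comparison, here is a minimal sketch of the findspark route, assuming the same Spark path as above (findspark is a separate pip-installable package, not part of my current setup). It initializes Spark from inside an ordinary Jupyter kernel instead of turning the pyspark driver into Jupyter via the PYSPARK_DRIVER_PYTHON variables:

# Sketch: use Spark from a plain Jupyter kernel via findspark, instead of
# the PYSPARK_DRIVER_PYTHON / PYSPARK_DRIVER_PYTHON_OPTS variables above.
import findspark
findspark.init("/my/path/to/spark-2.1.0-bin-hadoop2.7")  # same dir as SPARK_HOME

import pyspark
sc = pyspark.SparkContext(appName="notebook-sanity-check")
print(sc.parallelize([1, 2, 3]).count())  # expected: 3
sc.stop()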
When I type /my/path/to/spark-2.1.0-bin-hadoop2.7/bin/spark-shell, I can launch Spark from my command-line shell. The sc output is not empty, so it seems to be working fine.
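For what it's worth, a minimal sanity check, assuming the same build: pasted into the bin/pyspark REPL (where sc is predefined), a tiny job runs end to end:

# Sanity check inside the bin/pyspark REPL, where `sc` already exists.
# Sum 1..100 across an RDD; if the SparkContext is live, this prints 5050.
total = sc.parallelize(range(1, 101)).reduce(lambda a, b: a + b)
print(total)  # expected: 5050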
When I type pyspark, it launches my Jupyter Notebook. When I create a new Python 3 notebook…