Integrating PySpark with Jupyter Notebook
Two environment variables do most of the work:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
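If the notebook should also be reachable from another machine, extra Jupyter options can be passed through the same variable. A sketch, assuming the default port 8888 and that binding to all interfaces is acceptable on your network:

```shell
# Run the PySpark driver inside Jupyter Notebook
export PYSPARK_DRIVER_PYTHON=jupyter
# Assumed extra options: don't open a local browser, listen on all
# interfaces, keep the default port
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --ip=0.0.0.0 --port=8888'
```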
Then launch pyspark directly:
$SPARK_HOME/bin/pyspark
The console output will show the URL and port number:
[I 14:59:08.242 NotebookApp] 0 active kernels
[I 14:59:08.242 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 14:59:08.243 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 15:01:35.974 NotebookApp] Saving file at ...
Then just open that URL in a browser on your own machine.
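Inside the notebook, the pyspark launcher has already created the usual entry points (`sc`, and on Spark 2.x+ also `spark`), so a first cell can use them without any setup. A minimal sketch of such a cell (it needs the live Spark runtime, so it only runs inside this notebook session):

```python
# `sc` (SparkContext) is created by the pyspark launcher; no import needed.
rdd = sc.parallelize(range(100))
# Sum the numbers 0..99 across the cluster.
print(rdd.sum())
```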
If you need an external jar, you can add an environment variable; for example, here is an Oracle JDBC driver:
export SPARK_CLASSPATH=$ORACLE_HOME/ojdbc8.jar
If you see the following warning, switch to spark.executor.extraClassPath or spark.driver.extraClassPath instead:
WARN SparkConf: SPARK_CLASSPATH was detected (set to '/home/ojdbc8.jar'). This is deprecated in Spark 1.0+.
Please instead use:
./spark-submit
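One way to follow that advice is to put the replacement settings in $SPARK_HOME/conf/spark-defaults.conf instead of exporting SPARK_CLASSPATH. A sketch, using the jar path from the warning above (note that this file does not expand shell variables, so the path must be literal):

```
# $SPARK_HOME/conf/spark-defaults.conf
spark.driver.extraClassPath    /home/ojdbc8.jar
spark.executor.extraClassPath  /home/ojdbc8.jar
```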