1. Install Python, then check the Python version once the installation finishes:
$ python --version
Python 2.7.6
As the snippet below from the bin/pyspark launcher script shows, Python 2.7 is picked by default when it is installed (the Spark version used here is spark-1.6.0-bin-hadoop2.6):
if hash python2.7 2>/dev/null; then
  # Attempt to use Python 2.7, if installed:
  DEFAULT_PYTHON="python2.7"
else
  DEFAULT_PYTHON="python"
fi
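This default can also be overridden without editing the launcher. The sketch below is only an illustration: it assumes the standard PYSPARK_PYTHON variable that PySpark reads when the SparkContext is created, a hypothetical file name choose_python.py submitted with bin/spark-submit so that the pyspark package is importable, and python2.7 as an example interpreter name.

# choose_python.py (hypothetical) -- run with: bin/spark-submit choose_python.py
import os
import sys

# PYSPARK_PYTHON selects the interpreter Spark starts for Python workers;
# it has to be set before the SparkContext is created. (PYSPARK_DRIVER_PYTHON
# only affects the bin/pyspark launcher itself, so it is not set here.)
os.environ.setdefault("PYSPARK_PYTHON", "python2.7")

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[2]").setAppName("python-selection-demo")
sc = SparkContext(conf=conf)

print("driver : " + sys.version.split()[0])
# the lambda below runs inside the worker interpreter chosen by PYSPARK_PYTHON
print("workers: " + sc.parallelize([0]).map(lambda _: sys.version.split()[0]).first())

sc.stop()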
2. Run pyspark
/usr/local/spark$ bin/pyspark
pyspark warns that IPYTHON and IPYTHON_OPTS are deprecated and have been replaced, so set the new environment variables instead:
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
With those variables set, pyspark starts up normally; an excerpt of the startup log:
16/01/24 09:34:51 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-bf84dcd6-0789-4ceb-b950-288d6617955c
16/01/24 09:34:51 INFO MemoryStore: MemoryStore started with capacity 517.4 MB
16/01/24 09:34:51 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/24 09:34:52 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/24 09:34:52 INFO SparkUI: Started SparkUI at http://192.168.0.101:4040
16/01/24 09:34:52 INFO Executor: Starting executor ID driver on host localhost
16/01/24 09:34:52 INFO Utils: Successfully started service 'org.apac
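Once the IPython notebook (or the plain pyspark shell) is up, a few lines are enough to confirm that the SparkContext the launcher creates as sc is working and which interpreter it picked up; a minimal check, with the expected values shown as comments:

# typed into the pyspark / IPython notebook session; sc is created by the launcher
import sys

print(sc.version)                 # 1.6.0
print(sys.version.split()[0])     # the driver's Python, e.g. 2.7.6

rdd = sc.parallelize(range(100))
print(rdd.sum())                  # 4950
print(rdd.map(lambda _: sys.version.split()[0]).first())  # the workers' Python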