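I use ptpython(1), which provides ipython functionality along with your choice of vi(1) or emacs(1) key bindings. It also provides dynamic code pop-ups and completion, which is very useful when doing ad-hoc Spark work on the CLI or when simply trying to learn the Spark API.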
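Here is what a vi-enabled ptpython session looks like. Note the VI (INSERT) mode at the bottom of the screenshot, as well as the ipython-style prompt, indicating that those ptpython features have been selected (more on how to select them in a moment):
To get all of this, perform the following simple steps: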
user@linux$ pip3 install ptpython # Everything here assumes Python3
user@linux$ vi ${SPARK_HOME}/conf/spark-env.sh
# Comment-out/disable the following two lines. This is necessary because
# they take precedence over any UNIX environment settings for them:
# PYSPARK_PYTHON=/path/to/python
# PYSPARK_DRIVER_PYTHON=/path/to/python
user@linux$ vi ${HOME}/.profile # Or whatever your login RC-file is.
# Add these two lines:
export PYSPARK_PYTHON=python3            # Use a fully-qualified path if necessary.
export PYSPARK_DRIVER_PYTHON=ptpython3   # Use a fully-qualified path if necessary.
user@linux$ . ${HOME}/.profile # Source the RC file.
user@linux$ pyspark
# You are now running pyspark(1) inside ptpython: an interactive shell with
# code pop-ups, your choice of vi(1) or emacs(1) key bindings, and
# ipython-style functionality if you want it.
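If you want to double-check the two configuration steps above, the following Python sketch may help. The script and its function names (`active_overrides`, `resolve_interpreters`) are hypothetical and not part of Spark or ptpython. It assumes SPARK_HOME is set, scans `spark-env.sh` for PYSPARK_* assignments that are still active (not commented out), and resolves each variable against your PATH with `shutil.which`.

```python
import os
import re
import shutil

# The two variables pyspark(1) consults for the worker and driver interpreters.
PYSPARK_VARS = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")

# An active (not commented-out) shell assignment, optionally prefixed by
# "export", e.g. "PYSPARK_PYTHON=/path/to/python".
_ASSIGN_RE = re.compile(r"^\s*(?:export\s+)?(PYSPARK_(?:DRIVER_)?PYTHON)=")


def active_overrides(spark_env_text):
    """Return the PYSPARK_* variables still assigned in spark-env.sh text."""
    found = []
    for line in spark_env_text.splitlines():
        match = _ASSIGN_RE.match(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def resolve_interpreters(env=None):
    """Map each PYSPARK_* variable to the executable it resolves to, or None.

    None means the variable is unset, or its value (a bare name looked up on
    PATH, or a fully-qualified path) does not point at an executable file.
    """
    env = os.environ if env is None else env
    return {
        var: shutil.which(env[var], path=env.get("PATH")) if env.get(var) else None
        for var in PYSPARK_VARS
    }


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        conf = os.path.join(spark_home, "conf", "spark-env.sh")
        if os.path.isfile(conf):
            with open(conf) as fh:
                for var in active_overrides(fh.read()):
                    print(f"WARNING: {var} is still set in {conf}")
    for var, path in resolve_interpreters().items():
        print(f"{var}: {path or 'NOT SET / NOT FOUND'}")
```

Run it after sourcing your RC file. A WARNING line means spark-env.sh will still override the values you exported in `.profile`, and NOT FOUND means the interpreter name needs to be fully qualified.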
To choose your ptpython preferences (there are many), simply press F2 within a ptpython session and select whichever options you want.
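Closing note: if you are submitting a Python Spark application (as opposed to interacting with pyspark(1) from the CLI, as shown above), simply set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON programmatically in Python, before the SparkContext/SparkSession is created, as follows: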
import os
os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'  # Not 'ptpython3' in this case.
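For the submitted-application case, a minimal sketch might look like the following. The helper name `configure_pyspark_python`, the app name, and the tiny `spark.range(5).count()` job are illustrative assumptions. What it is meant to show is ordering: the variables are set before `SparkSession.builder.getOrCreate()` runs, because PySpark reads PYSPARK_PYTHON when the SparkContext is initialised. pyspark is imported inside `main()` so the helper can be loaded even where Spark is not installed.

```python
import os


def configure_pyspark_python(python="python3"):
    """Point PySpark at a plain (non-ptpython) interpreter.

    Must run before any SparkContext/SparkSession is created, since PySpark
    reads PYSPARK_PYTHON at context initialisation.
    """
    os.environ["PYSPARK_PYTHON"] = python
    os.environ["PYSPARK_DRIVER_PYTHON"] = python  # Not 'ptpython3' for submitted jobs.
    return {k: os.environ[k] for k in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")}


def main():
    configure_pyspark_python("python3")
    # Imported only after the environment is set; requires pyspark installed.
    from pyspark.sql import SparkSession

    # "example-app" is a hypothetical application name.
    spark = SparkSession.builder.appName("example-app").getOrCreate()
    try:
        print(spark.range(5).count())  # trivial job to confirm workers start
    finally:
        spark.stop()
```

You would call `main()` from the script you hand to spark-submit(1).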
I hope this answer and setup are helpful.