I have two versions of Python. When I launch a spark application using spark-submit, the application uses the default version of Python. But, I want to use the other one.
How do I specify which version of Python spark-submit should use?
Solution
You can set the PYSPARK_PYTHON environment variable in conf/spark-env.sh (in Spark's installation directory) to the absolute path of the desired Python executable.
The Spark distribution ships spark-env.sh.template (spark-env.cmd.template on Windows) by default; it must be renamed to spark-env.sh (spark-env.cmd) first.
For example, if the desired Python executable is installed at /opt/anaconda3/bin/python3, add the following line to spark-env.sh:
PYSPARK_PYTHON='/opt/anaconda3/bin/python3'
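Putting the two steps together, a minimal sketch of renaming the template and adding the variable might look like this (here SPARK_HOME and the conf layout are simulated with a temporary directory so the commands run as-is; in practice SPARK_HOME would point at your actual Spark installation):

```shell
# Simulate a Spark installation directory containing the shipped template.
SPARK_HOME="$(mktemp -d)"
mkdir -p "$SPARK_HOME/conf"
touch "$SPARK_HOME/conf/spark-env.sh.template"

# Step 1: create spark-env.sh from the template.
cp "$SPARK_HOME/conf/spark-env.sh.template" "$SPARK_HOME/conf/spark-env.sh"

# Step 2: point PYSPARK_PYTHON at the desired interpreter (example path).
echo "PYSPARK_PYTHON='/opt/anaconda3/bin/python3'" >> "$SPARK_HOME/conf/spark-env.sh"
```

Alternatively, exporting PYSPARK_PYTHON in the shell before invoking spark-submit has the same effect for a single session, without editing spark-env.sh.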
See the Spark configuration documentation for more information.