When connecting to Spark with PySpark, I ran into the error below. I tried many approaches without success before finally solving it. Here is the code:
from pyspark import SparkContext
from pyspark import SparkConf
import pyspark
string_test = 'pyspark_test'
print(pyspark.__version__)
conf = SparkConf().setAppName(string_test).setMaster('spark://master:7077')
sc = SparkContext(conf=conf)
# Distribute a small list as an RDD and map each element n to (n, n * 2)
list_test = [1, 2, 3]
x = sc.parallelize(list_test)
y = x.map(lambda n: (n, n * 2))
print(x.collect())
print(y.collect())
sc.stop()
Error:
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at /usr/local/spark/python/pyspark/shell.py:59
Solution:
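This error occurs because a SparkContext was already running. As the message shows, the PySpark shell created one automatically (app=PySparkShell, master=local[*]), and only one SparkContext can be active at a time. The fix is to stop the existing SparkContext first and then create the new one.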