When running pyspark, I hit the following error:
Py4JJavaError: An error occurred while calling o2198.csv.
: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:
pyspark java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
The currently active SparkContext was created at:
(No active SparkContext.)
I tried every solution I could find online, with no luck. In the end, adding spark.stop() before the code I was running solved the problem. I later found a similar answer on Stack Overflow. My guess: before running this code I had already started another Spark session and never closed it, then jumped straight into running this snippet, so getOrCreate() handed back the stale session whose SparkContext was already stopped.
spark.stop()  # this is the fix! stops the leftover session from an earlier run
import findspark
findspark.init()
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("w2v pyspark") \
    .master("local") \
    .config("spark.executor.memory", "4g") \
    .config("spark.executor.cores", "2") \
    .config("spark.driver.memory", "4g") \
    .getOrCreate()
try:
    df = spark.read.csv("./blog_articles_wordsegs.csv", header=True)
    df.show(5)
except Exception as e:
    logger.error(type(e))
    logger.error(e)
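The failure mode boils down to getOrCreate() returning a cached session whose underlying context has already died, and an explicit stop() clearing that cache. A minimal sketch of that idea, using a hypothetical Session class as a stand-in (this is not pyspark; the class and its caching behavior only mimic how SparkSession.builder.getOrCreate() can hand back a stale session):

```python
class StoppedContextError(RuntimeError):
    pass

class Session:
    """Hypothetical stand-in for SparkSession: get_or_create() caches one instance."""
    _cached = None

    def __init__(self):
        self._ctx_stopped = False

    @classmethod
    def get_or_create(cls):
        # Returns the cached session even if its context died earlier,
        # mirroring how a leftover notebook session can be handed back.
        if cls._cached is None:
            cls._cached = cls()
        return cls._cached

    def read_csv(self, path):
        if self._ctx_stopped:
            raise StoppedContextError("Cannot call methods on a stopped SparkContext")
        return f"dataframe from {path}"

    def stop(self):
        # Tear down and clear the cache so the next get_or_create() is fresh.
        self._ctx_stopped = True
        Session._cached = None

# A previous run left a session whose context later stopped:
old = Session.get_or_create()
old._ctx_stopped = True          # context died, but the cache still holds the session

# Without an explicit stop(), get_or_create() hands back the dead session:
try:
    Session.get_or_create().read_csv("data.csv")
except StoppedContextError as e:
    print(e)

# With stop() first, the cache is cleared and a fresh session works:
old.stop()
print(Session.get_or_create().read_csv("data.csv"))
```

In real pyspark the same intent can be expressed by stopping the stale handle before building a new session, which is exactly what the spark.stop() line above does.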
References:
Why does SparkContext randomly close, and how do you restart it from Zeppelin?
pyspark.sql.DataFrameReader.csv(): https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=read%20csv