Fixing a NameError raised in a notebook
The error is as follows:
NameError Traceback (most recent call last)
<ipython-input-1-3f07a3d84249> in <module>
3 import pyspark
4
----> 5 conf=SparkConf().setAppName("wordcount")
6 sc=SparkContext(conf=conf)
7 text_file=sc.textFile("file:/home/tyy/桌面/test.txt")
NameError: name 'SparkConf' is not defined
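The traceback comes down to how Python binds names on import. The sketch below reproduces the same kind of error without Spark, using the standard-library collections module as a stand-in for pyspark (an assumption made only so the example runs anywhere). The helper run_snippet is hypothetical and not part of any library.

```python
def run_snippet(source):
    """Run source in a fresh namespace; return the NameError text, or None."""
    namespace = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return str(exc)
    return None


# Importing the module binds only the module name, so the bare class name fails.
print(run_snippet("import collections\nCounter()"))
# → name 'Counter' is not defined

# Qualifying the name through the module works.
print(run_snippet("import collections\ncollections.Counter()"))
# → None

# Importing the name explicitly works too; this is the form the fix uses.
print(run_snippet("from collections import Counter\nCounter()"))
# → None
```

The same reasoning applies to the failing cell: after import pyspark alone, pyspark.SparkConf is reachable but the bare name SparkConf is not.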
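The failing cell only runs import pyspark, which binds the module name pyspark but not the bare name SparkConf, hence the NameError. If you have the findspark package installed, the fix is as follows: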
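import findspark
findspark.init()  # locate the Spark installation and add pyspark to sys.path

import pyspark
from pyspark import SparkContext, SparkConf  # defines the names the failing cell lacked

conf = SparkConf().setAppName("wordcount")
sc = SparkContext(conf=conf)

# Read the local file, split each line on spaces, and count each word.
text_file = sc.textFile("file:/home/tyy/桌面/test.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
print(counts.collect())

sc.stop()  # release the SparkContext when done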
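To get a feel for what the flatMap / map / reduceByKey chain computes without starting Spark, the same steps can be mirrored in plain Python. This is only a rough sketch: word_count and sample_lines are hypothetical names with made-up input, and nothing here runs on a cluster.

```python
from collections import Counter


def word_count(lines):
    """Mirror flatMap(split) -> map((word, 1)) -> reduceByKey(add) locally."""
    # flatMap: split every line on a single space and flatten the pieces
    words = [word for line in lines for word in line.split(" ")]
    # map: pair every word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the counts that share the same word
    counts = Counter()
    for word, one in pairs:
        counts[word] += one
    return list(counts.items())


sample_lines = ["hello spark", "hello notebook"]  # hypothetical input
print(word_count(sample_lines))
# → [('hello', 2), ('spark', 1), ('notebook', 1)]
```

As with counts.collect(), the result is a list of (word, count) pairs. The local version keeps first-seen order, whereas the order of pairs returned by Spark is not guaranteed. Splitting on a single space also means consecutive spaces produce empty-string "words" in both versions.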
When the Spark cell above is run, the result appears after a few seconds.
The result is as follows: