Spark version: spark-2.1.1-bin-hadoop2.7
JDK 1.8
Python 3.6
Reference: http://www.jianshu.com/p/5701591bfc70
Test code:
from pyspark import SparkContext, SparkConf

# Any local text file works here; this README ships with the Python install
logFile = 'C:\\Python\\Python36\\Lib\\site-packages\\README.txt'
conf = SparkConf().setMaster("local[*]").setAppName("First")
sc = SparkContext(conf=conf)
logData = sc.textFile(logFile).cache()  # cache: the RDD is scanned twice below
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print('Lines with a: %i, lines with b: %i' % (numAs, numBs))
sc.stop()
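The test above is plain filter-and-count logic, so it can be sanity-checked in pure Python with no Spark installation at all. A minimal sketch, using a hypothetical in-memory list of lines in place of the `textFile` RDD:

```python
# Pure-Python equivalent of the Spark filter/count test, for sanity checking.
# The sample lines are hypothetical stand-ins for the contents of logFile.
lines = ["apache spark", "big data", "hello world", "banana"]
numAs = sum(1 for s in lines if 'a' in s)  # mirrors logData.filter(...).count()
numBs = sum(1 for s in lines if 'b' in s)
print('Lines with a: %i, lines with b: %i' % (numAs, numBs))
# prints: Lines with a: 3, lines with b: 2
```

If the Spark job and this plain loop agree on the same input file, the local PySpark setup is working.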