Spark 2.2.x Chinese documentation
Interactive Analysis with the Spark Shell > Self-Contained Applications.
SimpleApp.py
spark = SparkSession.builder().appName(appName).master(master).getOrCreate()
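Running python SimpleApp.py fails with: TypeError: 'Builder' object is not callable. In PySpark, builder is an attribute of SparkSession rather than a method, so drop the parentheses (the app name is given here as a string literal):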
spark = SparkSession.builder.appName('appName').getOrCreate()
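Why the parentheses trigger this particular TypeError can be seen with a toy stand-in. The Builder and Session classes below are hypothetical and only mimic the attribute-style pattern; they are not PySpark's real implementation.

```python
class Builder:
    """Toy stand-in for a builder object; not PySpark's actual class."""

    def __init__(self):
        self.options = {}

    def appName(self, name):
        # Record the option and return self so that calls can be chained.
        self.options["app.name"] = name
        return self


class Session:
    # 'builder' is a ready-made Builder instance stored as a class attribute,
    # so it is accessed without parentheses.
    builder = Builder()


# Attribute access followed by a method call works.
chained = Session.builder.appName("demo")

# Calling the instance itself fails, because Builder defines no __call__.
try:
    Session.builder()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message has the same form as the error reported above, which suggests the parentheses after builder were the only problem in that line.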
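Running the script again still fails, this time with: py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM. This error is commonly reported when the pyspark package that Python imports does not match the version of the installed Spark.
To work around it, install findspark: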
pip install findspark
Then add the following at the top of SimpleApp.py:
import findspark
findspark.init()
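As a rough illustration of the idea behind these two lines, the sketch below puts a Spark installation's bundled Python sources at the front of sys.path, so that the pyspark that gets imported is the one shipped with that installation. It is an assumption-laden approximation, not findspark's actual code: add_spark_to_path is a hypothetical helper, and the python/ and python/lib/py4j-*.zip layout is assumed from typical Spark binary distributions.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home):
    """Hypothetical helper: make Spark's bundled Python sources importable."""
    spark_python = os.path.join(spark_home, "python")
    # Assumption: py4j ships as a zip under python/lib, and its exact file
    # name varies by release, so it is matched with a glob pattern.
    py4j_zips = sorted(glob.glob(os.path.join(spark_python, "lib", "py4j-*.zip")))
    added = [spark_python] + py4j_zips

    os.environ["SPARK_HOME"] = spark_home
    # Insert in reverse so the final order at the front of sys.path
    # matches the order of 'added'.
    for path in reversed(added):
        if path not in sys.path:
            sys.path.insert(0, path)
    return added
```

Calling add_spark_to_path with the Spark installation directory before importing pyspark would be the manual counterpart of findspark.init(); in practice findspark is the simpler choice because it also locates the installation.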
Complete code:
import findspark
findspark.init()
from pyspark.sql import SparkSession
logFile = "/usr/local/spark-3.0.0-preview2-bin-hadoop2.7/README.md"
spark = SparkSession.builder.appName('appName').getOrCreate()
logData = spark.read.text(logFile).cache()
numA = logData.filter(logData.value.contains('a')).count()
numB = logData.filter(logData.value.contains('b')).count()
print("Lines with a: %i, lines with b: %i" % (numA, numB))
spark.stop()
Run it with: python SimpleApp.py
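If the Py4JError above comes back, one thing worth checking is whether the Spark installation and the imported pyspark package agree on version. The sketch below is a minimal helper for that comparison, assuming the installation directory follows the spark-<version>-bin-... naming seen in the path used above; spark_home_version and same_major_minor are hypothetical names, not part of any library.

```python
import re


def spark_home_version(spark_home):
    """Return 'X.Y.Z' parsed from a directory name, or None if absent."""
    match = re.search(r"spark-(\d+\.\d+\.\d+)", spark_home)
    return match.group(1) if match else None


def same_major_minor(version_a, version_b):
    """Compare only the first two components of two version strings."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


home_version = spark_home_version(
    "/usr/local/spark-3.0.0-preview2-bin-hadoop2.7"
)
print(home_version)
```

The result can then be compared against pyspark.__version__ in the same Python environment, for example with same_major_minor(home_version, pyspark.__version__).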