- Download, install, and configure Jupyter Notebook; see the separate blog post on Jupyter Notebook configuration for details.
- Install the Python packages findspark and pyspark (e.g. `pip install findspark pyspark`).
- Use `findspark.init` to point at the cluster's spark2-client directory and the Python interpreter to use.
```python
import findspark

# Tell findspark where the cluster's Spark 2 client lives and which Python
# interpreter PySpark should use, then make pyspark importable.
findspark.init(spark_home="/usr/hdp/current/spark2-client/", python_path="/usr/bin/python3")

from pyspark import SparkConf, SparkContext, SQLContext
```
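Optionally, you can check that findspark resolved the intended installation; `findspark.find()` returns the Spark home it detected. This check is an extra step, not part of the original post.

```python
# Optional sanity check: print the SPARK_HOME that findspark resolved.
print(findspark.find())  # expected: /usr/hdp/current/spark2-client/
```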
- Create the SparkContext: build a SparkConf that runs on YARN with the application name "http", then create the context from it (a sketch of that last step follows below).

```python
# Run on the YARN cluster manager with the application name "http".
conf = SparkConf().setMaster("yarn").setAppName("http")
```
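The snippet above only builds the configuration. Below is a minimal sketch of creating the context from it; the variable names `sc` and `sqlContext` and the small test job are illustrative additions, not part of the original post.

```python
# Create the SparkContext from the conf above, plus a SQLContext for DataFrame work.
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Quick smoke test: run a trivial job to confirm the notebook can talk to YARN.
print(sc.parallelize(range(100)).sum())

# Stop the context when you are done with Spark in this notebook session.
# sc.stop()
```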