01_Configuring Spark and PySpark on Windows 10
02_Setting up PySpark in JupyterLab on Windows
03_Setting up PySpark in Jupyter Notebook on Windows
1. Configure environment variables
PySpark here is installed through Anaconda. It is assumed that the Spark, Hadoop, and Java paths were already configured in the earlier parts of this series, so the only remaining work is making PySpark visible to the notebook.
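If the system-wide variables are not set, one alternative (a minimal sketch, not part of the original setup) is to set them per session from Python before importing pyspark. The Spark path below matches the one used later in this article; the Java path is a placeholder for wherever your JDK actually lives.

import os

# Assumed paths: SPARK_HOME matches the install used below;
# JAVA_HOME is a placeholder for your own JDK location.
os.environ["SPARK_HOME"] = "D:\\spark-3.1.3-bin-hadoop3.2"
os.environ["JAVA_HOME"] = "C:\\Java\\jdk1.8.0"  # placeholder path
# Have the driver and workers use the Anaconda interpreter on PATH.
os.environ["PYSPARK_PYTHON"] = "python"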
2. Install findspark
In an Anaconda Prompt (or the notebook's terminal), run:
pip install findspark
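To confirm the package is importable before opening a notebook (a trivial check relying only on standard Python behavior):

# Run in any Python shell; an ImportError here means the install failed.
import findspark
print(findspark)  # e.g. <module 'findspark' from '...\site-packages\findspark.py'>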
3. Call PySpark
Create a new notebook and initialize findspark.
Run the following code in the new notebook.
import findspark
findspark.init('D:\\spark-3.1.3-bin-hadoop3.2')  # point findspark at the Spark install directory
findspark.find()
# Output: 'D:\\spark-3.1.3-bin-hadoop3.2'

import pyspark
sc = pyspark.SparkContext()

tempData = [59, 57.2, 53.6, 55.4, 51.8, 53.6, 55.4]
# Now sc can be used to call the parallelize method
parTempData = sc.parallelize(tempData)
parTempData.collect()
# Output: [59, 57.2, 53.6, 55.4, 51.8, 53.6, 55.4]
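From here the RDD behaves like any other. As an illustrative follow-up (not part of the original walkthrough), the sample values look like Fahrenheit readings, so a map() converting them to Celsius is a natural way to exercise the context; sc.stop() then releases it so later notebooks can create their own.

# Hypothetical next step: transform the RDD with map().
# Assuming the values are Fahrenheit, convert them to Celsius.
celsiusData = parTempData.map(lambda f: round((f - 32) * 5.0 / 9.0, 1))
celsiusData.collect()
# Output: [15.0, 14.0, 12.0, 13.0, 11.0, 12.0, 13.0]

# Release the SparkContext when finished.
sc.stop()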