Setting up a Spark development environment on Windows with PyCharm
- Create a Python file
- Click File >> Settings >> Project: PythonProject >> Project Structure
- Add pyspark.zip and the py4j package to the project (both files are in the Spark installation directory at D:\apps\spark-2.3.2-bin-hadoop2.7\python\lib); a scripted alternative to this GUI step is sketched after the test code below
- Write some code to test the setup:
from pyspark import SparkContext

# Run Spark locally with 2 worker threads.
sc = SparkContext(master='local[2]', appName='pyspark')

# Distribute a small list as an RDD and pull it back to the driver.
rdd = sc.parallelize([1, 2, 3, 4, 5, 6])
print(rdd.collect())  # expected output: [1, 2, 3, 4, 5, 6]

sc.stop()
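
As an alternative to adding the zips through Project Structure, the same paths can be put on sys.path at the top of the script. This is a minimal sketch, assuming the Spark home path from the steps above; since the bundled py4j zip's filename varies by Spark version, it is located with a glob rather than hard-coded:

import glob
import os
import sys

# Point at the Spark installation used in the steps above.
spark_home = r'D:\apps\spark-2.3.2-bin-hadoop2.7'
os.environ.setdefault('SPARK_HOME', spark_home)

# Make pyspark importable without touching PyCharm's Project Structure.
sys.path.insert(0, os.path.join(spark_home, 'python'))

# The py4j zip name depends on the Spark version, so find it instead of hard-coding it.
py4j_zip = glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*-src.zip'))[0]
sys.path.insert(0, py4j_zip)

from pyspark import SparkContext  # import after the path setup, on purpose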
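
If collect() prints the list, the environment is wired up. A slightly richer smoke test, sketched below with the same standard RDD API, also exercises a transformation and two actions on the local[2] threads:

from pyspark import SparkContext

sc = SparkContext(master='local[2]', appName='pyspark-smoke-test')

# map is a lazy transformation; collect and reduce are actions that trigger execution.
rdd = sc.parallelize([1, 2, 3, 4, 5, 6])
squares = rdd.map(lambda x: x * x)

print(squares.collect())                   # [1, 4, 9, 16, 25, 36]
print(squares.reduce(lambda a, b: a + b))  # 91

sc.stop()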