python=3.6.8
1.conda install pyspark
2.Install a compatible older Spark release (spark-2.4.8-bin-hadoop2.7)
3.Configure the Spark environment (set SPARK_HOME and add SPARK_HOME/bin to PATH)
4.Install Java 8 (set JAVA_HOME and add JAVA_HOME/bin to PATH)
5.Download Hadoop from the official site (hadoop-3.2.3) and set HADOOP_HOME
6.Download winutils.exe for the corresponding Hadoop version from GitHub and paste it into Hadoop's bin directory (3.2.1)
7.Open the Anaconda virtual environment and run the pyspark command to check for errors
8.Install Jupyter Notebook
9.conda install findspark
10.Restart the Jupyter kernel (a restart is required). Then add two lines before importing pyspark:
import findspark
findspark.init()  # locates SPARK_HOME; must run before importing pyspark

import pyspark
from pyspark.sql import *              # SparkSession, Row, DataFrame, ...
from pyspark.sql.functions import *    # note: shadows Python builtins such as sum/max/min
from pyspark.sql.types import IntegerType, FloatType
from pyspark.sql import functions as F # qualified alias, safer than the wildcard above
from pyspark import SparkContext, SparkConf
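Before launching pyspark or Jupyter, the environment variables configured in steps 3-6 can be sanity-checked with plain stdlib Python. This is a minimal sketch, not part of the setup itself; check_spark_env is a hypothetical helper name, and it only verifies that the directories exist, not that the versions actually match:

```python
import os
from pathlib import Path

def check_spark_env():
    """Return a list of problems found in the Spark-related environment.

    Checks SPARK_HOME, JAVA_HOME, and HADOOP_HOME (steps 3-6 above),
    plus winutils.exe inside HADOOP_HOME/bin for Windows setups.
    """
    problems = []
    for var in ("SPARK_HOME", "JAVA_HOME", "HADOOP_HOME"):
        home = os.environ.get(var)
        if not home:
            problems.append(f"{var} is not set")
        elif not Path(home, "bin").is_dir():
            problems.append(f"{var}/bin does not exist: {home}")
    hadoop = os.environ.get("HADOOP_HOME")
    if hadoop and not Path(hadoop, "bin", "winutils.exe").is_file():
        problems.append("winutils.exe missing from HADOOP_HOME/bin (Windows only)")
    return problems

# Print any findings; an empty result means the variables look plausible.
for p in check_spark_env():
    print(p)
```

If this reports nothing, the pyspark command in step 7 should at least be able to find Spark, Java, and Hadoop.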