Environment setup
Hadoop installation (single node)
Version: 2.7.7
https://cloud.tencent.com/developer/article/1348631
Scala and Spark installation
Scala: 2.12.8
Spark: 2.4.4-hadoop2.7
https://www.jianshu.com/p/8c0b1b39d0e5
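Once Hadoop, Scala and Spark are unpacked and their bin directories are added to PATH, a quick check that the launchers can actually be found saves debugging later. The sketch below is only an illustration: the command names (hadoop, scala, spark-submit, pyspark) are assumed to be the ones your installation exposes, and the helper locate_commands is a hypothetical name, not part of any of these tools.

```python
import shutil

# Assumption: these are the launcher names your Hadoop/Scala/Spark
# installation puts on PATH. Adjust the tuple if yours differ.
EXPECTED_COMMANDS = ("hadoop", "scala", "spark-submit", "pyspark")


def locate_commands(names=EXPECTED_COMMANDS):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_commands().items():
        print(f"{name:13s} {path or 'NOT FOUND on PATH'}")
```

Any entry reported as not found usually means the corresponding bin directory is missing from PATH in the current shell.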
When using the pyspark shell, add the following environment variables so that it starts inside a Jupyter notebook
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=192.168.1.69"
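If you would rather not export these variables globally, the same two settings can be applied to a single launch from Python. This is a minimal sketch under assumptions: jupyter_driver_env is a hypothetical helper name, and the commented-out launch line assumes pyspark is on PATH.

```python
import os


def jupyter_driver_env(ip, base_env=None):
    """Return a copy of the environment with the two PySpark driver variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = f"notebook --ip={ip}"
    return env


if __name__ == "__main__":
    env = jupyter_driver_env("192.168.1.69")
    print(env["PYSPARK_DRIVER_PYTHON"], env["PYSPARK_DRIVER_PYTHON_OPTS"])
    # To launch with these settings (assumes pyspark is on PATH):
    # import subprocess; subprocess.run(["pyspark"], env=env)
```

The IP address is the one used in these notes; replace it with the address of your own machine.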
Installing dependencies (using matplotlib as an example)
pip3 install matplotlib -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
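After installing, it can help to confirm that the package is visible to the same Python interpreter that the notebook uses. A small sketch using only the standard library follows; is_installed is a hypothetical helper name and expects a top-level module name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("matplotlib",):
        status = "installed" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If the package shows as missing even though pip3 reported success, pip3 and the notebook kernel are probably bound to different Python installations.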
Summary (mind map)