- Install Spark
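Upload the downloaded Spark archive to Ubuntu and extract it into /usr/lib, then rename the extracted directory and hand ownership to the hadoop user:
    sudo tar -zxf /home/hadoop/spark-2.4.0-bin-without-hadoop.tgz -C /usr/lib
    cd /usr/lib
    sudo mv ./spark-2.4.0-bin-without-hadoop/ ./spark
    sudo chown -R hadoop:hadoop ./spark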
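The extract-and-rename step can be sketched as a small helper, which makes it explicit that the `-C` target must be the parent directory that the later `mv` works in. The function name `extract_and_rename` and its three arguments are hypothetical, chosen only for illustration; it also assumes the archive contains a single top-level directory.

```shell
# Hypothetical helper: extract ARCHIVE into PARENT, then rename the
# extracted top-level directory to PARENT/NAME.
extract_and_rename() {
    archive=$1   # e.g. /home/hadoop/spark-2.4.0-bin-without-hadoop.tgz
    parent=$2    # e.g. /usr/lib
    name=$3      # e.g. spark
    # First path component of the first archive entry
    # (assumes one top-level directory in the archive).
    top=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)
    tar -zxf "$archive" -C "$parent" || return 1
    mv "$parent/$top" "$parent/$name"
}
```

For a system directory such as /usr/lib, the `tar` and `mv` calls inside the helper would need `sudo`, as in the commands listed for this step.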
Spark's classpath is configured in ./conf/spark-env.sh. Create that file by copying the template:
    cd /usr/lib/spark
    cp ./conf/spark-env.sh.template ./conf/spark-env.sh
Edit ./conf/spark-env.sh (sudo vim ./conf/spark-env.sh) and append the following lines at the end:
    JAVA_HOME=/usr/lib/jvm/jdk1.8.0_191
    SPARK_WORKER_MEMORY=4g
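Because this guide uses the `without-hadoop` build, Spark also has to be told where the Hadoop jars are. The Spark documentation for "Hadoop free" builds does this through `SPARK_DIST_CLASSPATH`. Below is a minimal sketch of the extra lines for spark-env.sh, assuming Hadoop is already installed and the `hadoop` command is on the PATH (neither is covered by the steps here):

```shell
# Assumption: a Hadoop installation exists and `hadoop` is on the PATH.
# Only set the variable when the command is actually available.
if command -v hadoop >/dev/null 2>&1; then
    export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
fi
```

Without a classpath that includes Hadoop, the `without-hadoop` build may fail at startup with missing-class errors.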
Update the environment variables:
    sudo vim ~/.bashrc
Add the following lines:
    export SPARK_HOME=/usr/lib/spark
    export PATH=${SPARK_HOME}/bin:${SPARK_HOME}/sbin:$PATH
Finally, reload the file so the changes take effect in the current shell:
    source ~/.bashrc
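A quick way to confirm that the new PATH entries took effect is sketched below. The helper name `path_has` is hypothetical, and the default for `SPARK_HOME` is simply the install location used in this guide.

```shell
# Return 0 if directory $1 is an entry of the colon-separated list $2.
path_has() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Fall back to this guide's install location if SPARK_HOME is unset.
SPARK_HOME=${SPARK_HOME:-/usr/lib/spark}
for d in "$SPARK_HOME/bin" "$SPARK_HOME/sbin"; do
    if path_has "$d" "$PATH"; then
        echo "on PATH: $d"
    else
        echo "missing from PATH: $d"
    fi
done
```

Wrapping both sides in colons lets the match succeed only on a whole PATH entry, so /usr/lib/spark alone does not count as a match for /usr/lib/spark/bin.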
Start the Spark shell to check that Spark launches:
    cd /usr/lib/spark/conf
    spark-shell
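Test whether Spark was installed successfully. If it was, the output contains a line beginning with "Pi is roughly":
    /usr/lib/spark/bin/run-example SparkPi 2>&1 | grep "Pi is"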
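The check above can be taken one step further by pulling the number out of the "Pi is roughly" line and testing that it is plausible. This is only a sketch: the helper names `pi_estimate` and `pi_looks_sane` are hypothetical, and the bounds 3.0 and 3.3 are arbitrary values picked for illustration.

```shell
# Print the number that follows "Pi is roughly" on stdin, if any.
pi_estimate() {
    sed -n 's/^.*Pi is roughly \([0-9.][0-9.]*\).*$/\1/p' | head -n 1
}

# Succeed only if $1 lies strictly between 3.0 and 3.3
# (illustrative bounds, not anything Spark defines).
pi_looks_sane() {
    awk -v x="$1" 'BEGIN { exit !((x + 0) > 3.0 && (x + 0) < 3.3) }'
}

# Usage (requires the Spark installation from this guide):
#   est=$(/usr/lib/spark/bin/run-example SparkPi 2>&1 | pi_estimate)
#   pi_looks_sane "$est" && echo "SparkPi ran: $est"
```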
- Run
2.1 Show the contents of new_student1:
    new_student1.show()
2.2 Sort by score in descending order:
    new_student.sort(new_student("score").desc).show()
2.3 Compute the average score:
    new_student.agg(mean("score")).show()
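The steps in this section assume that a DataFrame with a score column already exists; how it was built is not shown. To try the same operations from scratch, the sketch below writes a small Scala script and feeds it to `spark-shell` on standard input. The three rows and the `name` column are made-up sample data, and `new_student` here is only a hypothetical stand-in for the real DataFrame.

```shell
# Write a small Scala script; the rows below are made-up sample data.
script=$(mktemp)
cat > "$script" <<'EOF'
import org.apache.spark.sql.functions.mean

// Hypothetical stand-in for the DataFrame used in this section.
val new_student = Seq(("Alice", 90), ("Bob", 75), ("Carol", 82)).toDF("name", "score")

new_student.show()
new_student.sort(new_student("score").desc).show()
new_student.agg(mean("score")).show()
EOF

# Feed the script to spark-shell if it is installed; otherwise just report.
if command -v spark-shell >/dev/null 2>&1; then
    spark-shell < "$script"
else
    echo "spark-shell not found; script left at $script"
fi
```

`toDF` works here because `spark-shell` imports `spark.implicits._` at startup; the explicit `mean` import just keeps the script self-explanatory.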