Spark version: 3.5.0. Download: Index of /dist/spark/spark-3.5.0
Run mode: local mode
YARN mode: Hadoop Yarn mode WordCount example (Java) - CSDN blog
1 Extract and set environment variables
# tar -xzvf spark-3.5.0-bin-hadoop3-scala2.13.tgz
# vi .bashrc
export SPARK_HOME=/home/hadoop/spark-3.5.0-bin-hadoop3-scala2.13
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
# source .bashrc
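To confirm the variables took effect, a quick sanity check can help (a minimal sketch; the install path below is the one assumed above, so adjust it to your own layout):

```shell
# Re-export the variables exactly as .bashrc would (assumed install path)
export SPARK_HOME=/home/hadoop/spark-3.5.0-bin-hadoop3-scala2.13
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
# Print the configured home; with Spark unpacked there, `spark-submit --version`
# would then print the Spark 3.5.0 banner
echo "$SPARK_HOME"
```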
2 Test
1) Submitting a job with spark-submit
--master local[N] runs in local mode with N worker threads
# spark-submit --class org.apache.spark.examples.SparkPi --master local[2] /home/hadoop/spark-3.5.0-bin-hadoop3-scala2.13/examples/jars/spark-examples_2.13-3.5.0.jar 10 # spark-examples_2.13-3.5.0.jar is the example jar shipped with Spark
--master yarn runs on Hadoop YARN
# environment variables
# vi .bashrc
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# source .bashrc
# spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 1g --executor-memory 1g --executor-cores 2 --queue default /home/hadoop/spark-3.5.0-bin-hadoop3-scala2.13/examples/jars/spark-examples_2.13-3.5.0.jar 10
2) Word count in spark-shell
Local file
# cat words.txt
hello world
hello hi
hi tom
# spark-shell
scala> sc.textFile("/home/hadoop/words.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect
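The pipeline above splits each line into words, maps each word to a pair (word, 1), and sums the counts per word with reduceByKey. As a sanity check on what it should return for the sample file (hello=2, hi=2, tom=1, world=1), the same count can be reproduced with plain coreutils (a hypothetical /tmp path is used here for illustration):

```shell
# Recreate the sample words.txt (hypothetical /tmp path, not the one above)
printf 'hello world\nhello hi\nhi tom\n' > /tmp/words.txt
# One word per line -> sort -> count duplicates: same counts as reduceByKey(_ + _)
tr ' ' '\n' < /tmp/words.txt | sort | uniq -c
```

The spark-shell `collect` returns the same four (word, count) pairs, though the order of elements in the result array is not guaranteed.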
HDFS file
Hadoop version: 3.3.5. Installation: Tencent Cloud Linux (CentOS) Hadoop installation and configuration - CSDN blog
# start-dfs.sh # with Hadoop installed, start HDFS
# hdfs dfs -put /home/hadoop/words.txt / # upload the file to the HDFS root directory
# spark-shell
scala> sc.textFile("hdfs://localhost:9000/words.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect