0. Machine allocation
IP host role
172.29.41.153 master Spark master
172.29.41.154 slave1 Spark slave
172.29.41.155 slave2 Spark slave
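Spark's start scripts reach the workers by hostname over SSH. If hostname resolution and passwordless SSH are not already in place (an assumption; the original does not cover them), entries like the following could be added to /etc/hosts on every node:
172.29.41.153 master
172.29.41.154 slave1
172.29.41.155 slave2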
1. Install Scala
(Scala 2.10.6 supports Java 6/7; Scala 2.12.x requires Java 8 or later)
sudo tar -zxvf scala-2.10.6.tgz -C /usr/local
cd /usr/local
sudo mv scala-2.10.6 scala
sudo chown -R hadoop:hadoop scala
2. Set environment variables and verify the Scala installation
sudo vi ~/.bashrc
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
source ~/.bashrc
scala -version
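If the installation succeeded, this should print the installed version, roughly as follows (the exact wording depends on the build):
Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL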
3. Extract Spark 2.1.0
sudo tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz -C /usr/local
cd /usr/local
sudo mv spark-2.1.0-bin-hadoop2.7 spark
sudo chown -R hadoop:hadoop spark
4. Configure Spark
- Enter the spark directory and copy the Spark environment template to create the environment file:
cp conf/spark-env.sh.template conf/spark-env.sh
Then add the following to conf/spark-env.sh:
export SCALA_HOME=/usr/local/scala
export SPARK_WORKER_MEMORY=1g
export SPARK_MASTER_IP=172.29.41.153
export MASTER=spark://172.29.41.153:7077
# If the SSH port is not the default 22, also add the following line:
export SPARK_SSH_OPTS="-p 22000"
Create the slaves file:
cp conf/slaves.template conf/slaves
Then add the hostnames of the worker nodes to this file, one per line.
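Based on the machine allocation in step 0, conf/slaves would contain:
slave1
slave2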
5. Configure Spark environment variables
sudo vi ~/.bashrc
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
source ~/.bashrc
6. Copy Spark to slave1 and slave2
scp -r /usr/local/spark slave1:~
scp -r /usr/local/spark slave2:~
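Note that these scp commands copy Spark into each slave's home directory, while SPARK_HOME in step 5 points at /usr/local/spark. A minimal sketch of moving it into place on each slave, assuming the same hadoop user and sudo rights as on the master:
sudo mv ~/spark /usr/local/spark
sudo chown -R hadoop:hadoop /usr/local/spark
Scala and the ~/.bashrc settings from steps 1, 2 and 5 would need to be set up on the slaves in the same way.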
7. Start Spark
- Enter the $SPARK_HOME directory and start Spark: ./sbin/start-all.sh
- From the $SPARK_HOME directory, run the SparkPi (Pi estimation) example:
./bin/run-example org.apache.spark.examples.SparkPi
- Run spark-shell:
./bin/spark-shell
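To confirm the cluster came up (a suggested check, not part of the original steps), run jps on each node: the master should show a Master process and each slave a Worker process; the master web UI is normally available at http://master:8080. The SparkPi example should end by printing a line such as:
Pi is roughly 3.14...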
8. WordCount in spark-shell
- Create a file /sparkTest/aaa in HDFS with some sample text content (one way to do this is sketched after this step).
- In spark-shell, run the WordCount:
scala> val file=sc.textFile("hdfs://master:9000/sparkTest/aaa")
scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
scala> count.collect()
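One possible way to create the input file used above, with the standard HDFS shell (the sample contents are illustrative assumptions, not from the original):
echo "hello spark hello hadoop" > aaa
hdfs dfs -mkdir -p /sparkTest
hdfs dfs -put aaa /sparkTest/aaa
With that sample input, count.collect() would return pairs such as (hello,2), (spark,1) and (hadoop,1).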
At this point, the Spark environment has been successfully installed on the cluster.