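1. Install Scala
a Download from: http://www.scala-lang.org/download/
I chose scala-2.12.0.tgz, the latest version at the time of writing.
b Upload the archive to /usr/local
c Extract it: tar -zxvf scala-2.12.0.tgz
d Create a symbolic link
ln -s scala-2.12.0 scala
e Edit the profile to add the environment variables
vim /etc/profile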
# added by lekko
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
f Apply the changes
source /etc/profile
g Check the installed Scala version; if the command runs, the installation succeeded
scala -version
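Steps e and f can also be scripted so that re-running the setup does not append duplicate lines to the profile. This is a minimal sketch, assuming a POSIX shell. `append_once` and the `PROFILE` variable are hypothetical names introduced here, and `PROFILE` defaults to a throwaway temp file so that nothing touches /etc/profile unless you point it there yourself.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line already exists.
append_once() {
    # -q: quiet, -x: match the whole line, -F: fixed string, not a regex
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Assumption: PROFILE is our own variable. Run with PROFILE=/etc/profile (as root) to apply for real.
PROFILE=${PROFILE:-$(mktemp)}

# Single quotes keep $PATH and $SCALA_HOME literal, so they expand when the profile is sourced.
append_once "$PROFILE" 'export SCALA_HOME=/usr/local/scala'
append_once "$PROFILE" 'export PATH=$PATH:$SCALA_HOME/bin'
```

The same helper works for the SPARK_HOME lines; only the two strings change.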
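2. Install and configure Spark
a Download from: http://spark.apache.org/downloads.html
Select the latest version, 2.0.2 (pre-built for Hadoop 2.7)
b Upload the archive to /usr/local
c Extract it: tar -zxvf spark-2.0.2-bin-hadoop2.7.tgz
d Create a symbolic link
ln -s spark-2.0.2-bin-hadoop2.7 spark
e Edit the profile to add the environment variables
vim /etc/profile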
# added by lekko
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
f Apply the changes
source /etc/profile
g Check that the environment variables are set; if the command runs, the installation succeeded
spark-shell --version
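If one of the version commands fails with "command not found", the usual cause is a PATH entry that did not take effect. The sketch below checks whether each tool resolves on PATH and prints where it was found. `require_cmd` is a hypothetical helper name, and the list of tools is only what this guide installs.

```shell
#!/bin/sh
# require_cmd NAME: succeed if NAME resolves on PATH, otherwise print a hint and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        printf 'ok: %s -> %s\n' "$1" "$(command -v "$1")"
    else
        printf 'missing: %s (check PATH in /etc/profile, then run source /etc/profile again)\n' "$1" >&2
        return 1
    fi
}

# Report each tool independently; "|| true" keeps the loop going after a miss.
for tool in scala spark-shell spark-submit; do
    require_cmd "$tool" || true
done
```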
h Configure Spark
Edit spark-env.sh
cd /usr/local/spark/conf/
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
# Append the following lines
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdkaddress/xxxx
export SPARK_MASTER_IP=192.168.XXX.XXX
export SPARK_WORKER_MEMORY=1024m
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
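i Start and stop commands
Start Hadoop
start-all.sh
Recommended instead, since Hadoop's start-all.sh is deprecated
start-dfs.sh and start-yarn.sh
Start the Spark standalone master and workers (Spark ships its own start-all.sh, so call it by full path)
$SPARK_HOME/sbin/start-all.sh
Stop Hadoop
stop-all.sh
Recommended instead
stop-dfs.sh and stop-yarn.sh
Stop Spark
$SPARK_HOME/sbin/stop-all.sh
If Hadoop's start-all.sh prints output like this: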
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-namenode-master.out
121.199.7.226: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave1.out
121.199.51.129: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave2.out
121.199.51.174: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave3.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-root-resourcemanager-master.out
121.199.51.129: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave2.out
121.199.7.226: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave1.out
121.199.51.174: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave3.out
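The output above shows that the Hadoop daemons (HDFS and YARN) started successfully. It does not cover the Spark standalone master and workers, which are started by Spark's own script, $SPARK_HOME/sbin/start-all.sh.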
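Because Hadoop and Spark both ship a script named start-all.sh, and this guide puts $SPARK_HOME/sbin on PATH, it helps to see which copy a bare start-all.sh would actually run. A minimal sketch, assuming a POSIX shell; `which_all` is a hypothetical helper, not part of Hadoop or Spark.

```shell
#!/bin/sh
# which_all NAME: print every executable file called NAME on PATH, in lookup order.
# The first line printed is the one a bare "NAME" would run.
# The function body is a subshell, so the IFS and globbing changes do not leak out.
which_all() (
    IFS=:
    set -f
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        candidate="${dir:-.}/$1"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
)

which_all start-all.sh
```

If more than one path is printed, calling the script by its full path removes the ambiguity.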
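j Submit a job to the Spark cluster
spark-submit --master spark://192.XXX.XXX.XXX:7077 --class <main class> --name <application name> <full path to the jar>
Example: spark-submit --master spark://192.XXX.XXX.XXX:7077 --class cn.XXXX.XXXXXXXXX.TFIDF --name XXXX XXXX.jar
k Submit a job to YARN
spark-submit --master yarn --deploy-mode cluster --class cn.XXXX.XXXXXXXXX.TFIDF --name XXXX XXXX.jar
(Spark 2.x deprecates the older --master yarn-cluster form.)
Check the job status at http://192.XXX.XXX.XXX:8088/ (the YARN ResourceManager web UI).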
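The two submit variants differ only in their master arguments, which a small wrapper can make explicit. The sketch below only prints the command line and does not run it. `build_submit` and the `MASTER_URL` variable are hypothetical names, and the class, name, and jar values are the same placeholders used above. Arguments containing spaces would need extra quoting that this sketch does not handle.

```shell
#!/bin/sh
# build_submit MODE CLASS NAME JAR: print (not run) the spark-submit command line.
# MODE is "standalone" or "yarn"; MASTER_URL is only consulted for standalone.
build_submit() {
    case $1 in
        standalone) master="--master ${MASTER_URL:-spark://192.XXX.XXX.XXX:7077}" ;;
        yarn)       master="--master yarn --deploy-mode cluster" ;;
        *)          printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
    esac
    printf 'spark-submit %s --class %s --name %s %s\n' "$master" "$2" "$3" "$4"
}

build_submit yarn cn.XXXX.XXXXXXXXX.TFIDF XXXX XXXX.jar
```

Printing first makes it easy to review the exact command before pasting it into a shell on the cluster.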