**Component Versions**
Component | Version | Download |
---|---|---|
Hadoop | 2.7.7 | hadoop 2.7.7 |
JDK | 1.8 | jdk 8 |
MySQL | 5.7 | MySQL 5.7 |
Hive | 2.3.4 | Hive 2.3.4 |
Spark | 2.1.1 | Spark 2.1.1 |
**Machine Environment**
IP | Hostname | Password |
---|---|---|
192.168.222.201 | master | password |
192.168.222.202 | slave1 | password |
192.168.222.203 | slave2 | password |
1. Basic machine environment
Reference: https://blog.csdn.net/su_mingyang/article/details/118070573
- Stop the firewall and disable it at boot (do this on all three VMs)
- Configure the hosts file (so the three machines can reach each other)
- Configure passwordless SSH
- Time synchronization: configure NTP, or adjust the clock manually with date
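On CentOS 7, the steps above typically correspond to commands like the following. This is a sketch, not part of the original guide: it assumes a root shell on each machine, and the NTP server and timestamp are placeholders to adjust for your environment.

```shell
# Stop the firewall and disable it at boot (run on all three machines)
systemctl stop firewalld
systemctl disable firewalld

# Map hostnames so the three machines can reach each other (all three machines)
cat >> /etc/hosts <<'EOF'
192.168.222.201 master
192.168.222.202 slave1
192.168.222.203 slave2
EOF

# Passwordless SSH from master to every node (run on master)
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
for node in master slave1 slave2; do
  ssh-copy-id root@"$node"
done

# Time sync: either point ntpdate at a reachable NTP server...
ntpdate ntp.aliyun.com
# ...or set the clock by hand with date
date -s "2021-10-20 20:00:00"
```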
2. Install Java (on all three machines)
Reference: https://blog.csdn.net/su_mingyang/article/details/120872313
3. Install Hadoop 2.7.7 (fully distributed)
Reference: https://blog.csdn.net/su_mingyang/article/details/120872850
4. Set up a fully distributed Spark cluster
4.1 Extract the Spark archive
[root@master ~]# tar -xzvf /chinaskills/spark-2.1.1-bin-hadoop2.7.tgz -C /usr/local/src/
4.2 Rename the directory
[root@master ~]# mv /usr/local/src/spark-2.1.1-bin-hadoop2.7 /usr/local/src/spark
4.3 Configure Spark environment variables
[root@master ~]# vi /root/.bash_profile
Add the following:
export SPARK_HOME=/usr/local/src/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin
4.4 Load the environment variables
source /root/.bash_profile
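A quick way to confirm the variables took effect is to check that PATH now contains both Spark directories. This minimal check re-declares the same two lines from .bash_profile so it stands alone:

```shell
# Same two variables as in /root/.bash_profile
export SPARK_HOME=/usr/local/src/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin

# PATH should now contain the Spark bin directory
case ":$PATH:" in
  *":$SPARK_HOME/bin:"*) echo "spark bin on PATH" ;;
  *) echo "spark bin NOT on PATH" ;;
esac
```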
4.5 Configure spark-env.sh
[root@master ~]# cp /usr/local/src/spark/conf/spark-env.sh.template /usr/local/src/spark/conf/spark-env.sh
[root@master ~]# vi /usr/local/src/spark/conf/spark-env.sh
Add the following:
# Java installation path
export JAVA_HOME=/usr/local/src/java
# IP or hostname of the master node
export SPARK_MASTER_IP=master
# Memory available to each worker
export SPARK_WORKER_MEMORY=1G
# CPU cores per worker
export SPARK_WORKER_CORES=1
# Hadoop configuration directory
export HADOOP_CONF_DIR=/usr/local/src/hadoop/etc/hadoop
4.6 Configure slaves
[root@master ~]# cp /usr/local/src/spark/conf/slaves.template /usr/local/src/spark/conf/slaves
[root@master ~]# vi /usr/local/src/spark/conf/slaves
List one worker hostname per line (the master also runs a Worker here):
master
slave1
slave2
4.7 Distribute the files to slave1 and slave2
scp -r /usr/local/src/spark slave1:/usr/local/src/
scp -r /usr/local/src/spark slave2:/usr/local/src/
scp /root/.bash_profile slave1:/root/
scp /root/.bash_profile slave2:/root/
After copying, run source /root/.bash_profile on each slave so the variables take effect there as well.
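With more workers, the same distribution can be scripted as a loop. The node list below is hypothetical, and the commands are shown as a dry run that only prints them; drop the `echo` to actually execute the copies:

```shell
# Hypothetical node list; extend it as the cluster grows
nodes="slave1 slave2"
for node in $nodes; do
  echo scp -r /usr/local/src/spark "${node}:/usr/local/src/"
  echo scp /root/.bash_profile "${node}:/root/"
done
```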
4.8 Start the Spark cluster
/usr/local/src/spark/sbin/start-all.sh
Output:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
master: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
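After start-all.sh completes, each node should show the expected daemons in `jps` (Master and Worker on master; Worker on the slaves). The check below runs against a hypothetical sample of `jps` output so it is self-contained; on a real node, replace the sample with `jps_output=$(jps)`:

```shell
# Hypothetical jps output for the master node (which also runs a Worker here)
jps_output="1601 Master
1764 Worker
1983 Jps"

for daemon in Master Worker; do
  if printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: MISSING"
  fi
done
```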
4.9 Access the web UI
Open http://master:8080 in a browser to view the Master web UI and confirm that all three workers have registered.
[Screenshot: Spark Master web UI (image-20211020204026393.png, not preserved)]