Cluster Machine Planning
All three machines run CentOS 7.2.

Hostname    IP address
master      192.168.1.106
slave1      192.168.1.107
slave2      192.168.1.108

master is the Hadoop NameNode, while slave1 and slave2 are the Hadoop DataNodes. If Spark is installed as well, master, slave1, and slave2 all act as Spark workers.
Environment Preparation
Setting the IP Address
Before we start the installation, the servers' network, security, and login settings need to be configured. We begin with the IP addresses.
1. Set the IP address
nano /etc/sysconfig/network-scripts/ifcfg-eth0
#only the more important parameters are listed
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.106
GATEWAY=192.168.1.1
DNS1=8.8.8.8
Apply the same settings on every machine in the cluster, then edit the host mapping file.
2. Set up host mapping
nano /etc/hosts
192.168.1.106 master
192.168.1.107 slave1
192.168.1.108 slave2
#on master, test connectivity to slave1 and slave2
ping slave1 -c 3
ping slave2 -c 3
3. Disable the firewall and SELinux (mainly to avoid having to open individual ports)
systemctl stop firewalld.service
nano /etc/selinux/config
SELINUX=disabled
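The change in /etc/selinux/config only takes effect after a reboot, and systemctl stop does not survive one. To apply both changes immediately and keep the firewall off across reboots, the following standard commands can be added:
systemctl disable firewalld.service #keep the firewall from starting at boot
setenforce 0 #put SELinux into permissive mode for the current session
getenforce #verify: reports Permissive now, Disabled after a reboot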
Configuring the Runtime Environment
yum update openssl
nano /etc/ssh/sshd_config
RSAAuthentication yes #enable RSA authentication
PubkeyAuthentication yes #enable public-key authentication
AuthorizedKeysFile .ssh/authorized_keys #file holding the authorized public keys
#restart the service
systemctl restart sshd.service
1. Add the spark group and user
groupadd -g 1000 spark
useradd -u 2000 -g spark spark
passwd spark
#grant administrator privileges by adding spark to /etc/sudoers
chmod u+w /etc/sudoers
nano /etc/sudoers
root ALL=(ALL) ALL
spark ALL=(ALL) ALL
chmod u-w /etc/sudoers #restore the original permissions
#create the working directories
mkdir /opt/spark
mkdir /opt/soft
chown -R spark:spark /opt/spark
chown -R spark:spark /opt/soft
2. Set up passwordless SSH across the cluster. Generate an RSA key pair on every machine, as follows:
ssh-keygen -t rsa
cd ~/.ssh
mv id_rsa.pub auth_keys_master.pub
#similarly, on the other two machines, rename the key to
mv id_rsa.pub auth_keys_slave1.pub
mv id_rsa.pub auth_keys_slave2.pub
#merge the public keys (including master's own) into authorized_keys on master
scp auth_keys_slave1.pub master:~/.ssh/
scp auth_keys_slave2.pub master:~/.ssh/
cat auth_keys_master.pub >> authorized_keys
cat auth_keys_slave1.pub >> authorized_keys
cat auth_keys_slave2.pub >> authorized_keys
#distribute authorized_keys to every node
scp authorized_keys slave1:~/.ssh/
scp authorized_keys slave2:~/.ssh/
#finally, restrict the permissions of authorized_keys on every machine
chmod 400 authorized_keys
#check that passwordless login now works
ssh master
ssh slave1
ssh slave2
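As an aside, where the ssh-copy-id helper is available (it ships with openssh-clients on CentOS 7), the manual renaming, copying, and merging above can be replaced with one command per target host; this is merely an alternative way to reach the same authorized_keys layout:
#run as the spark user on each machine, once for every other host
ssh-copy-id spark@master
ssh-copy-id spark@slave1
ssh-copy-id spark@slave2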
3. Install Java. We are running 64-bit CentOS, so download the matching 64-bit Oracle JDK package.
tar -zxf jdk-7u55-linux-x64.tar.gz
mv jdk1.7.0_55 /opt/soft
#edit the profile and make it take effect
emacs /etc/profile #or ~/.bash_profile
export JAVA_HOME=/opt/soft/jdk1.7.0_55
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#apply the changes
source /etc/profile #or ~/.bash_profile
#verify that Java was installed successfully
java -version
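If the profile was sourced correctly, the output should look roughly like the following (the exact build numbers will vary with the package you downloaded):
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)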
4. Install Scala. The Spark installation requires Scala 2.11.x, so download the corresponding Scala distribution from the official Scala site.
tar -zxf scala-2.11.8.tgz
mv scala-2.11.8 /opt/soft
#configure the Scala environment variables
emacs /etc/profile
export SCALA_HOME=/opt/soft/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
#apply the changes
source /etc/profile
#check the Scala version
scala -version
5. Install third-party tool packages
If you want to compile the Spark source code or make your own modifications to it, you will need to build from source, which requires a few third-party tools and development libraries.
#download Maven (or sbt)
tar -zxf apache-maven-3.3.9-bin.tar.gz
mv apache-maven-3.3.9 /opt/soft
export PATH=/opt/soft/apache-maven-3.3.9/bin:$PATH
source /etc/profile
#download and build protobuf
tar -zxf protobuf-2.5.0.tar.gz
mv protobuf-2.5.0 /opt/soft
cd /opt/soft/protobuf-2.5.0
./configure
make
make check
make install
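A quick way to confirm the protobuf build and installation (assuming the default /usr/local prefix):
protoc --version #expected output: libprotoc 2.5.0
#if protoc reports a missing shared library, add /usr/local/lib to the loader path
#(for example via a file under /etc/ld.so.conf.d/ followed by ldconfig)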
#other third-party tools and development libraries
yum install autoconf automake libtool cmake
yum install ncurses-devel
yum install openssl-devel
yum install gcc*
Hadoop Installation and Configuration
Download the Hadoop 2.7.2 release directly from the official site into the /opt/spark directory.
Installation
wget http://apache.stu.edu.tw/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz -P /opt/spark/
# extract the archive
tar -zxvf hadoop-2.7.2.tar.gz
Configuration
We need to configure hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml in turn. All of these files live in the configuration directory of the Hadoop you just extracted (/opt/spark/hadoop-2.7.2/etc/hadoop).
1. In hadoop-env.sh, set the Java and Scala environment variables
export JAVA_HOME=/opt/soft/jdk1.7.0_55
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export SCALA_HOME=/opt/soft/scala-2.11.8
2. In yarn-env.sh, set JAVA_HOME as well
export JAVA_HOME=/opt/soft/jdk1.7.0_55
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export SCALA_HOME=/opt/soft/scala-2.11.8
3. Configure slaves
slave1
slave2
4. Edit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/spark/hadoop-2.7.2/tmp</value>
</property>
</configuration>
5. Edit hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/spark/hadoop-2.7.2/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/spark/hadoop-2.7.2/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
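The local paths referenced in core-site.xml and hdfs-site.xml do not exist in a fresh extraction. Hadoop normally creates them during format and startup, but creating them up front (the data directory on every node) is a harmless precaution:
mkdir -p /opt/spark/hadoop-2.7.2/tmp
mkdir -p /opt/spark/hadoop-2.7.2/dfs/name
mkdir -p /opt/spark/hadoop-2.7.2/dfs/data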
6. Edit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
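Note that a stock Hadoop 2.7.2 distribution ships only mapred-site.xml.template, so the file is created from the template before adding the property above:
cd /opt/spark/hadoop-2.7.2/etc/hadoop
cp mapred-site.xml.template mapred-site.xml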
7. Edit yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
Distribute the configured Hadoop directory to slave1 and slave2:
scp -r hadoop-2.7.2 slave1:/opt/spark/
scp -r hadoop-2.7.2 slave2:/opt/spark/
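The JDK, Scala, and the /etc/profile additions must also be present on slave1 and slave2. If they have only been set up on master so far, they can be copied over in the same way (using the paths assumed earlier in this walkthrough):
scp -r /opt/soft/jdk1.7.0_55 /opt/soft/scala-2.11.8 slave1:/opt/soft/
scp -r /opt/soft/jdk1.7.0_55 /opt/soft/scala-2.11.8 slave2:/opt/soft/
#then append the same JAVA_HOME/SCALA_HOME exports to /etc/profile on each slave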
Starting Hadoop
Start Hadoop on master:
cd /opt/spark/hadoop-2.7.2
# format the NameNode
bin/hdfs namenode -format
# start HDFS
sbin/start-dfs.sh
# start YARN
sbin/start-yarn.sh
Checking Hadoop
# check that Hadoop started successfully (on master)
$ jps
980 NameNode
1239 SecondaryNameNode
1347 ResourceManager
6806 Jps
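To confirm that the DataNodes on slave1 and slave2 registered with the NameNode, the HDFS report can be checked from master:
bin/hdfs dfsadmin -report #should list slave1 and slave2 as live datanodes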
You can also point a browser at http://master:8088 to see the YARN web UI (the HDFS NameNode UI is served on http://master:50070 by default).
Spark Installation and Configuration
Download Spark from the official site and install it under the /opt/spark directory.
Installing Spark
wget http://xxx.xxx.xxx/spark-2.0.0-bin-hadoop2.7.tgz -P /opt/spark
tar -zxvf spark-2.0.0-bin-hadoop2.7.tgz
mv spark-2.0.0-bin-hadoop2.7 spark-2.0.0
Configuring Spark
cd /opt/spark/spark-2.0.0/conf
cp spark-env.sh.template spark-env.sh
emacs spark-env.sh
Append the following at the end:
export JAVA_HOME=/opt/soft/jdk1.7.0_55
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export SCALA_HOME=/opt/soft/scala-2.11.8
export HADOOP_HOME=/opt/spark/hadoop-2.7.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/home/spark/spark-2.0.0/
Configure slaves (see the note after the host list):
master
slave1
slave2
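Like spark-env.sh, the slaves file is created from its template first (cp slaves.template slaves in the conf directory) and then filled with the host names above. Before starting the cluster, the configured Spark directory also has to be present on the slave nodes, mirroring the Hadoop distribution step:
cd /opt/spark
scp -r spark-2.0.0 slave1:/opt/spark/
scp -r spark-2.0.0 slave2:/opt/spark/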
Starting Spark
cd /opt/spark/spark-2.0.0
sbin/start-all.sh
Check whether Spark started successfully
$ jps
980 NameNode
1239 SecondaryNameNode
1347 ResourceManager
6806 Jps
1889 Master
15021 Worker
#on a slave node
$ jps
15021 Worker
14691 DataNode
14822 NodeManager
16866 Jps
Spark web UI: http://master:8080
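As a final smoke test, the SparkPi example bundled with the distribution can be submitted to the standalone master (the exact examples jar name may differ slightly between builds):
cd /opt/spark/spark-2.0.0
bin/spark-submit --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.0.0.jar 100
#the driver output should contain a line like: Pi is roughly 3.14...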