1 Environment preparation
1.1 Required packages
a. CentOS-6.5-i386-LiveDVD.iso (32-bit CentOS system image)
b. jdk-7u71-linux-i586.rpm (32-bit JDK RPM for Linux)
c. hadoop-2.2.0.tar.gz (Hadoop package)
d. zookeeper-3.4.6.tar.gz (ZooKeeper package)
1.2 Runtime environment
a. VMware 9.0 or later as the virtualization tool
b. Create three virtual machine nodes
c. Set the NIC mode to bridged
1.3 Node information
hadoop1  192.168.120.191  namenode, resourcemanager, Spark Master
hadoop2  192.168.120.192  namenode, resourcemanager, Spark Worker
hadoop3  192.168.120.193  datanode, nodemanager, Spark Worker
2 Detailed configuration
2.1 System configuration
Install three 32-bit CentOS 6.5 systems to serve as the three nodes.
2.1.1 Configure the network
#vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="static"
IPV6INIT="yes"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
IPADDR=192.168.120.191
NETMASK=255.255.255.0
GATEWAY=192.168.120.1
DNS1=210.31.249.20
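After saving, restart the interface and verify the settings; a quick sketch (the gateway and DNS values shown above are site-specific):
#service network restart //apply the new static IP configuration
#ifconfig eth0 //confirm the address and netmask took effect
#ping -c 3 192.168.120.1 //confirm the gateway is reachable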
2.1.2 Modify the hostname and hosts file
#vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop1
#vim /etc/hosts
192.168.120.191 hadoop1
192.168.120.192 hadoop2
192.168.120.193 hadoop3
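The hostname set in /etc/sysconfig/network only takes effect after a reboot; to apply it in the current session and confirm that /etc/hosts resolution works, something like the following can be used (shown for hadoop1; adjust per node):
#hostname hadoop1 //set the running hostname without rebooting
#ping -c 1 hadoop2 //verify name resolution (succeeds once hadoop2 is online)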
2.1.3 Disable iptables and SELinux
#service iptables stop
#chkconfig iptables off
#vim /etc/selinux/config
...
SELINUX=disabled
...
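SELINUX=disabled likewise only applies after a reboot; for the current session, SELinux can be switched off immediately and the firewall state double-checked (a sketch):
#setenforce 0 //put SELinux into permissive mode right away
#getenforce //should print Permissive (or Disabled after the reboot)
#service iptables status //should report that iptables is not running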
Repeat the same steps on the other two virtual machines, changing only the IP address and hostname accordingly, then reboot all three nodes with #init 6 or #reboot.
2.1.4 Passwordless SSH login
Run on all three nodes:
#ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#cp /root/.ssh/id_dsa.pub /root/.ssh/authorized_keys
Then append the contents of /root/.ssh/id_dsa.pub from each node into the authorized_keys file on every node, so that each node's authorized_keys ends up holding all three public keys.
The authorized_keys file must be identical on hadoop1, hadoop2, and hadoop3. After saving, verify that passwordless login works:
On hadoop1:
#ssh hadoop2 //the first login prompts for host-key confirmation; type yes
#ssh hadoop3
On hadoop2:
#ssh hadoop1
#ssh hadoop3
On hadoop3:
#ssh hadoop1
#ssh hadoop2
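As an alternative to pasting the keys by hand, each node can append its public key to the others over SSH; a minimal sketch, run here on hadoop1 (each ssh call still asks for the root password, since passwordless login is not yet in place; adjust the node list when running on hadoop2 and hadoop3):
#for node in hadoop2 hadoop3; do cat /root/.ssh/id_dsa.pub | ssh $node 'cat >> /root/.ssh/authorized_keys'; done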
2.1.5 Install the JDK
Download jdk-7u71-linux-i586.rpm from http://www.oracle.com/technetwork/java/javase/downloads/index.html, copy it to the /root directory of all three nodes, and install it:
#rpm -ivh jdk-7u71-linux-i586.rpm //default install path is /usr/java, where latest is a symlink
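The PATH configured in /root/.bashrc below references $JAVA_HOME, which the rpm does not set by itself; a sketch of the lines to add to /root/.bashrc, plus a quick check:
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin
#java -version //should report java version "1.7.0_71"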
2.1.6 Install Scala
Copy the scala-2.10.3.tgz package to the /opt/spark directory, then unpack and rename it:
#cd /opt/spark
#tar -zxvf scala-2.10.3.tgz
#mv scala-2.10.3 scala
Edit the environment variables:
#vim /root/.bashrc
export SCALA_HOME=/opt/spark/scala
export PATH=$PATH:/root/hadoop/sbin:/root/hadoop/bin:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
#source /root/.bashrc
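To confirm that the new PATH is in effect:
#scala -version //should print: Scala code runner version 2.10.3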
2.1.7 Update the yum source
Download the CentOS 6 repo file CentOS6-Base-163.repo from http://mirrors.163.com/.help/centos.html and copy it to /root, then run the following on all three nodes:
#rm -rf /etc/yum.repos.d/*
#cp /root/CentOS6-Base-163.repo /etc/yum.repos.d/
#yum makecache
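A quick check that the new mirror is active:
#yum repolist //should list the base, updates, and extras repositories served from the 163 mirror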
2.2 Cluster configuration
2.2.1 Install Spark 1.0.2
Copy the spark-1.0.2-bin-hadoop2.tgz package to the /opt/spark directory, then unpack and rename it:
#cd /opt/spark
#tar -zxvf spark-1.0.2-bin-hadoop2.tgz
#mv spark-1.0.2-bin-hadoop2 spark
Edit the environment variables:
#vim /root/.bashrc
...
export SPARK_HOME=/opt/spark/spark
export PATH=$PATH:/root/hadoop/sbin:/root/hadoop/bin:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
#source /root/.bashrc
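A quick sanity check that the variables are in effect:
#echo $SPARK_HOME //should print /opt/spark/spark
#which spark-shell //should resolve to /opt/spark/spark/bin/spark-shell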
2.2.2 Configure spark-env.sh
#vim /opt/spark/spark/conf/spark-env.sh
export SCALA_HOME=/opt/spark/scala
export JAVA_HOME=/usr/java/latest
export SPARK_MASTER_IP=hadoop1
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=512m
export SPARK_WORKER_PORT=8888
export SPARK_WORKER_INSTANCES=1
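A fresh unpack ships only conf/spark-env.sh.template, so the file above can be created by copying the template first (#cp /opt/spark/spark/conf/spark-env.sh.template /opt/spark/spark/conf/spark-env.sh). For reference: SPARK_MASTER_IP names the host the standalone master binds to, SPARK_WORKER_CORES and SPARK_WORKER_MEMORY cap the CPU cores and memory each worker offers to applications, SPARK_WORKER_PORT fixes the worker's RPC port, and SPARK_WORKER_INSTANCES sets how many worker processes run per node.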
2.2.3 Configure slaves
#vim /opt/spark/spark/conf/slaves
hadoop1
hadoop2
hadoop3
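Each line in the slaves file names a host on which start-all.sh will launch a Spark worker; listing hadoop1 here means the master node runs a worker as well.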
2.2.4 Synchronize files
On hadoop1, run:
Copy hadoop1's environment variables to hadoop2:
#scp /root/.bashrc hadoop2:/root/
Copy hadoop1's environment variables to hadoop3:
#scp /root/.bashrc hadoop3:/root/
Sync scala and spark to the other nodes (creating the target directory there first):
#ssh hadoop2 mkdir -p /opt/spark
#ssh hadoop3 mkdir -p /opt/spark
#scp -r /opt/spark/* hadoop2:/opt/spark/
#scp -r /opt/spark/* hadoop3:/opt/spark/
On hadoop2, run:
#cd /root
#source .bashrc
On hadoop3, run:
#cd /root
#source .bashrc
2.2.5 Start the cluster
Start the Spark cluster from hadoop1, giving the full path to Spark's script (Hadoop ships its own start-all.sh, which is also on the PATH):
#/opt/spark/spark/sbin/start-all.sh
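If startup succeeds, the JDK's jps tool should show a Master process (plus a Worker) on hadoop1 and a Worker on hadoop2 and hadoop3, and the standalone master serves its web UI on port 8080 by default:
#jps //hadoop1: Master and Worker; hadoop2 and hadoop3: Worker
Browse to http://hadoop1:8080 and check that all three workers are registered.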