Environment: a cluster of three identical virtual machines built with VMware:
192.168.111.128 (master) CentOS 6.4 (64-bit)
192.168.111.129 (slave) CentOS 6.4 (64-bit)
192.168.111.130 (slave) CentOS 6.4 (64-bit)
I. Modify the /etc/hosts file
Edit /etc/hosts on all three machines and add the IP-to-hostname mappings (on CentOS 6 the hostname itself is set via the HOSTNAME variable in /etc/sysconfig/network):
192.168.111.128 vm01
192.168.111.129 vm02
192.168.111.130 vm03
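To confirm the mappings took effect, each machine should now be reachable by hostname; a quick check from the master:
ping -c 1 vm02
ping -c 1 vm03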
II. Configure passwordless SSH login
2.1. Generate a key pair on the master server
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
2.2. Append the public key to ~/.ssh/authorized_keys
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2.3. Set permissions on the generated authorized_keys file
chmod 600 ~/.ssh/authorized_keys
2.4. Edit sshd_config and uncomment the following three lines (remove the leading #):
vim /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
2.5. Restart the sshd service (optional)
service sshd restart
2.6. Verify passwordless login
ssh localhost
2.7. Configure passwordless login from master to the slaves (repeat these steps for each slave)
Note: root is the login user and master is the master's hostname (vm01 in this setup). Run the following on the slave, then repeat the authorized_keys permissions and sshd_config changes there as well:
[root@slave ~]# scp root@master:/root/.ssh/id_dsa.pub /root/.ssh/slaver.pub
[root@slave ~]# cat ~/.ssh/slaver.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
vim /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
service sshd restart
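At this point the master should be able to reach each slave without a password prompt; a quick way to verify from vm01:
ssh vm02 hostname
ssh vm03 hostname
Each command should print the slave's hostname without asking for a password.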
III. Install the JDK
3.1. Download
This part is simple; download it from the official site: http://www.oracle.com/technetwork/java/javase/downloads/index.html
3.2. Add the JDK settings to /etc/profile
export JAVA_HOME=/usr/jdk1.8.0_101
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib
Run source /etc/profile to make the variables take effect.
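A quick sanity check that the shell now picks up the new JDK:
java -version
If the PATH is correct, this reports java version "1.8.0_101".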
IV. Install Hadoop
1. Add the Hadoop-related variables to /etc/profile as well (Maven and Storm are configured here too, since they are installed on the same machines), then run source /etc/profile again:
export MAVEN_HOME=/home/apache-maven-3.3.9
export STORM_HOME=/home/storm-0.9.1
export HADOOP_HOME=/home/hadoop2.2.0/hadoop-2.2.0
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$PATH:$STORM_HOME/bin:$MAVEN_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
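After sourcing the profile, the hadoop command should resolve from the new PATH; a quick check:
hadoop version
This prints the Hadoop 2.2.0 version banner if HADOOP_HOME is set correctly.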
2. Go into /home/hadoop2.2.0/hadoop-2.2.0/etc/hadoop and edit the configuration files
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.111.128:9003</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop2.2.0/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
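To confirm Hadoop actually reads this value, hdfs getconf can query the local configuration without the cluster running:
hdfs getconf -confKey fs.defaultFS
Expected output: hdfs://192.168.111.128:9003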
3. hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>vm01:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop2.2.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop2.2.0/dfs/data,file:///hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
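The tmp, name, and data directories referenced in core-site.xml and hdfs-site.xml do not necessarily exist yet, so it is safest to create them up front on every node (paths as configured above):
mkdir -p /home/hadoop2.2.0/tmp
mkdir -p /home/hadoop2.2.0/dfs/name
mkdir -p /home/hadoop2.2.0/dfs/data
mkdir -p /hdfs/data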
4. Add to hadoop-env.sh:
export JAVA_HOME=/usr/jdk1.8.0_101
5. yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>vm01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>vm01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>vm01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>vm01:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>vm01:8088</value>
</property>
<!-- Site specific YARN configuration properties -->
</configuration>
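Once the cluster is running (step 9 below), the ResourceManager web UI configured above doubles as a health check; from any machine that can reach vm01:
curl -s http://vm01:8088/cluster | head
or simply open http://vm01:8088 in a browser and confirm both NodeManagers show up as active.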
6. mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>vm01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>vm01:19888</value>
</property>
</configuration>
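Note that the Hadoop 2.2.0 tarball ships only mapred-site.xml.template, not mapred-site.xml itself, so the file has to be created first:
cp mapred-site.xml.template mapred-site.xml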
7. slaves:
vm02
vm03
8. Copy the configuration from vm01 to vm02 and vm03 (run from the etc/hadoop directory)
scp core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml hadoop-env.sh slaves root@vm02:/home/hadoop2.2.0/hadoop-2.2.0/etc/hadoop/
scp core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml hadoop-env.sh slaves root@vm03:/home/hadoop2.2.0/hadoop-2.2.0/etc/hadoop/
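An alternative that avoids missing a file is to push the whole configuration directory in one go (same install path assumed on every node):
scp -r /home/hadoop2.2.0/hadoop-2.2.0/etc/hadoop root@vm02:/home/hadoop2.2.0/hadoop-2.2.0/etc/
scp -r /home/hadoop2.2.0/hadoop-2.2.0/etc/hadoop root@vm03:/home/hadoop2.2.0/hadoop-2.2.0/etc/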
9. Start the cluster
Format the namenode: ./bin/hdfs namenode -format
Start HDFS: ./sbin/start-dfs.sh
Start YARN: ./sbin/start-yarn.sh
Then check the processes with jps. On vm01 the running processes should be: NameNode, SecondaryNameNode, ResourceManager.
On vm02 and vm03 the running processes should be: DataNode, NodeManager.
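For reference, jps output on the master should look roughly like the following (the PIDs are just examples and will differ):
[root@vm01 ~]# jps
3152 NameNode
3354 SecondaryNameNode
3509 ResourceManager
3843 Jps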
V. Errors I ran into
java.net.ConnectException: Call From vm01/192.168.111.128 to vm01:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Fix: change
<property>
<name>fs.defaultFS</name>
<value>hdfs:/vm01:9000</value>
</property>
to
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.111.128:9003</value>
</property>
Port 9000 was probably already taken by another process; note that the old value was also malformed (hdfs:/ instead of hdfs://).
Another fix: set up clock synchronization across the cluster machines. I checked the time on the master node with date, then used date -s to set the same time on each node.
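As a concrete sketch of that manual sync (the timestamp is only an example; a real deployment would be better served by ntpd/ntpdate):
[root@vm01 ~]# date
[root@vm01 ~]# ssh vm02 "date -s '2016-09-01 12:00:00'"
[root@vm01 ~]# ssh vm03 "date -s '2016-09-01 12:00:00'"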