1. Distributed cluster: a group of machines connected over a medium so they can communicate, working together to complete a task (storage and computation). The guiding idea: divide and conquer, then merge the results.
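The divide-and-conquer idea can be sketched with plain shell tools, as a toy stand-in for how HDFS splits a file into blocks and MapReduce merges per-block results: split the input, process each chunk independently, then sum the partial counts.

```shell
# Toy divide-and-conquer: count the words in a file by splitting it into
# one-line chunks, counting each chunk separately, and summing the counts.
set -e
work=$(mktemp -d)
printf 'a b c\nd e\nf\n' > "$work/input.txt"
split -l 1 "$work/input.txt" "$work/chunk."   # divide: one chunk per line
total=0
for c in "$work"/chunk.*; do                  # process each chunk independently
  n=$(wc -w < "$c")
  total=$((total + n))                        # merge: sum the partial counts
done
echo "total words: $total"                    # prints "total words: 6"
```

On a real cluster the chunks live on different DataNodes and the counting runs in parallel; the merge step is what the reducer does.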
2. Software: CentOS 6.5 + VMware 11 + JDK 1.7.0_79 + Hadoop 2.4.1
3. Preparation:
1. Prepare four machines or virtual machines, with the following names and roles:
simple03 (NameNode, ResourceManager)
simple04 (SecondaryNameNode, DataNode, NodeManager)
simple05 (DataNode, NodeManager)
simple06 (DataNode, NodeManager)
2. Install CentOS 6.5 and configure each node: network interface, hostname, hosts mapping, SSH installation and passwordless login, JDK installation, and environment variables (/etc/profile).
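The hosts mapping might look like this in /etc/hosts on every node. simple03's address 192.168.0.203 is the one this guide uses; the other three addresses are assumed placeholders, so substitute your own.

```
# /etc/hosts (identical on all four nodes)
192.168.0.203 simple03
192.168.0.204 simple04
192.168.0.205 simple05
192.168.0.206 simple06
```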
Passwordless SSH across the cluster:
On each of the four nodes, generate a key pair with ssh-keygen -t rsa; press Enter through the four prompts. This produces two files: id_rsa (private key) and id_rsa.pub (public key).
ssh-copy-id localhost appends the contents of id_rsa.pub to authorized_keys; cat id_rsa.pub >> authorized_keys does the same thing.
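The per-node key setup described above can be sketched as a runnable sequence. A temporary directory stands in for /root/.ssh here so the sketch does not touch a real key directory; on the cluster the files live in /root/.ssh.

```shell
# Generate a key pair non-interactively and append the public key to
# authorized_keys, as ssh-copy-id localhost would do.
set -e
ssh_dir=$(mktemp -d)                               # stand-in for /root/.ssh
ssh-keygen -t rsa -N "" -f "$ssh_dir/id_rsa" -q    # empty passphrase, no prompts
cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
chmod 600 "$ssh_dir/authorized_keys"               # sshd rejects loose permissions
grep -c '^ssh-rsa' "$ssh_dir/authorized_keys"      # one key installed
```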
For the four machines, gather every node's public key into a single authorized_keys file. Once authorized_keys on simple03 contains all four public keys, copy it to the other nodes with the following commands:
4. Clone the other three systems (simple04, simple05, simple06).
# On the simple03 node:
scp /root/.ssh/authorized_keys simple04:/root/.ssh/
scp /root/.ssh/authorized_keys simple05:/root/.ssh/
scp /root/.ssh/authorized_keys simple06:/root/.ssh/
Building the cluster environment
Install and configure Hadoop on simple03; once configured, copy it to the other nodes. Configure the Hadoop environment on simple03:
# hadoop-env.sh
export JAVA_HOME=/simple/jdk1.7.0_79
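Besides JAVA_HOME in hadoop-env.sh, the /etc/profile environment configuration mentioned in the preparation step would typically add entries like these on every node (paths assumed from this guide's /simple layout); run source /etc/profile afterwards to apply them without logging out.

```
# appended to /etc/profile on every node
export JAVA_HOME=/simple/jdk1.7.0_79
export HADOOP_HOME=/simple/hadoop-2.4.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```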
--------------------------------------------------------------------------------------------------
#core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://simple03:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/simple/hadoop-2.4.1/tmp</value>
<description>base directory for Hadoop temporary files</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property></configuration>
-------------------------------------------------------------------------------------------------
#hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>simple03:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>simple04:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/simple/hadoop-2.4.1/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/simple/hadoop-2.4.1/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/simple/hadoop-2.4.1/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.stream-buffer-size</name>
<value>131072</value>
</property>
</configuration>
-------------------------------------------------------------------------------------------------
#mapred-site.xml (Hadoop ships only mapred-site.xml.template; copy it to mapred-site.xml before editing)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>simple03:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>simple03:19888</value>
</property></configuration>
------------------------------------------------------------------------------------------------
#yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>simple03</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>simple03:8032</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
</configuration>
-----------------------------------------------------------------------------------------------
#slaves
simple04
simple05
simple06
After configuring Hadoop on simple03, copy the installation directory to the other nodes (scp uses lowercase -r for recursive copy):
scp -r hadoop-2.4.1/ simple04:/simple/
scp -r hadoop-2.4.1/ simple05:/simple/
scp -r hadoop-2.4.1/ simple06:/simple/
Then format the NameNode (on simple03, before the first start only): hdfs namenode -format
Start the Hadoop cluster on simple03:
sbin/start-dfs.sh
sbin/start-yarn.sh
Finally, run jps on each node; the expected processes are:
simple03 (NameNode, ResourceManager)
simple04 (SecondaryNameNode, DataNode, NodeManager)
simple05 (DataNode, NodeManager)
simple06 (DataNode, NodeManager)