Hadoop 2.5.1 Virtual Cluster Setup: Basic Hadoop Configuration

The previous post, http://blog.csdn.net/stormragewang/article/details/41116213, covered setting up the basic environment for the virtual machine cluster. Once each VM has its shared folder configured and Java installed, and the master node can ssh into the slave1 and slave2 nodes, we are ready to install Hadoop.
As before, here is the cluster plan:
Hostname      Alias    OS                                IP               HDFS roles           YARN roles
Yarn-Master   master   CentOS_6.5_i686 minimal desktop   192.168.137.100  namenode, datanode   resourcemanager, nodemanager
Yarn-Slave_1  slave1   CentOS_6.5_i686 minimal           192.168.137.101  datanode             nodemanager
Yarn-Slave_2  slave2   CentOS_6.5_i686 minimal           192.168.137.102  datanode             nodemanager

First, download a release from http://hadoop.apache.org/releases.html#Download (I used version 2.5.1) and put it in the shared folder for later use.

On all three nodes, do the following: in the Hadoop user's home directory, create a folder named Hadoop, and under it create three folders: 2.5.1, dfs, and tmp; under dfs, create two folders, data and name (these match the dfs.namenode.name.dir and dfs.datanode.data.dir settings below). Then extract the downloaded archive into the 2.5.1 folder.
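A minimal shell sketch of these steps (assuming the downloaded tarball sits in a shared folder mounted at /mnt/hgfs/share, a hypothetical path; adjust to your setup):

# create the directory layout under the Hadoop user's home
mkdir -p ~/Hadoop/2.5.1 ~/Hadoop/dfs/data ~/Hadoop/dfs/name ~/Hadoop/tmp
# extract the release into 2.5.1; --strip-components=1 drops the leading hadoop-2.5.1/ directory
tar -xzf /mnt/hgfs/share/hadoop-2.5.1.tar.gz -C ~/Hadoop/2.5.1 --strip-components=1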

Hadoop's configuration files all live under etc/hadoop inside the extracted distribution.

(Two *.bak files in there are backups I created myself.) For the basic configuration, pay attention to these five files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves. For details, see the official documentation at http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html

First configure the master node; only three files need to be edited: core-site.xml, hdfs-site.xml, and slaves.
core-site.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://master</value>
	</property>
	<property>
  		<name>hadoop.tmp.dir</name>
  		<value>file:/home/Hadoop/Hadoop/tmp</value>
 		<description>A base for other temporary directories.</description>
 	</property>
</configuration>
hdfs-site.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/home/Hadoop/Hadoop/dfs/data</value>
		<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/home/Hadoop/Hadoop/dfs/name</value>
		<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
       </property> 
</configuration>
The slaves file is as follows:
master
slave1
slave2


Next configure the slave1 and slave2 nodes; only three files need to be edited: core-site.xml, hdfs-site.xml, and yarn-site.xml.
core-site.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://master</value>
	</property>
	<property>
  		<name>hadoop.tmp.dir</name>
  		<value>file:/home/Hadoop/Hadoop/tmp</value>
 		<description>A base for other temporary directories.</description>
 	</property>
</configuration>
hdfs-site.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/home/Hadoop/Hadoop/dfs/data</value>
		<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
       </property> 
</configuration>
yarn-site.xml is as follows:
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>master</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>   
</configuration>


Next, set up the Hadoop environment variables; do the following on all three nodes.
First edit .bash_profile as follows:
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
	. ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH

# source all configuration files in $HOME/.env_vars
for i in $HOME/.env_vars/*.sh ; do
	if [ -r "$i" ]; then
		. "$i"
	fi
done
This profile sources every *.sh file under the .env_vars folder, so their environment variables are picked up by bash. Create a .env_vars folder in the home directory, and inside it create a file named hadoop_2.5.1.sh with the following content:
# setup environment variables of hadoop_2.5.1
#

export HADOOP_PREFIX=$HOME/Hadoop/2.5.1

export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
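To check that the variables took effect, re-source the profile and print a few of them (a quick sanity check of my own, not a step from the original setup):

source ~/.bash_profile
echo $HADOOP_HOME                  # should print /home/Hadoop/Hadoop/2.5.1
$HADOOP_HOME/bin/hadoop version    # should report Hadoop 2.5.1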


Starting the distributed cluster:
First, stop the firewall on the master node:
[Hadoop@Yarn-Master ~]$ su
Password: 
[root@Yarn-Master Hadoop]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Unloading modules:                               [  OK  ]
[root@Yarn-Master Hadoop]# 
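Note that service iptables stop only lasts until the next reboot; to keep the firewall off across reboots on CentOS 6, you can additionally run (my addition, not part of the original steps):

[root@Yarn-Master Hadoop]# chkconfig iptables off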
Then format the distributed filesystem on the master node: $HADOOP_HOME/bin/hdfs namenode -format
Then start HDFS and YARN in turn from the $HADOOP_HOME/sbin directory. Start dfs first:
[Hadoop@Yarn-Master ~]$ cd Hadoop/2.5.1/sbin/
[Hadoop@Yarn-Master sbin]$ ls
distribute-exclude.sh    start-all.cmd        stop-all.sh
hadoop-daemon.sh         start-all.sh         stop-balancer.sh
hadoop-daemons.sh        start-balancer.sh    stop-dfs.cmd
hdfs-config.cmd          start-dfs.cmd        stop-dfs.sh
hdfs-config.sh           start-dfs.sh         stop-secure-dns.sh
httpfs.sh                start-secure-dns.sh  stop-yarn.cmd
mr-jobhistory-daemon.sh  start-yarn.cmd       stop-yarn.sh
refresh-namenodes.sh     start-yarn.sh        yarn-daemon.sh
slaves.sh                stop-all.cmd         yarn-daemons.sh
[Hadoop@Yarn-Master sbin]$ ./start-dfs.sh 
14/11/15 20:41:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-namenode-Yarn-Master.out
slave2: starting datanode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-datanode-Yarn-Slave_2.out
slave1: starting datanode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-datanode-Yarn-Slave_1.out
master: starting datanode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-datanode-Yarn-Master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-secondarynamenode-Yarn-Master.out
14/11/15 20:42:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[Hadoop@Yarn-Master sbin]$ 
You can check the nodes' status in a browser (the NameNode web UI listens on port 50070 by default, e.g. http://master:50070).
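Besides the browser, hdfs dfsadmin -report gives a command-line view of HDFS (a verification step I am adding; run it as the Hadoop user):

[Hadoop@Yarn-Master sbin]$ $HADOOP_HOME/bin/hdfs dfsadmin -report

With everything up, the report should list three live datanodes. jps on each node is another quick check: master should show NameNode, DataNode, and SecondaryNameNode, while the slaves show only DataNode.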

Then start YARN:
[Hadoop@Yarn-Master sbin]$ ./start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /home/Hadoop/Hadoop/2.5.1/logs/yarn-Hadoop-resourcemanager-Yarn-Master.out
slave2: starting nodemanager, logging to /home/Hadoop/Hadoop/2.5.1/logs/yarn-Hadoop-nodemanager-Yarn-Slave_2.out
slave1: starting nodemanager, logging to /home/Hadoop/Hadoop/2.5.1/logs/yarn-Hadoop-nodemanager-Yarn-Slave_1.out
master: starting nodemanager, logging to /home/Hadoop/Hadoop/2.5.1/logs/yarn-Hadoop-nodemanager-Yarn-Master.out
[Hadoop@Yarn-Master sbin]$ 
The YARN side can likewise be checked in a browser (the ResourceManager web UI listens on port 8088 by default, e.g. http://master:8088).
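Similarly, yarn node -list shows the registered NodeManagers from the command line (again my addition):

[Hadoop@Yarn-Master sbin]$ $HADOOP_HOME/bin/yarn node -list

It should report three running nodes: master, slave1, and slave2. jps on master should now additionally show ResourceManager and NodeManager.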

With that, the whole cluster is up!

If anything goes wrong during startup, check the log files under $HADOOP_HOME/logs; googling the specific error will resolve most problems. What remains is tuning parameters and running some jobs.
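As a first smoke test, the MapReduce examples jar bundled with the 2.5.1 release works well; a minimal sketch (2 map tasks, 10 samples per map):

[Hadoop@Yarn-Master ~]$ cd ~/Hadoop/2.5.1
[Hadoop@Yarn-Master 2.5.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar pi 2 10

If the job finishes with an estimate of Pi, HDFS and YARN are both working end to end.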