The previous post, http://blog.csdn.net/stormragewang/article/details/41116213, covered installing the basic environment for the virtual machine cluster. Once every VM has its shared folder set up, Java is installed, and the master node can reach the slave1 and slave2 nodes over SSH, we are ready to install Hadoop.
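A quick way to confirm the SSH prerequisite from the master node (hostnames per the plan below) is:
[Hadoop@Yarn-Master ~]$ ssh slave1 hostname   # should print Yarn-Slave_1 without a password prompt
[Hadoop@Yarn-Master ~]$ ssh slave2 hostname   # should print Yarn-Slave_2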
First, here is the cluster plan again:
| Hostname | Alias | OS | IP | HDFS roles | YARN roles |
| --- | --- | --- | --- | --- | --- |
| Yarn-Master | master | CentOS_6.5_i686 minimal desktop | 192.168.137.100 | namenode, datanode | resourcemanager, nodemanager |
| Yarn-Slave_1 | slave1 | CentOS_6.5_i686 minimal | 192.168.137.101 | datanode | nodemanager |
| Yarn-Slave_2 | slave2 | CentOS_6.5_i686 minimal | 192.168.137.102 | datanode | nodemanager |
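For reference, the alias-to-IP mapping above presumably corresponds to /etc/hosts entries along these lines on every node (part of the previous post's setup):
192.168.137.100 master
192.168.137.101 slave1
192.168.137.102 slave2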
First, download a release from
http://hadoop.apache.org/releases.html#Download; I used version 2.5.1. Put the tarball in the shared folder for later use.
On all three nodes, do the following: in the Hadoop user's home directory create a folder named Hadoop, and inside it create three folders, 2.5.1, dfs, and tmp; then create two folders, data and name, under dfs (the HDFS configuration below points at dfs/data and dfs/name). Extract the downloaded archive into the 2.5.1 folder.
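On each node this boils down to something like the following (the shared-folder mount point /mnt/hgfs/share and the exact tarball name are assumptions; adjust them to your setup):
[Hadoop@Yarn-Master ~]$ mkdir -p ~/Hadoop/2.5.1 ~/Hadoop/dfs/data ~/Hadoop/dfs/name ~/Hadoop/tmp
[Hadoop@Yarn-Master ~]$ tar -xzf /mnt/hgfs/share/hadoop-2.5.1.tar.gz -C ~/Hadoop/2.5.1 --strip-components=1
The --strip-components=1 flag drops the archive's top-level hadoop-2.5.1/ directory so the distribution lands directly inside 2.5.1.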
Hadoop's configuration files all live under the etc/hadoop directory.
The two *.bak files there are ones I created myself. For a basic setup, five files matter: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves. Detailed information is in the official documentation:
http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
Configure the master node first; only three files need to be edited there: core-site.xml, hdfs-site.xml, and slaves. (Since master also runs a NodeManager in this plan, it arguably needs the same yarn-site.xml as the slaves; the scp shortcut shown later covers that.)
The core-site.xml file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/Hadoop/Hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
The hdfs-site.xml file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/Hadoop/Hadoop/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/Hadoop/Hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
The slaves file is as follows:
master
slave1
slave2
Next, configure the slave1 and slave2 nodes; only three files need to be edited: core-site.xml, hdfs-site.xml, and yarn-site.xml.
The core-site.xml file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/Hadoop/Hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
The hdfs-site.xml file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/Hadoop/Hadoop/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
The yarn-site.xml file is as follows:
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
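Incidentally, rather than editing each node by hand, you can keep one identical set of config files everywhere and push it out from master; properties a given daemon does not use (such as dfs.namenode.name.dir on a pure datanode) are simply ignored. A sketch, assuming the directory layout above:
[Hadoop@Yarn-Master ~]$ scp ~/Hadoop/2.5.1/etc/hadoop/{core,hdfs,yarn}-site.xml slave1:Hadoop/2.5.1/etc/hadoop/
[Hadoop@Yarn-Master ~]$ scp ~/Hadoop/2.5.1/etc/hadoop/{core,hdfs,yarn}-site.xml slave2:Hadoop/2.5.1/etc/hadoop/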
Next, set up the Hadoop environment variables; do the following on all three nodes.
First, edit .bash_profile as follows:
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
# make all configuration files in $HOME/.env_vars works
for i in $HOME/.env_vars/*.sh ; do
if [ -r "$i" ]; then
. "$i"
fi
done
This profile automatically sources all *.sh files under the .env_vars folder, loading their environment variables into the shell. Create a .env_vars folder in the home directory, and inside it create a file named hadoop_2.5.1.sh with the following content:
# setup environment variables of hadoop_2.5.1
#
export HADOOP_PREFIX=$HOME/Hadoop/2.5.1
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
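Log out and back in (or source the profile), then sanity-check that the variables took effect; bin/hadoop version should report 2.5.1:
[Hadoop@Yarn-Master ~]$ source ~/.bash_profile
[Hadoop@Yarn-Master ~]$ echo $HADOOP_HOME
/home/Hadoop/Hadoop/2.5.1
[Hadoop@Yarn-Master ~]$ $HADOOP_HOME/bin/hadoop version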
Before starting the cluster, turn off the firewall on the master node:
[Hadoop@Yarn-Master ~]$ su
Password:
[root@Yarn-Master Hadoop]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
[root@Yarn-Master Hadoop]#
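Note that service iptables stop only lasts until the next reboot, and the slave nodes' firewalls would block the Hadoop ports just the same. To disable the firewall permanently, run this (as root) on each of the three nodes:
[root@Yarn-Master Hadoop]# chkconfig iptables off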
Then format the distributed filesystem on the master node with $HADOOP_HOME/bin/hdfs namenode -format.
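That is, as the Hadoop user on master (run it only once; re-formatting wipes the existing HDFS metadata in dfs/name, and the output should include a "successfully formatted" line):
[Hadoop@Yarn-Master ~]$ $HADOOP_HOME/bin/hdfs namenode -format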
Next, start HDFS and then YARN from the $HADOOP_HOME/sbin directory. Start DFS first:
[Hadoop@Yarn-Master ~]$ cd Hadoop/2.5.1/sbin/
[Hadoop@Yarn-Master sbin]$ ls
distribute-exclude.sh start-all.cmd stop-all.sh
hadoop-daemon.sh start-all.sh stop-balancer.sh
hadoop-daemons.sh start-balancer.sh stop-dfs.cmd
hdfs-config.cmd start-dfs.cmd stop-dfs.sh
hdfs-config.sh start-dfs.sh stop-secure-dns.sh
httpfs.sh start-secure-dns.sh stop-yarn.cmd
mr-jobhistory-daemon.sh start-yarn.cmd stop-yarn.sh
refresh-namenodes.sh start-yarn.sh yarn-daemon.sh
slaves.sh stop-all.cmd yarn-daemons.sh
[Hadoop@Yarn-Master sbin]$ ./start-dfs.sh
14/11/15 20:41:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-namenode-Yarn-Master.out
slave2: starting datanode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-datanode-Yarn-Slave_2.out
slave1: starting datanode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-datanode-Yarn-Slave_1.out
master: starting datanode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-datanode-Yarn-Master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/Hadoop/Hadoop/2.5.1/logs/hadoop-Hadoop-secondarynamenode-Yarn-Master.out
14/11/15 20:42:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[Hadoop@Yarn-Master sbin]$
You can check the cluster status in a browser; the NameNode web UI listens on port 50070 by default (e.g. http://192.168.137.100:50070).
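The same information is available on the command line; with all three datanodes up, the report should show three live datanodes:
[Hadoop@Yarn-Master sbin]$ $HADOOP_HOME/bin/hdfs dfsadmin -report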
Then start YARN:
[Hadoop@Yarn-Master sbin]$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/Hadoop/Hadoop/2.5.1/logs/yarn-Hadoop-resourcemanager-Yarn-Master.out
slave2: starting nodemanager, logging to /home/Hadoop/Hadoop/2.5.1/logs/yarn-Hadoop-nodemanager-Yarn-Slave_2.out
slave1: starting nodemanager, logging to /home/Hadoop/Hadoop/2.5.1/logs/yarn-Hadoop-nodemanager-Yarn-Slave_1.out
master: starting nodemanager, logging to /home/Hadoop/Hadoop/2.5.1/logs/yarn-Hadoop-nodemanager-Yarn-Master.out
[Hadoop@Yarn-Master sbin]$
The YARN side can likewise be checked in a browser; the ResourceManager web UI defaults to port 8088 (e.g. http://192.168.137.100:8088).
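You can also verify the daemons with jps on each node. Master should show NameNode, SecondaryNameNode, DataNode, ResourceManager, and NodeManager; the slaves only DataNode and NodeManager:
[Hadoop@Yarn-Master sbin]$ jps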
With that, the whole cluster is up!
If something goes wrong during startup, check the log files under $HADOOP_HOME/logs; searching for the specific error message will usually turn up a solution. The remaining work is tuning parameters and running some jobs.
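For a first smoke test, the distribution ships with example MapReduce jobs, e.g. the pi estimator (here with 2 maps and 10 samples per map). Note that unless mapreduce.framework.name is set to yarn in mapred-site.xml, the job runs in local mode rather than on the YARN cluster:
[Hadoop@Yarn-Master ~]$ cd ~/Hadoop/2.5.1
[Hadoop@Yarn-Master 2.5.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar pi 2 10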