Fully Distributed Hadoop Installation and Environment Configuration

1   Preface

This document describes how to set up a fully distributed Hadoop environment on Linux. The software versions listed here have been tested in actual use; compatibility is good and the environment is stable.

 

2   Operating System

CentOS release 6.7 (Final)

Linux node1 2.6.32-573.el6.x86_64 #1 SMP Thu Jul 23 15:44:03 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Either Linux virtual machines or physical machines will do.

The cluster consists of three machines in total (hostname : IP address):

node1: 172.7.56.205
node2: 172.7.56.206
node3: 172.7.56.207

 

3   Required Software Packages

hadoop-2.5.2, hbase-1.1.4, jdk1.7.0_80, zookeeper-3.4.6, etc.

All Hadoop-related components in this setup are extracted or installed under the /opt/hadoop directory. (JDK installation differs between the rpm package and the non-rpm archives: the rpm package automatically installs Java under /usr/bin, in which case you only need to adjust the JAVA_HOME value in /etc/profile accordingly.)

 

 

4   Configuration Steps

The software packages listed above should all be correctly installed under /opt/hadoop (for JDK installation, see above).

4.1    Hostname Configuration

(1) First, set each machine's hostname correctly

Method 1:

vi /etc/sysconfig/network

Change the value after HOSTNAME= to the desired machine name, e.g.

NETWORKING=yes
HOSTNAME=node1

 

Method 2 (on systems with systemd, e.g. CentOS 7):

[root@localhost ~]# hostnamectl --static set-hostname hdp84   (replace hdp84 with the desired hostname)

After the change, restart with reboot -f.

 

(2) Modify the hosts file

vi /etc/hosts

e.g.

127.0.0.1   localhost
172.7.56.205 node1.dcom node1
172.7.56.206 node2.dcom node2
172.7.56.207 node3.dcom node3

This maps each node* name to its IP address so that raw IPs no longer need to be used. Reboot after the change for it to take effect; to verify, use the ping node* command.
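To confirm name resolution from a node in one pass, a quick loop like the following can be used (a minimal sketch; hostnames as configured above):

for h in node1 node2 node3; do ping -c 1 $h; done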

        

4.2    JDK Configuration

For JDK installation details, search online; download the package from the official site (rpm, bin, and gz variants exist). After installation, run update-alternatives --config java to check the Java installation. If several JDKs are installed on the system, the same update-alternatives command can be used to choose between them.

Set the Java environment variables:

vi /etc/profile, and append at the end, e.g.:

export JAVA_HOME=/opt/hadoop/jdk1.7.0_80
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
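To apply and verify the new variables in the current shell (a quick check, assuming the paths above):

source /etc/profile
java -version
echo $JAVA_HOME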

 

4.3    Creating the hadoop User

Create a dedicated user hadoop on all 3 machines:

useradd hadoop   (create the user)
passwd hadoop    (set the password; for simplicity, it is best to use the same password, e.g. 123456, on all 3 machines)

For convenience, it is recommended to add hadoop to the root group. Log in as root first, then run usermod -g root hadoop; after that, hadoop belongs to the root group. Verify with id hadoop; output like uid=502(hadoop) gid=0(root) groups=0(root) means it worked.
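On CentOS the password can also be set non-interactively, which is convenient when repeating this on all three nodes (a sketch; 123456 is the example password from above):

echo "123456" | passwd --stdin hadoop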

 

4.4    Configuring Passwordless SSH Login

While Hadoop is running, the nodes must communicate with each other. Normally, communication between Linux machines requires a username and password (to keep communication secure), but having to type passwords manually is clearly impractical. The goal of this step is to let the nodes pass security authentication automatically, so normal communication is not interrupted.

(1) First, generate a public/private key pair on master (in this guide, master is node1, and slave01/slave02 below are node2/node3)

Log in to the system as hadoop

cd   (go to the home directory, /home/hadoop by default)

ssh-keygen -t rsa -P ''   (note: the command ends with two single quotes)

This generates a public/private key pair using the RSA algorithm; -P '' means an empty passphrase. When the command finishes, it creates a .ssh directory in the home directory containing two files: id_rsa (private key) and id_rsa.pub (public key).

(2) 导入公钥

cat.ssh/id_rsa.pub >> .ssh/authorized_keys

执行完以后,可以在本机上测试下,用ssh连接自己,即:sshlocalhost (或ssh master),

如果不幸还是提示要输入密码,说明还没起作用,还有一个关键的操作

chmod 600.ssh/authorized_keys (修改文件权限,否则不起作用)

然后再测试下 ssh localhost ,如果不需要输入密码,就连接成功,表示ok,一台机器已经搞定了。

(3) Generate key pairs on the other machines and copy their public keys to master

a) Log in to the other two machines, slave01 and slave02, as hadoop, and run ssh-keygen -t rsa -P '' to generate their public/private key pairs.

b) Then use scp to send the public key files to master (i.e. the machine that is already set up).

On slave01:

scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa_01.pub

On slave02:

scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa_02.pub

After these two commands, go back to master and check the /home/hadoop directory; there should be two new files, id_rsa_01.pub and id_rsa_02.pub. On master, import both public keys:

cat id_rsa_01.pub >> .ssh/authorized_keys
cat id_rsa_02.pub >> .ssh/authorized_keys

Master now holds the public keys of all 3 machines.

(4) Copy the "complete" authorized_keys from master to the other machines

a) Still on master:

scp .ssh/authorized_keys hadoop@slave01:/home/hadoop/.ssh/authorized_keys
scp .ssh/authorized_keys hadoop@slave02:/home/hadoop/.ssh/authorized_keys

b) Fix the permissions of the authorized_keys file on the other machines. On both slave01 and slave02, run:

chmod 600 .ssh/authorized_keys

(5) Verify

On every machine, ssh to the hostnames of the other machines. If each connects without a password, everything is OK.

Summary: this step is essential. The idea is to generate a public/private key pair on each node and distribute the public key to all other nodes. RSA is an asymmetric encryption algorithm: only the public key is published, and as long as the private key does not leak, messages still cannot be decrypted, so security remains guaranteed.
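A quick way to verify all pairs at once is a loop like the following, run from each node in turn (a sketch; it should print each hostname without any password prompt):

for h in node1 node2 node3; do ssh hadoop@$h hostname; done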

 

4.5    Uploading and Extracting the Packages

a) From your local machine, upload hadoop-2.5.2 to node1 with scp:

scp hadoop-2.5.2.tar.gz hadoop@node1:/opt/hadoop/

b) Log in to master as hadoop and extract it with:

tar -zxvf hadoop-2.5.2.tar.gz

hbase-1.1.4 and zookeeper-3.4.6 are uploaded and extracted the same way as hadoop-2.5.2.
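For example (a sketch; the exact archive file names may differ depending on which packages were downloaded):

tar -zxvf hbase-1.1.4-bin.tar.gz
tar -zxvf zookeeper-3.4.6.tar.gz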

 

4.6    Hadoop Configuration Changes

A total of 7 Hadoop-related files need to be modified:

/opt/hadoop/hadoop-2.5.2/etc/hadoop/hadoop-env.sh

/opt/hadoop/hadoop-2.5.2/etc/hadoop/yarn-env.sh

/opt/hadoop/hadoop-2.5.2/etc/hadoop/core-site.xml

/opt/hadoop/hadoop-2.5.2/etc/hadoop/hdfs-site.xml

/opt/hadoop/hadoop-2.5.2/etc/hadoop/mapred-site.xml

/opt/hadoop/hadoop-2.5.2/etc/hadoop/yarn-site.xml

/opt/hadoop/hadoop-2.5.2/etc/hadoop/slaves

 

a) hadoop-env.sh, yarn-env.sh

In these two files, the main change is the directory after JAVA_HOME; set it to wherever the JDK actually lives on the machine.

vi etc/hadoop/hadoop-env.sh   (and vi etc/hadoop/yarn-env.sh)

Find the corresponding line and change it to the JDK location (adjust to your actual setup):

export JAVA_HOME=/opt/hadoop/jdk1.7.0_80

In addition, in hadoop-env.sh it is recommended to add:

export HADOOP_PREFIX=/opt/hadoop/hadoop-2.5.2
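Both files can also be patched in one step with sed (a sketch, assuming the JAVA_HOME line in each file is present and uncommented; if it is commented out, simply add the export line instead):

sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/hadoop/jdk1.7.0_80|' etc/hadoop/hadoop-env.sh etc/hadoop/yarn-env.sh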

 

b) core-site.xml; use the following as a reference:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

   <property>

     <name>fs.defaultFS</name>

     <value>hdfs://node1:9000</value>

   </property>

   <property>

     <name>hadoop.tmp.dir</name>

     <value>/opt/hadoop/tempfiles/hadoop_tmp_dir</value>

   </property>

</configuration>

Note: create the directory /opt/hadoop/tempfiles/hadoop_tmp_dir manually.
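For example:

mkdir -p /opt/hadoop/tempfiles/hadoop_tmp_dir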

 

c) hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

 

    http://www.apache.org/licenses/LICENSE-2.0

 

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

   <property>

     <name>dfs.datanode.ipc.address</name>

     <value>0.0.0.0:50020</value>

   </property>

   <property>

     <name>dfs.datanode.http.address</name>

     <value>0.0.0.0:50075</value>

   </property>

   <property>

     <name>dfs.name.dir</name>

     <value>/opt/hadoop/tempfiles/dfs_name_dir</value>

   </property>

   <property>

     <name>dfs.data.dir</name>

     <value>/opt/hadoop/tempfiles/dfs_data_dir</value>

   </property>

   <property>

     <name>dfs.replication</name>

     <value>2</value>

   </property>

</configuration>

Note: dfs.replication is the number of data replicas; it should generally not exceed the number of datanodes. (dfs.name.dir and dfs.data.dir are the legacy names of dfs.namenode.name.dir and dfs.datanode.data.dir; Hadoop 2.5.2 accepts both.)
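The NameNode and DataNode directories referenced above also need to exist, e.g.:

mkdir -p /opt/hadoop/tempfiles/dfs_name_dir /opt/hadoop/tempfiles/dfs_data_dir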

 

d) mapred-site.xml (this file does not exist in a fresh distribution; see the copy command after the listing below)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
</configuration>
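A fresh Hadoop 2.5.2 distribution ships only a template for this file, so it is usually created by copying the template first:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml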

 

e) yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
   <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
   </property>
   <property>
     <name>yarn.resourcemanager.hostname</name>
     <value>node1</value>
   </property>
</configuration>

 

For now, leave the last file, slaves, alone (you can rename it first with mv slaves slaves.bak). Once the configuration above is in place, the NameNode can be tested on master as follows:

/opt/hadoop/hadoop-2.5.2/bin/hdfs namenode -format   (format it first)

15/02/12 21:29:53 INFO namenode.FSImage: Allocated new BlockPoolId: BP-85825581-192.168.187.102-1423747793784
15/02/12 21:29:53 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.

When you see output like this, the format is OK. Then:

/opt/hadoop/hadoop-2.5.2/sbin/start-dfs.sh

Once it has started, run jps to check the processes. If you see the following two:

5161 SecondaryNameNode
4989 NameNode

the master node is basically OK. Next run /opt/hadoop/hadoop-2.5.2/sbin/start-yarn.sh, and when it finishes, check jps again:

5161 SecondaryNameNode
5320 ResourceManager
4989 NameNode

If these 3 processes appear, YARN is OK.

f) Modify slaves

If you renamed the file earlier with mv slaves slaves.bak, first run mv slaves.bak slaves to restore the name, then edit it with vi slaves and enter:

node2
node3

Save and exit, and finally run

/opt/hadoop/hadoop-2.5.2/sbin/stop-dfs.sh
/opt/hadoop/hadoop-2.5.2/sbin/stop-yarn.sh

to stop the services started earlier.

 

4.7    Copying the hadoop Directory from master to node2 and node3

Still on the master machine:

cd /opt/hadoop   (go to the directory where hadoop-2.5.2 was extracted)

scp -r hadoop-2.5.2 hadoop@slave01:/opt/hadoop/
scp -r hadoop-2.5.2 hadoop@slave02:/opt/hadoop/

Note: the Hadoop temporary directory (tmp) and data directory (data) on slave01 and slave02 still have to be created manually.
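With passwordless ssh in place, those directories can be created remotely in one step (a sketch using the paths from the configuration above):

ssh hadoop@node2 mkdir -p /opt/hadoop/tempfiles/hadoop_tmp_dir /opt/hadoop/tempfiles/dfs_data_dir
ssh hadoop@node3 mkdir -p /opt/hadoop/tempfiles/hadoop_tmp_dir /opt/hadoop/tempfiles/dfs_data_dir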

 

4.8    Verification

On node1, start everything again:

/opt/hadoop/hadoop-2.5.2/sbin/start-dfs.sh
/opt/hadoop/hadoop-2.5.2/sbin/start-yarn.sh

If all goes well, node1 should show the following 3 processes:

7482 ResourceManager
7335 SecondaryNameNode
7159 NameNode

and node2 and node3 should each show the following 2 processes:

2296 DataNode
2398 NodeManager

You can also browse to:

http://node1:50070/
http://node1:8088/

to check the status, or run bin/hdfs dfsadmin -report for an HDFS status report.

 

Other notes:

a) If node1 (i.e. the namenode) needs to be reformatted, first empty the data directories on every datanode (ideally the tmp directories as well); otherwise, after the format, the datanodes will fail to start when dfs is brought up.

b) If running only the namenode on node1 feels wasteful and you want master to double as a datanode, simply add a line containing master to the slaves file.

c) For convenience, you can edit /etc/profile to add Hadoop's lib directory to the CLASSPATH environment variable and the hadoop/bin and hadoop/sbin directories to PATH, as sketched below.
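A minimal sketch of such /etc/profile additions (the lib path follows the standard hadoop-2.5.2 layout; adjust to your actual installation):

export HADOOP_PREFIX=/opt/hadoop/hadoop-2.5.2
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export CLASSPATH=$CLASSPATH:$HADOOP_PREFIX/share/hadoop/common/lib/*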

 

4.9    Other Configuration (hbase, zookeeper, etc.)

HBase configuration:

Go to the /opt/hadoop/hbase-1.1.4/conf directory; the main files to modify are hbase-env.sh and hbase-site.xml.

Contents of hbase-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

/**

 *

 * Licensed to the Apache Software Foundation (ASF) under one

 * or more contributor license agreements.  See the NOTICE file

 * distributed with this work for additional information

 * regarding copyright ownership.  The ASF licenses this file

 * to you under the Apache License, Version 2.0 (the

 * "License"); you may not use this file except in compliance

 * with the License.  You may obtain a copy of the License at

 *

 *     http://www.apache.org/licenses/LICENSE-2.0

 *

 * Unless required by applicable law or agreed to in writing, software

 * distributed under the License is distributed on an "AS IS" BASIS,

 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

 * See the License for the specific language governing permissions and

 * limitations under the License.

 */

-->

<configuration>

<property>

  <name>hbase.cluster.distributed</name>

  <value>true</value>

</property>

<property>

  <name>hbase.rootdir</name>

  <value>hdfs://node1:9000/hbase</value>

</property>

<property>

  <name>hbase.zookeeper.quorum</name>

  <value>node1,node2,node3</value>

</property>

<property>

  <name>hbase.zookeeper.property.dataDir</name>

  <value>/opt/hadoop/tempfiles/hbase_zookeeper_property_dataDir</value>

</property>

<property>

  <name>hbase.table.sanity.checks</name>

  <value>false</value>

</property>

<property>

  <name>hbase.master.maxclockskew</name>

  <value>180000</value>

</property>

</configuration>

 

In hbase-env.sh the main task is to configure the JDK: find the line below and fill in the value after JAVA_HOME=

# The java implementation to use.  Java 1.7+ required.
export JAVA_HOME=/opt/hadoop/jdk1.7.0_80
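Note also: because this setup runs a standalone zookeeper-3.4.6 service (started separately in section 4.10), HBase should usually be told not to manage its own ZooKeeper. In hbase-env.sh:

export HBASE_MANAGES_ZK=false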

 

ZooKeeper configuration:

Go to the /opt/hadoop/zookeeper-3.4.6/conf directory and edit the zoo.cfg file (if zoo.cfg does not exist yet, copy it from zoo_sample.cfg). The following can be used as a reference:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=5

# The number of ticks that can pass between

# sending a request and getting an acknowledgement

syncLimit=2

# the directory where the snapshot is stored.

# do not use /tmp for storage, /tmp here is just

# example sakes.

dataDir=/opt/hadoop/zookeeper-3.4.6/data

# the port at which the clients will connect

clientPort=2181

# the maximum number of client connections.

# increase this if you need to handle more clients

#maxClientCnxns=60

#

# Be sure to read the maintenance section of the

# administrator guide before turning on autopurge.

#

# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance

#

# The number of snapshots to retain in dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto purge feature

#autopurge.purgeInterval=1

server.1=node1:2888:3888

server.2=node2:2888:3888

server.3=node3:2888:3888

Go to the /opt/hadoop/zookeeper-3.4.6/data directory and set the myid file to the node's number (1, 2, 3, etc.), matching the server.N entries in zoo.cfg above.

Once configuration is done, the whole directory containing these files can be copied to the other machines, or the files can be copied individually; a few node-specific settings, such as myid, must then be adjusted by hand on each machine.
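A sketch of creating the data directory and myid on a node (use 2 on node2 and 3 on node3, matching server.N in zoo.cfg):

mkdir -p /opt/hadoop/zookeeper-3.4.6/data
echo 1 > /opt/hadoop/zookeeper-3.4.6/data/myid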

 

4.10   Starting the Services

After hadoop, hbase, and zookeeper are installed and configured, start the services.

(1) First start the zookeeper service on every machine in the cluster:

zookeeper-3.4.6/bin/zkServer.sh start

Once started, jps on each node should show a QuorumPeerMain process:

[hadoop@node1 hadoop]$ jps
11563 QuorumPeerMain

You can also check the current node's zookeeper state with zookeeper-3.4.6/bin/zkServer.sh status:

[hadoop@node1 hadoop]$ zookeeper-3.4.6/bin/zkServer.sh status
JMX enabled by default
Using config: /opt/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
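With passwordless ssh set up, zookeeper can be started on all three nodes from one machine (a sketch, assuming the installation paths above):

for h in node1 node2 node3; do ssh hadoop@$h '/opt/hadoop/zookeeper-3.4.6/bin/zkServer.sh start'; done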

 

(2) Go to the hadoop-2.5.2 directory on the master node

Start the hdfs service with the hadoop-2.5.2/sbin/start-dfs.sh command.

Check with jps:

Master node:
11882 SecondaryNameNode
11697 NameNode

Slave nodes:
6263 DataNode

 

(3) Start the yarn service from the hadoop-2.5.2 directory

Use the hadoop-2.5.2/sbin/start-yarn.sh command to start yarn.

Check with jps:

Master node:
12033 ResourceManager

        

(4) Start the hbase service from the hbase directory

Use the hbase-1.1.4/bin/start-hbase.sh script.

Check with jps:

Master node:
12921 HMaster

Slave nodes:
6738 HRegionServer

At this point, the main services of the fully distributed Hadoop setup have been configured and brought up.

 

5   Revision History

Date        Version   Revised by   Approved by   Notes
2016.9.29   1.0       章鑫8
