Cluster planning
The cluster consists of four CentOS 7 physical machines, with roles assigned as follows:

| host | NameNode | DataNode | ResourceManager | NodeManager |
| --- | --- | --- | --- | --- |
| 192.168.0.101 | yes | no | yes | no |
| 192.168.0.102 | no | yes | no | yes |
| 192.168.0.100 | no | yes | no | yes |
| 192.168.0.99 | no | yes | no | yes |
Create the hadoop account
Create the account:
useradd -m hadoop # add the user
passwd hadoop # set its password
Grant sudo privileges by making /etc/sudoers writable:
sudo chmod u+w /etc/sudoers
Give hadoop sudo rights by adding the following line directly below "root ALL=(ALL) ALL":
hadoop ALL=(ALL) ALL
Restore the permissions on /etc/sudoers:
sudo chmod u-w /etc/sudoers
(Alternatively, edit the file with sudo visudo, which validates the syntax before saving.)
Switch to the hadoop account:
su hadoop
Set the hostname, taking the 192.168.0.101 machine as an example:
sudo hostnamectl set-hostname hadoop101
Configure /etc/hosts on every node:
192.168.0.99 hadoop99
192.168.0.100 hadoop100
192.168.0.101 hadoop101
192.168.0.102 hadoop102
Note: do not append the hostnames to the 127.0.0.1 line, or the NameNode will be unable to discover the DataNodes.
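The /etc/hosts entries above can also be generated from the cluster layout. A minimal sketch (the IPs and hostnames are the ones used in this guide; adjust them for your network):

```shell
# Build the /etc/hosts block from a host:ip list; append the result
# to /etc/hosts on every node.
nodes="hadoop99:192.168.0.99 hadoop100:192.168.0.100 hadoop101:192.168.0.101 hadoop102:192.168.0.102"

hosts_block=""
for n in $nodes; do
    host=${n%%:*}   # text before the colon
    ip=${n##*:}     # text after the colon
    hosts_block="${hosts_block}${ip} ${host}
"
done
printf '%s' "$hosts_block"
```

Piping the output through `sudo tee -a /etc/hosts` applies it in one step.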
Configure passwordless SSH login
Generate a key pair:
ssh-keygen -t rsa
Distribute the public key to the other nodes; taking hadoop101 as an example:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop99
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop100
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop102
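The three ssh-copy-id calls above can be wrapped in one loop. A sketch that only prints the commands as a dry run:

```shell
# Dry run: print the key-distribution command for each worker node.
for host in hadoop99 hadoop100 hadoop102; do
    echo ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"   # drop the echo to run it
done
```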
Verify the passwordless login:
[hadoop@hadoop101 ~]$ ssh hadoop100
Last login: Fri Sep 27 13:40:59 2019 from 192.168.0.94
[hadoop@hadoop100 ~]$
Configure the Java environment
Remove the preinstalled OpenJDK packages:
[hadoop@hadoop100 ~]$ rpm -qa|grep java
java-1.8.0-openjdk-headless-1.8.0.222.b10-1.el7_7.x86_64
java-1.8.0-openjdk-1.8.0.222.b10-1.el7_7.x86_64
tzdata-java-2019b-1.el7.noarch
python-javapackages-3.4.1-11.el7.noarch
javapackages-tools-3.4.1-11.el7.noarch
[hadoop@hadoop100 ~]$ sudo rpm -e --nodeps java-1.8.0-openjdk-1.8.0.222.b10-1.el7_7.x86_64
[hadoop@hadoop100 ~]$ sudo rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.222.b10-1.el7_7.x86_64
[hadoop@hadoop100 ~]$ sudo rpm -e --nodeps tzdata-java-2019b-1.el7.noarch
[hadoop@hadoop100 ~]$ sudo rpm -e --nodeps javapackages-tools-3.4.1-11.el7.noarch
Download the JDK from: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Extract and move it into place:
tar -zxvf jdk-8u211-linux-x64.tar.gz
sudo mv jdk1.8.0_211/ /usr/local/jdk1.8
Configure the environment variables by editing /etc/profile:
sudo vim /etc/profile
Append the following at the end:
export JAVA_HOME=/usr/local/jdk1.8/
export JAVA_BIN=/usr/local/jdk1.8/bin
export JRE_HOME=/usr/local/jdk1.8/jre
export PATH=$PATH:/usr/local/jdk1.8/bin:/usr/local/jdk1.8/jre/bin
export CLASSPATH=/usr/local/jdk1.8/jre/lib:/usr/local/jdk1.8/lib:/usr/local/jdk1.8/jre/lib/charsets.jar
Reload the environment:
source /etc/profile
Verify that Java installed successfully:
[hadoop@hadoop101 rpmpackage]$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
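Before moving on, it can help to confirm that JAVA_HOME actually points at a JDK. A small sketch (check_java_home is a hypothetical helper written for this guide, not part of any standard tooling):

```shell
# Print a confirmation if $1 looks like a usable JDK root, otherwise
# print what is wrong and return nonzero.
check_java_home() {
    if [ -z "$1" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$1/bin/java" ]; then
        echo "no java binary under $1"
        return 1
    fi
    echo "JAVA_HOME looks good: $1"
}
check_java_home "$JAVA_HOME" || true
```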
All of the steps above must be performed on every machine.
Install Hadoop
Download the package:
wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
Extract it to /usr/local/src:
tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C /usr/local/src
Enter the configuration directory:
[hadoop@hadoop101 hadoop]$ cd /usr/local/src/hadoop-2.6.0-cdh5.7.0/etc/hadoop
[hadoop@hadoop101 hadoop]$ ls
capacity-scheduler.xml hadoop-env.cmd hadoop-policy.xml httpfs-signature.secret kms-log4j.properties mapred-env.sh slaves yarn-env.sh
configuration.xsl hadoop-env.sh hdfs-site.xml httpfs-site.xml kms-site.xml mapred-queues.xml.template ssl-client.xml.example yarn-site.xml
container-executor.cfg hadoop-metrics2.properties httpfs-env.sh kms-acls.xml log4j.properties mapred-site.xml ssl-server.xml.example
core-site.xml hadoop-metrics.properties httpfs-log4j.properties kms-env.sh mapred-env.cmd mapred-site.xml.template yarn-env.cmd
Edit hadoop-env.sh and set JAVA_HOME:
export JAVA_HOME=/usr/local/jdk1.8/
Edit core-site.xml and add the following:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop101:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/data3/hadoop/app/tmp</value>
</property>
</configuration>
Edit hdfs-site.xml and add the following:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data3/hadoop/app/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data3/hadoop/app/tmp/dfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop101:9001</value>
</property>
</configuration>
Edit yarn-site.xml and add the following:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop101</value>
</property>
</configuration>
Copy mapred-site.xml.template:
cp mapred-site.xml.template mapred-site.xml
Edit the MapReduce configuration file mapred-site.xml and add:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configure the worker nodes in the slaves file, listing their hostnames (or IPs, if hostnames are not configured):
hadoop102
hadoop100
hadoop99
Create the required directories:
sudo mkdir /data3/hadoop/app/tmp/dfs/name -p
sudo mkdir /data3/hadoop/app/tmp/dfs/data -p
Give the hadoop user ownership of the directories:
sudo chown -R hadoop:hadoop /data3/hadoop
Configure the environment variables: edit /etc/profile and add the following:
export HADOOP_HOME=/usr/local/src/hadoop-2.6.0-cdh5.7.0/
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Reload the environment:
source /etc/profile
At this point the master node (hadoop101) is fully configured, but the other three machines acting as workers (slaves) do not yet have a Hadoop installation. Distribute the Hadoop installation directory from hadoop101 to each of them:
rsync -av /usr/local/src/hadoop-2.6.0-cdh5.7.0/ hadoop100:/usr/local/src/hadoop-2.6.0-cdh5.7.0/
rsync -av /usr/local/src/hadoop-2.6.0-cdh5.7.0/ hadoop102:/usr/local/src/hadoop-2.6.0-cdh5.7.0/
rsync -av /usr/local/src/hadoop-2.6.0-cdh5.7.0/ hadoop99:/usr/local/src/hadoop-2.6.0-cdh5.7.0/
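The three rsync invocations can equally be driven by one loop. A sketch that echoes the commands as a dry run:

```shell
# Dry run: print one rsync command per worker; drop the echo to distribute.
HADOOP_DIR=/usr/local/src/hadoop-2.6.0-cdh5.7.0/
for host in hadoop100 hadoop102 hadoop99; do
    echo rsync -av "$HADOOP_DIR" "$host:$HADOOP_DIR"
done
```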
Set up the environment variables on the other nodes as well; edit /etc/profile and add:
export HADOOP_HOME=/usr/local/src/hadoop-2.6.0-cdh5.7.0/
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Reload the environment:
source /etc/profile
Create the required directories and set ownership:
sudo mkdir /data3/hadoop/app/tmp/dfs/name -p
sudo mkdir /data3/hadoop/app/tmp/dfs/data -p
sudo chown -R hadoop:hadoop /data3/hadoop # grant ownership
Start Hadoop
Format the NameNode; this only needs to be run on hadoop101:
hdfs namenode -format
Once formatting completes, start the Hadoop cluster:
start-all.sh
(start-all.sh is deprecated in Hadoop 2.x; running start-dfs.sh followed by start-yarn.sh is the equivalent, recommended form.)
After startup, check the DataNode and NodeManager processes on the other machines:
hadoop102
[hadoop@hadoop102 logs]$ jps
23584 NodeManager
24160 Jps
23381 DataNode
hadoop100
[hadoop@hadoop100 hadoop]$ jps
29809 Jps
15897 NodeManager
15695 DataNode
hadoop99
[hadoop@hadoop99 hadoop]$ jps
58770 Jps
49669 NodeManager
49483 DataNode
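The per-node checks above can be scripted: given a node's jps output, verify that both DataNode and NodeManager are present. A sketch using the sample output from hadoop102 above (on a live node you would capture the real output, e.g. jps_out=$(ssh "$host" jps)):

```shell
# Sample jps output from hadoop102 above; substitute real output per node.
jps_out="23584 NodeManager
24160 Jps
23381 DataNode"

node_ok=1
for proc in DataNode NodeManager; do
    if ! printf '%s\n' "$jps_out" | grep -q "$proc"; then
        echo "missing $proc"
        node_ok=0
    fi
done
[ "$node_ok" -eq 1 ] && echo "worker healthy"
```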
All nodes have started normally. Next, open the NameNode web UI at http://hadoop101:50070.
It should show three live DataNodes. The cluster setup is complete.
If you run into problems, join the technical discussion QQ group: 526855734.