1. Preparation
Install three CentOS 7 virtual machines
Install three CentOS 7 systems in the virtual machines; this part is omitted here.
Enable communication between the three virtual machines
Result: (screenshot omitted)
Set up passwordless SSH login between the three virtual machines
Run the command:
ssh-keygen -t dsa
Run ssh-keygen -t dsa on the other two virtual machines as well, and append the contents of each machine's id_dsa.pub to the authorized_keys file on the first virtual machine. Then copy the first machine's authorized_keys file to /root/.ssh/ on the other two virtual machines.
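As a runnable sketch of the key mechanics (using a scratch directory instead of the real /root/.ssh, and RSA rather than DSA, since recent OpenSSH releases disable DSA keys by default):

```shell
# Work in a scratch directory so this is safe to try anywhere;
# on the real machines these files live in /root/.ssh
KEYDIR=$(mktemp -d)

# Generate a key pair with no passphrase
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"

# Appending a machine's public key to authorized_keys is what grants it
# passwordless login; on the cluster, all three id_*.pub files are appended
# to the first machine's authorized_keys, which is then copied back out
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

On the real machines, ssh-copy-id root@leader does the append (and fixes permissions) in one step.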
Result: (screenshot omitted)
Change the hostnames of the three virtual machines
Run:
vi /etc/hostname
Taking the namenode host as an example, change its hostname to leader.
On each of the three virtual machines, map the IP addresses to the hostnames:
vi /etc/hosts
(Tip: to look up a machine's IP address, run:
ifconfig
)
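For example (these 192.168.56.x addresses are placeholders; use whatever ifconfig reports on your machines), every machine's /etc/hosts would gain lines like:

```
192.168.56.101 leader
192.168.56.102 member1
192.168.56.103 member2
```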
Transferring files between the host and the VMs via a shared folder
A minimal CentOS install usually has no graphical desktop, so VMware Tools drag-and-drop is not a convenient way to move files between the physical host and the virtual machines. A better approach is VMware's shared-folder feature:
Right-click the tab of the virtual machine and choose Settings.
In the dialog that opens, go to Options -> Shared Folders, change the setting on the right from Disabled to Always enabled, then click Add, click Browse, and pick a folder on the physical host to share.
Then, inside CentOS, run:
vmware-hgfsclient
to list the shared folders.
Next, create a directory in the virtual machine and mount the share on it:
mkdir myshare
vmhgfs-fuse .host:/ myshare -o nonempty   # omitting -o nonempty may cause an error
Enter the directory you created and you will find the shared folder inside it, with the shared content.
The share is now working.
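If you want the share mounted automatically at every boot, an /etc/fstab entry is a common alternative to running vmhgfs-fuse by hand (the mount point /root/myshare is the directory created above; adjust to taste):

```
.host:/ /root/myshare fuse.vmhgfs-fuse defaults,allow_other 0 0
```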
Installing the Hadoop cluster
Installing the Java environment
First download a JDK and put it in the shared folder, then extract it to wherever you want to install it; mine goes under /usr/local/hadoop/java. From inside the shared folder, run:
mkdir -p /usr/local/hadoop/java
tar -zxf jdk1.8.0_161.tar.gz -C /usr/local/hadoop/java
Then edit the environment variables:
sudo vi /etc/profile
Append the following lines at the end of the file:
export JAVA_HOME=/usr/local/hadoop/java/jdk1.8.0_161
export JRE_HOME=/usr/local/hadoop/java/jdk1.8.0_161/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Then reload it:
source /etc/profile
You can check that the installation succeeded by printing the Java version:
java -version
If the version information appears, everything is in order.
Installing Hadoop 3.3.0 on the three virtual machines
(1) Extract the archive and create directories (same steps on all three machines)
First enter the shared folder created earlier, then extract Hadoop to the chosen directory; my install location is /usr/local/hadoop.
mkdir -p /usr/local/hadoop
tar -zxf hadoop-3.3.0.tar.gz -C /usr/local/hadoop
Create a few directories:
cd /usr/local/hadoop
mkdir tmp
mkdir var
mkdir dfs
mkdir dfs/name
mkdir dfs/data
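The five mkdir calls above can be collapsed into one command, since mkdir -p creates parent directories as needed; sketched here under a scratch prefix so it can be tried anywhere:

```shell
# On the real machines BASE would simply be /usr/local/hadoop
BASE=$(mktemp -d)
mkdir -p "$BASE/tmp" "$BASE/var" "$BASE/dfs/name" "$BASE/dfs/data"
```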
(2) Edit the Hadoop configuration files
Go into /usr/local/hadoop/hadoop-3.3.0/etc/hadoop/:
cd /usr/local/hadoop/hadoop-3.3.0/etc/hadoop/
Edit core-site.xml (replace the addresses with your own; if you follow this tutorial exactly, you can skip that):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://leader:9000</value>
</property>
</configuration>
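One note: fs.default.name is the deprecated Hadoop 1 name of this property. It still works in 3.3.0 (with a deprecation warning in the logs), but the current name is fs.defaultFS, so the same setting can equivalently be written as:

```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://leader:9000</value>
</property>
```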
Edit hadoop-env.sh:
Add one line (adjust the path to your Java installation):
export JAVA_HOME=/usr/local/hadoop/java/jdk1.8.0_161
Edit hdfs-site.xml (replace the addresses with your own; if you follow this tutorial exactly, you can skip that):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
<description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
<description>Enable permission checking in HDFS.</description>
</property>
</configuration>
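Likewise, dfs.name.dir and dfs.data.dir are deprecated Hadoop 1 names; 3.3.0 accepts them but logs a warning. Their current equivalents are:

```xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
</property>
```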
Edit mapred-site.xml (replace the addresses with your own; if you follow this tutorial exactly, you can skip that):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>leader:49001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/usr/local/hadoop/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit the workers file:
On the first virtual machine (leader):
member1
member2
On the second virtual machine (member1):
leader
member2
On the third virtual machine (member2):
leader
member1
Blog address: https://blog.csdn.net/qq_44846166/article/details/111169529
Edit yarn-site.xml (replace the addresses with your own; if you follow this tutorial exactly, you can skip that):
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>leader</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1024</value>
<description>Maximum memory allocation per container, in MB; the default is 8192 MB.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
(3) Start Hadoop:
Format the NameNode:
cd /usr/local/hadoop/hadoop-3.3.0/bin
./hadoop namenode -format
Start everything:
cd /usr/local/hadoop/hadoop-3.3.0/sbin
./start-all.sh
If this fails:
Add the following settings at the very beginning of both start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Note: they must go at the very beginning of the files. Then add the following at the beginning of both start-yarn.sh and stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=root
YARN_NODEMANAGER_USER=root
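Instead of editing the four start/stop scripts, the same user settings can be made once in etc/hadoop/hadoop-env.sh, which every launcher script sources:

```shell
# Run the HDFS and YARN daemons as root
# (same effect as editing start-dfs.sh/stop-dfs.sh/start-yarn.sh/stop-yarn.sh)
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```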
Testing the installation
In a browser, open the leader host's IP address followed by :8088; you should see the YARN ResourceManager web UI.
Problems encountered
(1) No network access
Solution reference: CentOS-7,网络ping不通详解_pocher的博客-CSDN博客_centos7 ping不通
(2) Hadoop runs and port 8088 is reachable, but port 9870 is not
Solution references: 启动Hadoop start-dfs.sh Permission denied
and the corresponding answer on Stack Overflow
First check whether the firewall is off; the following commands stop and disable it:
systemctl stop firewalld.service
systemctl disable firewalld.service
Then run ./start-dfs.sh again:
cd /usr/local/hadoop/hadoop-3.3.0/sbin
./start-dfs.sh
When I ran this, I got an error (the screenshot has not survived).
The cause was that no public/private key pair had been generated for passwordless login.
Run:
ssh-keygen -t rsa
cd /root/.ssh
cp id_rsa.pub authorized_keys
After that, ./start-dfs.sh runs without the error.
Now we can check the listening ports:
Run:
netstat -tlunp
If startup succeeded, the relevant ports (9870 among them) will show as listening.
(To install the netstat command:
yum install net-tools
)
Then open the leader host's IP address plus port 9870 in a browser and you will see the HDFS web UI:
References:
Hadoop-3.3.0 安装_zhanghaoninhao的博客-CSDN博客
centos7虚拟机与主机共享文件夹 - christine-ting - 博客园 (cnblogs.com)
Hadoop集群搭建教程(详细)_fanxin_i的博客-CSDN博客_hadoop集群搭建完整教程
用三台虚拟机搭建Hadoop全分布集群 - 奇域巫师 - 博客园 (cnblogs.com)
centos7安装配置Hadoop集群_一只修炼成精的猴子的博客-CSDN博客_centos7安装hadoop