CentOS 7.5 Hadoop 2.7 Deployment Guide
Preparation
Disable the firewall
Passwordless SSH login
Configure the JDK environment
Disable the firewall
systemctl stop firewalld
Disable SELinux
setenforce 0
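Both commands above only last until the next reboot. A minimal sketch to make the changes permanent on a lab cluster (run on both machines; the sed line assumes the stock /etc/selinux/config with SELINUX=enforcing):
#shell-master: systemctl disable firewalld
#shell-master: sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config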
This walkthrough installs Hadoop on two virtual machines,
with hostnames master and slave.
Write the hostnames into /etc/hosts on master:
#shell-master: cat /etc/hosts
192.168.200.10 master
192.168.200.20 slave
#shell-master: scp /etc/hosts slave:/etc/
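To confirm name resolution works on both machines before setting up SSH, a quick check:
#shell-master: ping -c 2 slave
#shell-slave: ping -c 2 master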
#shell-master: ssh-keygen (press Enter through all of the prompts)
**On slave**
#shell-slave: ssh-keygen (press Enter through all of the prompts)
#shell-slave: cd /root/.ssh/
#shell-slave: ls
id_rsa id_rsa.pub
#shell-slave: cat id_rsa.pub >> authorized_keys
#shell-slave: scp /root/.ssh/authorized_keys master:/root/.ssh
***On master***
#shell-master: cd /root/.ssh
#shell-master: cat id_rsa.pub >> authorized_keys
#shell-master: scp authorized_keys slave:/root/.ssh
#shell-master: ssh slave
***On slave***
#shell-slave: ssh master
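As an aside, the manual key exchange above can also be done with ssh-copy-id, which appends the local public key to the remote authorized_keys and sets the file permissions in one step:
#shell-master: ssh-copy-id root@slave
#shell-slave: ssh-copy-id root@master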
Install the JDK (on master)
Download the JDK 8 tarball yourself (e.g. from Oracle's website).
#shell-master: tar xf jdk1.8.0_211.tar.gz -C /usr/local
#shell-master: cd /usr/local/ && mv jdk1.8.0_211 jdk
#shell-master: tail -2 /etc/profile
export JAVA_HOME=/usr/local/jdk
export PATH=$PATH:$JAVA_HOME/bin
#shell-master: scp -r /usr/local/jdk slave:/usr/local/
#shell-master: scp /etc/profile slave:/etc/profile
#shell-master: source /etc/profile
#shell-master: java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
On slave
#shell-slave: source /etc/profile
#shell-slave: java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
Deploy Hadoop 2.7
Hadoop download address:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
#shell-master: tar xf hadoop-2.7.7.tar.gz -C /usr/local/
#shell-master: tail -n 4 /etc/profile
export HADOOP_HOME=/usr/local/hadoop-2.7.7 # the directory Hadoop was extracted to
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
# After saving, reload the profile to apply it
#shell-master: source /etc/profile
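If the variables took effect, the hadoop command is now on PATH; a quick sanity check, which should report Hadoop 2.7.7:
#shell-master: hadoop version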
- Set the JAVA_HOME parameter in the Hadoop environment scripts
# Enter the etc/hadoop directory under the Hadoop installation directory
#shell-master: cd /usr/local/hadoop-2.7.7/etc/hadoop
# Add or modify the following parameter in hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
#shell-master: vim hadoop-env.sh
#shell-master: vim mapred-env.sh
#shell-master: vim yarn-env.sh
export JAVA_HOME="/usr/local/jdk" # path to the JDK installation
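Instead of opening each file in vim, the same export can be appended with a shell loop (a sketch assuming the current directory is still /usr/local/hadoop-2.7.7/etc/hadoop; the appended line wins because it is evaluated last):
#shell-master: for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do echo 'export JAVA_HOME="/usr/local/jdk"' >> $f; done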
- Modify the Hadoop configuration files
In the etc/hadoop directory under the Hadoop installation directory, edit core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves (renamed workers in 3.0+), adjusting the values to your environment.
(1) core-site.xml
<configuration>
<property>
<!-- HDFS address -->
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<!-- Directory for temporary files; create the tmp directory under /usr/local/hadoop-2.7.7 first -->
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.7.7/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
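As the comment above says, the hadoop.tmp.dir directory must be created by hand before HDFS is formatted:
#shell-master: mkdir -p /usr/local/hadoop-2.7.7/tmp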
(2) hdfs-site.xml
<configuration>
<property>
<!-- NameNode web address -->
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<!-- Replication factor -->
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<!-- Secondary NameNode address -->
<name>dfs.namenode.secondary.http-address</name>
<value>slave:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>When set to false, files can be written to DFS without permission checks; be careful to prevent accidental deletions.</description>
</property>
</configuration>
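The dfs.namenode.name.dir and dfs.datanode.data.dir paths above can be created up front on both machines; the daemons can usually create them on startup, but doing it explicitly avoids permission surprises:
#shell-master: mkdir -p /opt/hadoop/dfs/name /opt/hadoop/dfs/data
#shell-slave: mkdir -p /opt/hadoop/dfs/name /opt/hadoop/dfs/data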
(3) mapred-site.xml
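Hadoop 2.7.7 ships this file only as a template, so create it from the template first:
#shell-master: cp mapred-site.xml.template mapred-site.xml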
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
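The two jobhistory addresses above only matter if the history server is actually running; it is not started by start-dfs.sh or start-yarn.sh, so start it separately once the cluster is up (the script is on PATH because $HADOOP_HOME/sbin was exported earlier):
#shell-master: mr-jobhistory-daemon.sh start historyserver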
(4) yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<!-- NodeManager setting; too small a value may keep the NodeManager from starting.
It should be larger than Spark's executor-memory plus driver memory -->
<value>6144</value>
</property>
<property>
<!-- ResourceManager setting;
should be larger than Spark's executor-memory plus driver memory -->
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>61440</value>
</property>
<property>
<!-- Number of CPU cores to use -->
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Skip the virtual-memory check. Useful when running on virtual machines; with this set, later operations are less likely to fail.</description>
</property>
<property>
<!-- Scheduling policy: use the Fair Scheduler -->
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
</configuration>
(5) slaves file
# Add the worker node addresses (hostnames work if hosts is configured; IP addresses also work)
#shell-master: vim slaves
slave
- Copy the configured directory to the other worker nodes
#shell-master: scp -r /usr/local/hadoop-2.7.7 root@slave:/usr/local
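Note that the Hadoop variables were appended to /etc/profile after it was first copied to slave, so copy it again and reload it there:
#shell-master: scp /etc/profile slave:/etc/profile
#shell-slave: source /etc/profile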
- Initialize & start
# Format the NameNode (run from the Hadoop installation directory)
#shell-master: cd /usr/local/hadoop-2.7.7
#shell-master: bin/hdfs namenode -format
# Start HDFS and YARN
#shell-master: sbin/start-dfs.sh
#shell-master: sbin/start-yarn.sh
- Verify Hadoop started successfully
# On the master node
#shell-master: jps
5895 Jps
5624 ResourceManager
5356 NameNode
# On the slave node
#shell-slave: jps
5152 SecondaryNameNode
5085 DataNode
5245 NodeManager
5357 Jps
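Beyond jps, HDFS itself can confirm that the DataNode registered with the NameNode; the report should show one live datanode:
#shell-master: hdfs dfsadmin -report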
- Web UI access
Note: open the required ports first, or simply disable the firewall.
Check firewall status
firewall-cmd --state
Stop the firewall for the current session
systemctl stop firewalld
Disable it at boot
systemctl disable firewalld
Open http://master:8088 in a browser for the YARN ResourceManager web UI.
Open http://master:50070 in a browser for the HDFS NameNode web UI.
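As a final end-to-end smoke test, the example job bundled with Hadoop exercises HDFS and YARN together (the jar path is relative to the Hadoop installation directory):
#shell-master: yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 10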
If anything is unclear (or there is an infringement concern), feel free to leave a comment or contact me via QQ: 2695683956.