I. Environment Preparation
1. Disable the firewall
systemctl stop firewalld
systemctl disable firewalld.service
2. Set the hostname and edit the hosts file
hostnamectl set-hostname master
vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.159.132 master
192.168.159.133 slave1
192.168.159.134 slave2
3. Set SELinux to disabled
vi /etc/selinux/config
SELINUX=disabled
Takes effect after a reboot. (To apply immediately for the current session: setenforce 0)
4. Install and enable the httpd service
yum install -y httpd
systemctl restart httpd.service
systemctl enable httpd.service
5. Install epel-release
yum install -y epel-release
6. Create the required directories and upload the packages
mkdir /opt/module /opt/software
chown hadoop:hadoop -R /opt/*
7. Remove the bundled OpenJDK and install oracle-j2sdk1.8-1.8.0
rpm -qa | grep -i java | xargs -n1 rpm -e --nodeps
tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/java/
II. Install Hadoop
1. Extract
tar -zxvf hadoop-3.2.2.tar.gz -C /opt/module/
2. Configure environment variables
vim /etc/profile
#JAVA_HOME
export JAVA_HOME=/usr/java/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
#Hadoop
export HADOOP_HOME=/opt/module/hadoop-3.2.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
source /etc/profile
3. Verify the Hadoop and Java installations
java -version
hadoop version
4. Clone the server
5. Configure the hosts file and hostname on the cloned VMs
6. Set up passwordless SSH between hosts for the hadoop user
[hadoop@master ~]$ ssh-keygen
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@slave1
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@slave2
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@master
7. Cluster plan
     | master      | slave1            | slave2          |
HDFS | NameNode    | SecondaryNameNode |                 |
     | DataNode    | DataNode          | DataNode        |
YARN |             | JobHistory Server | ResourceManager |
     | NodeManager | NodeManager       | NodeManager     |
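The plan above can be encoded as a small sanity check before editing the config files (a sketch; node and daemon names are the ones used in this guide):

```python
# Sketch: encode the cluster plan and sanity-check the role layout.
roles = {
    "master": {"NameNode", "DataNode", "NodeManager"},
    "slave1": {"SecondaryNameNode", "JobHistoryServer", "DataNode", "NodeManager"},
    "slave2": {"ResourceManager", "DataNode", "NodeManager"},
}

# Every node both stores data and runs containers.
assert all({"DataNode", "NodeManager"} <= r for r in roles.values())

# Exactly one of each master daemon across the cluster.
for daemon in ("NameNode", "SecondaryNameNode", "ResourceManager", "JobHistoryServer"):
    assert sum(daemon in r for r in roles.values()) == 1
```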
8. Edit the configuration files (edit on one server, then distribute to the others)
Directory: /opt/module/hadoop-3.2.2/etc/hadoop
vim hadoop-env.sh
Add: export JAVA_HOME=/usr/java/jdk1.8.0_181
vim core-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Configurations for NameNode (SecondaryNameNode), DataNode, NodeManager -->
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
<description>NameNode URI</description>
</property>
<!-- Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.2.2/data</value>
</property>
<!-- Static user for the HDFS web UI: hadoop -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>hadoop</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Size of the read/write buffer used in SequenceFiles. The default value is 131072.</description>
</property>
</configuration>
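After editing a *-site.xml it is easy to sanity-check by parsing it and reading the properties back (a sketch using Python's standard library; the snippet embeds a fragment of the core-site.xml above):

```python
import xml.etree.ElementTree as ET

# Fragment of the core-site.xml configured above.
CORE_SITE = """
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://master:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/opt/module/hadoop-3.2.2/data</value></property>
  <property><name>io.file.buffer.size</name><value>131072</value></property>
</configuration>
"""

def load_props(xml_text):
    # Map each <property>'s <name> to its <value>.
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}

props = load_props(CORE_SITE)
assert props["fs.defaultFS"] == "hdfs://master:9000"
assert int(props["io.file.buffer.size"]) == 128 * 1024  # 131072 bytes = 128 KiB
```

The same function works on any of the four site files, since they all share the `<configuration>/<property>` layout.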
vim hdfs-site.xml
<configuration>
<!--Configurations for NameNode:-->
<!-- NameNode data directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/module/hadoop-3.2.2/namenode</value>
</property>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>master:9870</value>
</property>
<!-- SecondaryNameNode web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:50090</value>
</property>
<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Block size -->
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
</configuration>
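To see what dfs.blocksize = 268435456 and dfs.replication = 3 mean in practice, a quick back-of-the-envelope check (a sketch):

```python
import math

BLOCK_SIZE = 268435456   # dfs.blocksize: 256 MiB
REPLICATION = 3          # dfs.replication

assert BLOCK_SIZE == 256 * 1024 * 1024

def hdfs_footprint(file_bytes):
    # Blocks a file occupies, and total bytes stored across all replicas.
    blocks = math.ceil(file_bytes / BLOCK_SIZE)
    return blocks, file_bytes * REPLICATION

blocks, stored = hdfs_footprint(1 * 1024**3)   # a 1 GiB file
assert blocks == 4                             # 1 GiB / 256 MiB = 4 blocks
assert stored == 3 * 1024**3                   # one full copy per DataNode here
```

With replication 3 and exactly three DataNodes (per the cluster plan), every block lives on every node; losing one node loses no data.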
vim yarn-site.xml
<configuration>
<!--Configurations for ResourceManager and NodeManager:-->
<property>
<name>yarn.acl.enable</name>
<value>false</value>
<description>Enable ACLs? Defaults to false. Valid values are "true" or "false".</description>
</property>
<property>
<name>yarn.admin.acl</name>
<value>*</value>
<description>ACL to set admins on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to the special value of *, which means anyone. The special value of just a space means no one has access.</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Configuration to enable or disable log aggregation</description>
</property>
<!--Configurations for ResourceManager:-->
<property>
<name>yarn.resourcemanager.address</name>
<value>slave2:8032</value>
<description>ResourceManager host:port for clients to submit jobs.NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>slave2:8030</value>
<description>ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources. NOTES: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>slave2:8031</value>
<description>ResourceManager host:port for NodeManagers.NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname</description>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>slave2:8033</value>
<description>ResourceManager host:port for administrative commands.NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>slave2:8088</value>
<description>ResourceManager web-ui host:port. NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname</description>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>slave2</value>
<description>ResourceManager host</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>ResourceManager Scheduler class: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. The default value is "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler".</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.NOTES:In MBs</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1024</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.NOTES:In MBs</description>
</property>
<!--Configurations for History Server:-->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
<description>How long to keep aggregation logs before deleting them. -1 disables. Be careful, set this too small and you will spam the name node.</description>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
<description>Time between checks for aggregated log retention. If set to 0 or a negative value then the value is computed as one-tenth of the aggregated log retention time. Be careful, set this too small and you will spam the name node.</description>
</property>
<!--Configurations for NodeManager:-->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
<description>Resource i.e. available physical memory, in MB, for given NodeManager.
The default value is 8192.
NOTES:Defines total available resources on the NodeManager to be made available to running containers
</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
<description>Maximum ratio by which virtual memory usage of tasks may exceed physical memory.
The default value is 2.1
NOTES: The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
</description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nm-local-dir</value>
<description>Comma-separated list of paths on the local filesystem where intermediate data is written.
The default value is "${hadoop.tmp.dir}/nm-local-dir"
NOTES:Multiple paths help spread disk i/o.
</description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>${yarn.log.dir}/userlogs</value>
<description>Comma-separated list of paths on the local filesystem where logs are written
The default value is "${yarn.log.dir}/userlogs"
NOTES:Multiple paths help spread disk i/o.
</description>
</property>
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
<description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.
The default value is "10800"
</description>
</property>
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/logs</value>
<description>HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled.
The default value is "/logs" or "/tmp/logs"
</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
<description>Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Shuffle service that needs to be set for Map Reduce applications.</description>
</property>
</configuration>
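With the memory values above, each NodeManager offers 1024 MB, every allocation is rounded up to a multiple of the 512 MB minimum and capped at the 1024 MB maximum, so the scheduler can place at most two minimum-size containers per node (a sketch of YARN's rounding rule):

```python
import math

MIN_ALLOC_MB = 512    # yarn.scheduler.minimum-allocation-mb
MAX_ALLOC_MB = 1024   # yarn.scheduler.maximum-allocation-mb
NM_MEMORY_MB = 1024   # yarn.nodemanager.resource.memory-mb
VMEM_RATIO = 2.1      # yarn.nodemanager.vmem-pmem-ratio

def container_size(request_mb):
    # YARN rounds each request up to a multiple of the minimum, capped at the maximum.
    rounded = math.ceil(request_mb / MIN_ALLOC_MB) * MIN_ALLOC_MB
    return min(rounded, MAX_ALLOC_MB)

assert container_size(100) == 512   # small requests still cost one minimum allocation
assert container_size(513) == 1024  # just over 512 rounds up to the next multiple
assert NM_MEMORY_MB // container_size(100) == 2   # at most 2 minimum containers per node

# Virtual-memory ceiling for a 512 MB container under the 2.1 ratio:
assert abs(container_size(512) * VMEM_RATIO - 1075.2) < 1e-6
```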
vim mapred-site.xml
<configuration>
<!--Configurations for MapReduce Applications:-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework set to Hadoop YARN.</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
<description>Larger heap-size for child jvms of maps.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>512</value>
<description>Larger resource limit for reduces.</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1024M</value>
<description>Larger heap-size for child jvms of reduces.</description>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
<description>Higher memory-limit while sorting data for efficiency.</description>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>10</value>
<description>More streams merged at once while sorting files.</description>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>5</value>
<description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
</property>
<!--Configurations for MapReduce JobHistory Server:-->
<property>
<name>mapreduce.jobhistory.address</name>
<value>slave1:10020</value>
<description>MapReduce JobHistory Server host:port Default port is 10020</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>slave1:19888</value>
<description>MapReduce JobHistory Server Web UI host:port Default port is 19888</description>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/opt/module/hadoop-3.2.2/mr-history/tmp</value>
<description>Directory where history files are written by MapReduce jobs. Default is "/mr-history/tmp"</description>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/opt/module/hadoop-3.2.2/mr-history/done</value>
<description>Directory where history files are managed by the MR JobHistory Server. Default value is "/mr-history/done"</description>
</property>
</configuration>
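A common rule of thumb (an assumption, not taken verbatim from the Hadoop docs) is to keep the JVM heap (-Xmx) at roughly 80% of the container size, leaving headroom for non-heap JVM overhead so the process stays under YARN's physical-memory limit. Note that the values above set -Xmx1024M inside 512 MB containers, which breaks this rule and risks containers being killed for exceeding their memory limit; a quick check (a sketch):

```python
def max_heap_mb(container_mb, headroom=0.8):
    # Rule of thumb (assumption): reserve ~20% of the container for non-heap overhead.
    return int(container_mb * headroom)

MAP_CONTAINER_MB = 512    # mapreduce.map.memory.mb
MAP_XMX_MB = 1024         # heap from -Xmx1024M above

assert max_heap_mb(MAP_CONTAINER_MB) == 409
# The configured heap exceeds what the container can safely hold:
assert MAP_XMX_MB > max_heap_mb(MAP_CONTAINER_MB)
```

On this layout, either raising mapreduce.map.memory.mb (and reduce) to 1280 MB or lowering -Xmx to about 409M would make the pair consistent.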
vim workers
master
slave1
slave2
9. Initialize the cluster and start the services
cd /opt/module/hadoop-3.2.2
hdfs namenode -format
[hadoop@master hadoop-3.2.2]$ sbin/start-dfs.sh
Starting namenodes on [master]
Starting datanodes
slave2: WARNING: /opt/module/hadoop-3.2.2/logs does not exist. Creating.
slave1: WARNING: /opt/module/hadoop-3.2.2/logs does not exist. Creating.
Starting secondary namenodes [slave1]
[hadoop@slave2 hadoop-3.2.2]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hadoop@slave1 sbin]$ mr-jobhistory-daemon.sh start historyserver
WARNING: Use of this script to start the MR JobHistory daemon is deprecated.
WARNING: Attempting to execute replacement "mapred --daemon start" instead.
Note: each service must be started on the node that the cluster plan designates as its master.
Check the daemons: run jps on each server.
Web UIs: NameNode: 9870
ResourceManager: 8088/cluster
JobHistory: 19888
NodeManager: 8042