Apache Hadoop 3.2.2 Installation

I. Environment Preparation

1. Disable the firewall

systemctl stop firewalld

systemctl disable firewalld.service

2. Set the hostname and edit the hosts file

hostnamectl set-hostname master

vim /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.159.132 master

192.168.159.133 slave1

192.168.159.134 slave2
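The three cluster entries can be appended idempotently with a short loop; a sketch, using a temporary file in place of /etc/hosts so it can be tried safely:

```shell
# Append cluster entries only if the hostname is not already present.
# HOSTS_FILE is a temporary stand-in for /etc/hosts.
HOSTS_FILE=$(mktemp)
for entry in "192.168.159.132 master" "192.168.159.133 slave1" "192.168.159.134 slave2"; do
  host=${entry##* }                              # last field = hostname
  grep -qw "$host" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
done
cat "$HOSTS_FILE"
```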

3. Set SELinux to disabled

vi /etc/selinux/config

SELINUX=disabled

Reboot for the change to take effect. (To apply it temporarily, until the next reboot: setenforce 0)
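The edit can also be done non-interactively with sed; a sketch that works on a temporary copy (point CFG at /etc/selinux/config on a real host):

```shell
# Flip SELINUX to disabled non-interactively.
# CFG is a temporary copy; use /etc/selinux/config on a real host.
CFG=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$CFG"
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$CFG"
grep '^SELINUX=' "$CFG"
```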

4. Install and enable the HTTP service

yum install -y httpd

systemctl restart httpd.service

systemctl enable httpd.service

5. Install epel-release

yum install -y epel-release

6. Create the required directories and upload the packages

mkdir -p /opt/module /opt/software

chown -R hadoop:hadoop /opt/module /opt/software

(This assumes the hadoop user and group already exist.)

7. Remove the bundled OpenJDK and install the Oracle JDK 1.8

rpm -qa | grep -i java | xargs -n1 rpm -e --nodeps
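The pipe above removes every package whose name matches "java", so a dry run first is worthwhile; this sketch filters a sample rpm -qa listing (the package names are illustrative):

```shell
# Dry run of the removal pipe: show what `grep -i java` would match
# before handing the list to `xargs rpm -e --nodeps`. Sample package list.
rpm_qa_sample='java-1.8.0-openjdk-1.8.0.262.b10-1.el7.x86_64
java-1.8.0-openjdk-headless-1.8.0.262.b10-1.el7.x86_64
tzdata-java-2020a-1.el7.noarch
bash-4.2.46-34.el7.x86_64'
to_remove=$(printf '%s\n' "$rpm_qa_sample" | grep -i java)
printf 'would remove:\n%s\n' "$to_remove"
```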

mkdir -p /usr/java

tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/java/

II. Install Hadoop

1. Extract the archive

tar -zxvf hadoop-3.2.2.tar.gz -C /opt/module/

2. Configure environment variables

vim /etc/profile

#JAVA_HOME

export JAVA_HOME=/usr/java/jdk1.8.0_181

export PATH=$PATH:$JAVA_HOME/bin

#Hadoop

export HADOOP_HOME=/opt/module/hadoop-3.2.2

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin

source /etc/profile
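The exports can be sanity-checked by sourcing them and inspecting the result; a sketch using a temporary file in place of /etc/profile:

```shell
# Sanity-check the exports by sourcing them from a temp file.
# PROFILE stands in for /etc/profile; paths match the ones above.
PROFILE=$(mktemp)
cat > "$PROFILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/module/hadoop-3.2.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
. "$PROFILE"
echo "JAVA_HOME=$JAVA_HOME"
echo "$PATH" | tr ':' '\n' | grep hadoop
```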

3. Verify the Java and Hadoop installations (java -version and hadoop version should both print version information)

4. Clone the server

5. Configure the hosts file and hostname on the cloned VMs

6. Configure passwordless SSH between hosts for the hadoop user

[hadoop@master ~]$ ssh-keygen

[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@slave1

[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@slave2

[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@master
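The three ssh-copy-id calls can be collapsed into a loop; shown as a dry run (remove the echo to actually copy keys):

```shell
# Distribute the public key to every node in one loop (dry run:
# the commands are echoed; drop `echo` to run them for real).
out=$(for host in master slave1 slave2; do
  echo ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$host"
done)
printf '%s\n' "$out"
```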

7. Cluster layout

 

        master        slave1              slave2

HDFS    NameNode      SecondaryNameNode   -
        DataNode      DataNode            DataNode

YARN    -             JobHistory Server   ResourceManager
        NodeManager   NodeManager         NodeManager

8. Edit the configuration files (edit on one node, then distribute to the others)

Directory: /opt/module/hadoop-3.2.2/etc/hadoop

vim hadoop-env.sh

Add: export JAVA_HOME=/usr/java/jdk1.8.0_181

vim core-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <!--Configurations for NameNode (SecondaryNameNode), DataNode, NodeManager:-->

  <!-- NameNode address -->

  <property>

    <name>fs.defaultFS</name>

    <value>hdfs://master:9000</value>

    <description>NameNode URI</description>

  </property>

  <!-- Hadoop data storage directory -->

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.2.2/data</value>
  </property>

  <!-- Static user for HDFS web UI logins -->

  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>

  <property>

    <name>io.file.buffer.size</name>

    <value>131072</value>

    <description>Size of read/write buffer used in SequenceFiles. The default value is 131072.</description>

  </property>

</configuration>
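A quick way to sanity-check the file after editing is to pull fs.defaultFS back out with sed; a here-doc stands in for the real core-site.xml here:

```shell
# Extract fs.defaultFS from core-site.xml as a sanity check.
# The here-doc stands in for etc/hadoop/core-site.xml.
core_site=$(cat <<'EOF'
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
EOF
)
fs_default=$(printf '%s\n' "$core_site" | sed -n 's|.*<value>\(hdfs:[^<]*\)</value>.*|\1|p')
echo "fs.defaultFS = $fs_default"
```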

vim hdfs-site.xml

<configuration>

  <!--Configurations for NameNode:-->

  <!-- NameNode directory -->

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/module/hadoop-3.2.2/namenode</value>
  </property>

  <!-- NameNode web UI address -->

  <property>
    <name>dfs.namenode.http-address</name>
    <value>master:9870</value>
  </property>

  <!-- SecondaryNameNode web UI address -->

  <property>

    <name>dfs.namenode.secondary.http-address</name>

    <value>slave1:50090</value>

  </property>

  <!-- Replication factor -->

  <property>

    <name>dfs.replication</name>

    <value>3</value>

  </property>

  <!-- Block size -->

  <property>

    <name>dfs.blocksize</name>

    <value>268435456</value>

  </property>

  <property>

    <name>dfs.namenode.handler.count</name>

    <value>100</value>

  </property>

</configuration>

vim yarn-site.xml

<configuration>

  <!--Configurations for ResourceManager and NodeManager:-->

  <property>

    <name>yarn.acl.enable</name>

    <value>false</value>

    <description>Enable ACLs? Defaults to false; valid values are "true" and "false".</description>

  </property>

  <property>

    <name>yarn.admin.acl</name>

    <value>*</value>

    <description>ACL to set admins on the cluster. ACLs are of the form comma-separated-users, a space, then comma-separated-groups. Defaults to the special value of * which means anyone. The special value of just a space means no one has access.</description>

  </property>

  <property>

    <name>yarn.log-aggregation-enable</name>

    <value>true</value>

    <description>Configuration to enable or disable log aggregation</description>

  </property>

  <!--Configurations for ResourceManager:-->

  <property>

    <name>yarn.resourcemanager.address</name>

    <value>slave2:8032</value>

    <description>ResourceManager host:port for clients to submit jobs.NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>

  </property>

  <property>

    <name>yarn.resourcemanager.scheduler.address</name>

    <value>slave2:8030</value>

    <description>ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources.NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname</description>

  </property>

  <property>

    <name>yarn.resourcemanager.resource-tracker.address</name>

    <value>slave2:8031</value>

    <description>ResourceManager host:port for NodeManagers.NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname</description>

  </property>

  <property>

        <name>yarn.resourcemanager.admin.address</name>

        <value>slave2:8033</value>

        <description>ResourceManager host:port for administrative commands.NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>

  </property>

  <property>

    <name>yarn.resourcemanager.webapp.address</name>

    <value>slave2:8088</value>

    <description>ResourceManager web-ui host:port. NOTES:host:port If set, overrides the hostname set in yarn.resourcemanager.hostname</description>

  </property>

  <property>

    <name>yarn.resourcemanager.hostname</name>

    <value>slave2</value>

    <description>ResourceManager host</description>

  </property>

  <property>

    <name>yarn.resourcemanager.scheduler.class</name>

 <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>

    <description>ResourceManager Scheduler class: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. The default value is "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler".

    </description>

  </property>

  <property>

    <name>yarn.scheduler.minimum-allocation-mb</name>

    <value>512</value>

    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.NOTES:In MBs</description>

  </property>

  <property>

    <name>yarn.scheduler.maximum-allocation-mb</name>

    <value>1024</value>

    <description>Maximum  limit of memory to allocate to each container request at the Resource Manager.NOTES:In MBs</description>

  </property>

  <!--Configurations for History Server:-->

  <property>

    <name>yarn.log-aggregation.retain-seconds</name>

    <value>-1</value>

    <description>How long to keep aggregation logs before deleting them. -1 disables. Be careful, set this too small and you will spam the name node.</description>

  </property>

  <property>

    <name>yarn.log-aggregation.retain-check-interval-seconds</name>

    <value>-1</value>

    <description>Time between checks for aggregated log retention. If set to 0 or a negative value then the value is computed as one-tenth of the aggregated log retention time. Be careful, set this too small and you will spam the name node.</description>

  </property>

  <!--Configurations for NodeManager:-->

  <property>

    <name>yarn.nodemanager.resource.memory-mb</name>

    <value>1024</value>

    <description>Resource i.e. available physical memory, in MB, for given NodeManager.

        The default value is 8192.

        NOTES:Defines total available resources on the NodeManager to be made available to running containers

    </description>

  </property>

  <property>

    <name>yarn.nodemanager.vmem-pmem-ratio</name>

    <value>2.1</value>

    <description>Maximum ratio by which virtual memory usage of tasks may exceed physical memory.

        The default value is 2.1

        NOTES:The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.

    </description>

  </property>

  <property>

    <name>yarn.nodemanager.local-dirs</name>

    <value>${hadoop.tmp.dir}/nm-local-dir</value>

    <description>Comma-separated list of paths on the local filesystem where intermediate data is written.

        The default value is "${hadoop.tmp.dir}/nm-local-dir"

        NOTES:Multiple paths help spread disk i/o.

    </description>

  </property>

  <property>

    <name>yarn.nodemanager.log-dirs</name>

    <value>${yarn.log.dir}/userlogs</value>

    <description>Comma-separated list of paths on the local filesystem where logs are written

        The default value is "${yarn.log.dir}/userlogs"

        NOTES:Multiple paths help spread disk i/o.

        </description>

  </property>

   <property>

    <name>yarn.nodemanager.log.retain-seconds</name>

    <value>10800</value>

    <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.

        The default value is "10800"

    </description>

  </property>

  <property>

    <name>yarn.application.classpath</name>

    <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>

  </property>

   <property>

    <name>yarn.nodemanager.remote-app-log-dir</name>

    <value>/logs</value>

    <description>HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled.

    The default value is "/logs" or "/tmp/logs"

    </description>

  </property>

   <property>

    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>

    <value>logs</value>

    <description>Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled.</description>

  </property>

   <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

    <description>Shuffle service that needs to be set for Map Reduce applications.</description>

  </property>

</configuration>
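The memory values above should satisfy minimum-allocation &lt;= maximum-allocation &lt;= NodeManager memory; a quick arithmetic check (values copied from the config above):

```shell
# Consistency check for the YARN memory settings above.
min_alloc=512   # yarn.scheduler.minimum-allocation-mb
max_alloc=1024  # yarn.scheduler.maximum-allocation-mb
nm_mem=1024     # yarn.nodemanager.resource.memory-mb
[ "$min_alloc" -le "$max_alloc" ] && echo "min <= max: ok"
[ "$max_alloc" -le "$nm_mem" ] && echo "max <= NodeManager memory: ok"
echo "containers per NodeManager at minimum size: $(( nm_mem / min_alloc ))"
```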

vim mapred-site.xml

<configuration>

  <!--Configurations for MapReduce Applications:-->

  <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

    <description>Execution framework set to Hadoop YARN.</description>

  </property>

  <property>

    <name>mapreduce.map.memory.mb</name>

    <value>512</value>

    <description>Larger resource limit for maps.</description>

  </property>

  <property>

    <name>mapreduce.map.java.opts</name>

    <value>-Xmx1024M</value>

    <description>Larger heap-size for child jvms of maps.</description>

  </property>

  <property>

    <name>mapreduce.reduce.memory.mb</name>

    <value>512</value>

    <description>Larger resource limit for reduces.</description>

  </property>

  <property>

    <name>mapreduce.reduce.java.opts</name>

    <value>-Xmx1024M</value>

    <description></description>

  </property>

  <property>

    <name>mapreduce.task.io.sort.mb</name>

    <value>512</value>

    <description></description>

  </property>

  <property>

    <name>mapreduce.task.io.sort.factor</name>

    <value>10</value>

    <description>More streams merged at once while sorting files.</description>

  </property>

  <property>

    <name>mapreduce.reduce.shuffle.parallelcopies</name>

    <value>5</value>

    <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>

  </property>

  <!--Configurations for MapReduce JobHistory Server:-->

  <property>

    <name>mapreduce.jobhistory.address</name>

    <value>slave1:10020</value>

    <description>MapReduce JobHistory Server host:port Default port is 10020</description>

  </property>

  <property>

    <name>mapreduce.jobhistory.webapp.address</name>

    <value>slave1:19888</value>

    <description>MapReduce JobHistory Server Web UI host:port Default port is 19888</description>

  </property>

  <property>

    <name>mapreduce.jobhistory.intermediate-done-dir</name>

    <value>/opt/module/hadoop-3.2.2/mr-history/tmp</value>

    <description>Directory where history files are written by MapReduce jobs. Default is "/mr-history/tmp".</description>

  </property>

  <property>

    <name>mapreduce.jobhistory.done-dir</name>

    <value>/opt/module/hadoop-3.2.2/mr-history/done</value>

    <description>Directory where history files are managed by the MR JobHistory Server. Default value is "/mr-history/done".</description>

  </property>

</configuration>
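Note that a -Xmx1024M heap inside a 512 MB container (mapreduce.map.memory.mb above) will get tasks killed by YARN's memory check; a common rule of thumb sizes the heap at about 80% of the container:

```shell
# Heap should fit inside the container: ~80% of mapreduce.map.memory.mb.
container_mb=512
heap_mb=$(( container_mb * 80 / 100 ))
echo "container ${container_mb} MB -> suggested -Xmx${heap_mb}m"
```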

vim workers

master

slave1

slave2
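The edited configuration directory then needs to be copied to the other nodes; a dry-run sketch (remove the echo to actually sync; rsync being available on all hosts is an assumption):

```shell
# Push the edited config dir to the other nodes (dry run; remove `echo`
# to actually copy).
conf_dir=/opt/module/hadoop-3.2.2/etc/hadoop
out=$(for host in slave1 slave2; do
  echo rsync -av "$conf_dir/" "hadoop@$host:$conf_dir/"
done)
printf '%s\n' "$out"
```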

9. Format and start the cluster

cd /opt/module/hadoop-3.2.2

hdfs namenode -format
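Formatting a NameNode directory that already holds data destroys the cluster's metadata, so a guard is worth having; a sketch using a temporary directory in place of dfs.namenode.name.dir:

```shell
# Refuse to format when the NameNode dir already holds data.
# NN_DIR is a temp stand-in for /opt/module/hadoop-3.2.2/namenode.
NN_DIR=$(mktemp -d)
if [ -z "$(ls -A "$NN_DIR")" ]; then
  echo "name dir empty: safe to run 'hdfs namenode -format'"
else
  echo "name dir not empty: refusing to format"
fi
```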

[hadoop@master hadoop-3.2.2]$ sbin/start-dfs.sh

Starting namenodes on [master]

Starting datanodes

slave2: WARNING: /opt/module/hadoop-3.2.2/logs does not exist. Creating.

slave1: WARNING: /opt/module/hadoop-3.2.2/logs does not exist. Creating.

Starting secondary namenodes [slave1]

[hadoop@slave2 hadoop-3.2.2]$ sbin/start-yarn.sh

Starting resourcemanager

Starting nodemanagers

[hadoop@slave1 sbin]$ mr-jobhistory-daemon.sh start historyserver

WARNING: Use of this script to start the MR JobHistory daemon is deprecated.

WARNING: Attempting to execute replacement "mapred --daemon start" instead.

Note: start each service on the node that hosts its master daemon (start-dfs.sh on master, start-yarn.sh on slave2, the JobHistory server on slave1).

To check what is running: jps on each server.
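Expected jps output can be checked mechanically; this sketch greps a sample listing (PIDs are made up) for the daemons that should run on master:

```shell
# Grep a sample `jps` listing for the daemons expected on master.
jps_sample='12001 NameNode
12150 DataNode
12420 NodeManager
12777 Jps'
for daemon in NameNode DataNode NodeManager; do
  printf '%s\n' "$jps_sample" | grep -qw "$daemon" && echo "$daemon: running"
done
```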

Web UIs:

NameNode: http://master:9870

ResourceManager: http://slave2:8088/cluster

JobHistory: http://slave1:19888

NodeManager: port 8042 on each node
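The UI endpoints can be printed as full URLs from the layout above:

```shell
# Print the web UIs as full URLs (hosts/ports from the cluster layout).
out=$(for ui in "NameNode master 9870" "ResourceManager slave2 8088" "JobHistory slave1 19888"; do
  set -- $ui            # split "role host port" into $1 $2 $3
  echo "$1: http://$2:$3/"
done)
printf '%s\n' "$out"
```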
