Big Data Learning [02]: Hadoop Installation and Configuration


Abstract:
Downloading, installing, and configuring Hadoop 2.7.3 on a three-machine cluster, including the parameters involved, the problems encountered, and a small demo. The configuration covers both the Hadoop runtime and the YARN runtime, with the goal of running MapReduce on YARN (ResourceManager); it also shows an example setup for machines with limited resources.

Prerequisites

  1. A LAN cluster, for example one built on virtual machines as in [1] 大数据学习前夕[01]:系统-网络-SSH
  2. JDK installed with its environment variables configured, for example as in [2] 大数据学习前夕[02]:JDK安装升级 (a quick check of both follows below)
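A quick way to confirm both prerequisites before continuing (a minimal sketch; the hostnames and JDK path follow [1] and [2]):

[hadoop@hadoop01 ~]$ java -version          # the JDK from [2] should answer
[hadoop@hadoop01 ~]$ ssh hadoop02 hostname  # should log in without a password
[hadoop@hadoop01 ~]$ ssh hadoop03 hostname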

Download

wget  http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Extract

[hadoop@hadoop01 ~]$ tar -zxvf hadoop-2.7.3.tar.gz
[hadoop@hadoop01 ~]$ mv hadoop-2.7.3 hadoop

Create the data directories (run inside ~/hadoop; they match the paths used in hdfs-site.xml below)

mkdir dfs  
mkdir dfs/name  
mkdir dfs/data  

Configuration

1. Edit hadoop-env.sh
[hadoop@hadoop01 ~]$ vim hadoop/etc/hadoop/hadoop-env.sh

Set JAVA_HOME explicitly:
export JAVA_HOME=/home/hadoop/jdk1.8.0_144

2. yarn-env.sh
[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/yarn-env.sh

Set JAVA_HOME by uncommenting that line:
export JAVA_HOME=/home/hadoop/jdk1.8.0_144

3. slaves

Here hadoop02 and hadoop03 are used as slaves; hadoop01 could also be added.

[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/slaves

hadoop02
hadoop03

4. core-site.xml
[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/core-site.xml
<configuration>
     <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>
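Note that hadoop.tmp.dir above points at /home/hadoop/hadoop/tmp, which is not among the directories created earlier; creating it explicitly does no harm (a minimal sketch):

[hadoop@hadoop01 ~]$ mkdir -p /home/hadoop/hadoop/tmp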
5. hdfs-site.xml
[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml
<configuration>  
       <property>  
               <name>dfs.namenode.secondary.http-address</name>  
               <value>hadoop01:9001</value>  
       </property>  
     <property>  
             <name>dfs.namenode.name.dir</name>  
             <value>file:/home/hadoop/hadoop/dfs/name</value>  
       </property>  
      <property>  
              <name>dfs.datanode.data.dir</name>  
              <value>file:/home/hadoop/hadoop/dfs/data</value>  
       </property>  
       <property>  
               <name>dfs.replication</name>  
               <value>2</value>  
        </property>  
        <property>  
                  <name>dfs.webhdfs.enabled</name>  
                  <value>true</value>  
         </property>  
</configuration>  
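The values that actually take effect can be spot-checked from the configuration files with hdfs getconf, for example the replication factor of 2 set above:

[hadoop@hadoop01 hadoop]$ bin/hdfs getconf -confKey dfs.replication   # should print 2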
6. mapred-site.xml


MR needs its memory limits set, otherwise MR jobs will get stuck when run; the yarn-site.xml below also configures the corresponding memory and CPU resources.

[hadoop@hadoop01 hadoop]$  vim /home/hadoop/hadoop/etc/hadoop/mapred-site.xml
<configuration>  
          <property>                                                                    
        <name>mapreduce.framework.name</name>  
                <value>yarn</value>  
           </property>  
          <property>  
                  <name>mapreduce.jobhistory.address</name>  
                  <value>hadoop01:10020</value>  
          </property>  
          <property>  
                <name>mapreduce.jobhistory.webapp.address</name>  
                <value>hadoop01:19888</value>  
       </property>  
   <!-- MR task memory settings -->
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>512</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>512</value>
    </property>
</configuration>  
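These 512 MB map/reduce sizes are cluster-wide defaults. They can also be overridden per job on the command line, since the bundled examples accept generic -D options; a hedged sketch using the wordcount demo that appears later in this post (the output path result2 is just an illustration):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount \
    -Dmapreduce.map.memory.mb=256 -Dmapreduce.reduce.memory.mb=256 \
    /data/input /data/output/result2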
7. yarn-site.xml
[hadoop@hadoop01 hadoop]$  vim /home/hadoop/hadoop/etc/hadoop/yarn-site.xml


The memory and CPU resource options must be included here, unless the machines have more resources than the defaults assume.

<configuration>  
        <property>  
               <name>yarn.nodemanager.aux-services</name>  
               <value>mapreduce_shuffle</value>  
        </property>  
        <property>                                                                  
               <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>  
               <value>org.apache.hadoop.mapred.ShuffleHandler</value>  
        </property>  
        <property>  
               <name>yarn.resourcemanager.address</name>  
               <value>hadoop01:8032</value>  
       </property>  
       <property>  
               <name>yarn.resourcemanager.scheduler.address</name>  
               <value>hadoop01:8030</value>  
       </property>  
       <property>  
            <name>yarn.resourcemanager.resource-tracker.address</name>  
             <value>hadoop01:8031</value>  
      </property>  
      <property>  
              <name>yarn.resourcemanager.admin.address</name>  
               <value>hadoop01:8033</value>  
       </property>  
       <property>  
               <name>yarn.resourcemanager.webapp.address</name>  
               <value>hadoop01:8088</value>  
       </property>  

    <property>  
        <name>yarn.nodemanager.vmem-check-enabled</name>  
        <value>false</value>  
    </property> 

    <!-- Minimum physical memory a single task (container) can request; the default is 1024 MB -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>256</value>
    </property>   
    <!-- Maximum physical memory a single task (container) can request; the default is 8192 MB -->
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>512</value>
    </property>   
    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>512</value>
        <description>The amount of memory the MR AppMaster needs.</description>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
        <value>1</value>
    </property>  
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>
</configuration>  
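With the numbers above, each NodeManager advertises only 512 MB and 1 vcore, containers are handed out between 256 MB and 512 MB, and the MR ApplicationMaster configured above asks for 512 MB by itself, so each node can hold at most one 512 MB container (or two 256 MB ones) at a time; jobs will be slow but will fit. Once the cluster is started, what each node actually reports can be checked (a minimal sketch, assuming the yarn command is on PATH):

[hadoop@hadoop01 ~]$ yarn node -list              # registered NodeManagers and running-container counts
[hadoop@hadoop01 ~]$ yarn node -status <Node-Id>  # memory/vcore details; Node-Id comes from the listing above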
8. Copy the configuration to the other nodes

After configuring, copy the hadoop directory to the other machines in the cluster:

[hadoop@hadoop01 ~]$ scp -r hadoop hadoop@hadoop02:~/
[hadoop@hadoop01 ~]$ scp -r hadoop hadoop@hadoop03:~/
9. Hadoop environment variables
[hadoop@hadoop01 ~]$ sudo vim /etc/profile
#hadoop  
export HADOOP_HOME=/home/hadoop/hadoop 
export PATH=$PATH:$HADOOP_HOME/sbin  
export PATH=$PATH:$HADOOP_HOME/bin 
[hadoop@hadoop01 ~]$ source /etc/profile
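A quick check that the new PATH entries are in effect:

[hadoop@hadoop01 ~]$ hadoop version       # should report 2.7.3
[hadoop@hadoop01 ~]$ which start-dfs.sh   # should resolve into $HADOOP_HOME/sbin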

Format the NameNode

[hadoop@hadoop01 hadoop]$ bin/hdfs namenode -format

Start

[hadoop@hadoop01 hadoop]$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-hadoop01.out
hadoop02: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-hadoop02.out
hadoop03: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-hadoop03.out
Starting secondary namenodes [hadoop01]
hadoop01: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop01.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-hadoop01.out
hadoop03: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-hadoop03.out
hadoop02: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-hadoop02.out

After startup, hadoop01 should be running the NameNode, SecondaryNameNode, and ResourceManager, while hadoop02 and hadoop03 each run a DataNode and a NodeManager.
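This can be confirmed with jps on each machine (a minimal sketch; the exact output order varies):

[hadoop@hadoop01 ~]$ jps   # expect NameNode, SecondaryNameNode, ResourceManager
[hadoop@hadoop02 ~]$ jps   # expect DataNode, NodeManager
[hadoop@hadoop03 ~]$ jps   # expect DataNode, NodeManager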

Run the Demo

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /data/input /data/output/result
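The command assumes /data/input already exists on HDFS and holds the text to count; a minimal sketch for preparing it (words.txt is a hypothetical local file):

[hadoop@hadoop01 hadoop]$ bin/hdfs dfs -mkdir -p /data/input
[hadoop@hadoop01 hadoop]$ bin/hdfs dfs -put ~/words.txt /data/input/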

View on the web: the running job shows up in the ResourceManager UI at http://hadoop01:8088.
Result: the word counts are written under /data/output/result on HDFS.
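The output can also be read back from the command line (part file names may vary):

[hadoop@hadoop01 hadoop]$ bin/hdfs dfs -ls /data/output/result
[hadoop@hadoop01 hadoop]$ bin/hdfs dfs -cat /data/output/result/part-r-00000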
Tip: with the setup above, if the resource limits are not configured the job will simply hang at this point and never run; if they are configured too low, it will fail with an error.
The cause is that the job's memory usage has exceeded the limit, and one more parameter also needs to be set.
Add this to yarn-site.xml (it was already added above):

<property>  
      <name>yarn.nodemanager.vmem-check-enabled</name>  
      <value>false</value>  
</property> 

Other possible exceptions

Problem 1: the following appears when accessing the Hadoop file system:

ls: Call From localhost/127.0.0.1 to hadoop01:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Problem 2: an error occurs when uploading files to the fs with put.
These problems can have quite a few causes. Fix 1: turn off the firewall:

[hadoop@hadoop01 sbin]$ sudo service iptables stop
[hadoop@hadoop01 sbin]$ sudo  chkconfig iptables off

Fix 2: reformat HDFS
Delete hadoop/tmp entirely and reformat with hdfs namenode -format,
then start only HDFS: start-dfs.sh
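A minimal sketch of that reformat sequence (warning: it wipes all HDFS data; if DataNodes still fail to join afterwards, the dfs/data directories on hadoop02/hadoop03 may also need clearing because of a stale clusterID):

[hadoop@hadoop01 hadoop]$ sbin/stop-dfs.sh
[hadoop@hadoop01 hadoop]$ rm -rf tmp
[hadoop@hadoop01 hadoop]$ bin/hdfs namenode -format
[hadoop@hadoop01 hadoop]$ sbin/start-dfs.sh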
Check the report

[hadoop@hadoop01 sbin]$ hadoop dfsadmin -report

Or check via the web UI:
http://192.168.137.101:50070/dfshealth.html#tab-overview

Postscript

With this configuration Hadoop is complete. It is mainly two things: a distributed file system (HDFS) and a distributed MR computation framework. In this setup YARN is used to manage MR, which resolves most of the problems classic MR had.

References

[1] 大数据学习前夕[01]:系统-网络-SSH
[2] 大数据学习前夕[02]:JDK安装升级

[Author: happyprince]
