Hadoop 3.2.1 Installation: Standalone Mode and Pseudo-Distributed Mode

Hadoop: Getting Started

Overview

Hadoop is written in Java and was created to solve the two central problems of big-data scenarios: distributed storage and distributed processing. It comprises many components and tools and needs to run on a Linux system. Its two core components are HDFS and MapReduce.

Download and Install

Download

Download address: https://archive.apache.org/dist/hadoop/common/

Pick the tar.gz release that suits you; this document uses v3.2.1.
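
For example, to fetch the 3.2.1 archive from the command line (the file name follows the archive layout above):

      wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz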

Hadoop is developed in Java, so it needs a JDK to run; install the JDK first.

The mapping between Hadoop and JDK versions is:

Hadoop version          JDK version
----------------------  ---------------------------
Hadoop 3.3 and later    Java 8 or Java 11 (runtime)
Hadoop 3.0 – 3.2        Java 8
Hadoop 2.7 – 2.10       Java 7 and Java 8
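
Before proceeding, confirm that the JDK on the machine matches the table:

      java -version    # Hadoop 3.0–3.2 expects a 1.8.x (Java 8) runtime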

Installation

Hadoop can be installed in three modes: standalone, pseudo-distributed, and fully distributed.
Standalone mode is mainly for testing and learning and runs on the local file system; pseudo-distributed and fully distributed modes run on HDFS.

  • Standalone mode installation

    Upload the tar.gz package to a directory on the Linux machine, extract it, and rename the extracted directory to hadoop, e.g. (assuming the 3.2.1 archive name):
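
      tar -zxvf hadoop-3.2.1.tar.gz   # extract the archive
      mv hadoop-3.2.1 hadoop          # rename the extracted directory

    Then edit the ./etc/hadoop/hadoop-env.sh file and point JAVA_HOME at your JDK: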

      # The java implementation to use. By default, this environment
      # variable is REQUIRED on ALL platforms except OS X!
      # export JAVA_HOME=
        export JAVA_HOME=/usr/java/jdk1.8.0_201-amd64
    

    Run the following command to verify the Hadoop installation:

      [root@k8s-node-107 hadoop]# bin/hadoop version
      Hadoop 3.2.1
      Source code repository Unknown -r 7a3bc90b05f257c8ace2f76d74264906f0f7a932
      Compiled by hexiaoqiao on 2021-01-03T09:26Z
      Compiled with protoc 2.5.0
      From source with checksum 5a8f564f46624254b27f6a33126ff4
      This command was run using /home/bigData/soft/hadoop/share/hadoop/common/hadoop-common-3.2.2.jar
    

    Run the bundled example job to check that Hadoop executes MapReduce correctly (run these commands from the directory that contains the hadoop install directory):

      mkdir input
      cp hadoop/etc/hadoop/*.xml input
      hadoop/bin/hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
      [root@localhost soft]# cat output/*
      1       dfsadmin
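
    Note: MapReduce refuses to start a job whose output directory already exists, so clear it before re-running the example:

      rm -rf output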
    

    Takeaway: standalone mode only serves to check that a developed jar runs; the example above exercises MapReduce for computation but never touches HDFS.

  • Pseudo-distributed mode

    Most of the trouble during a pseudo-distributed install comes from unfamiliarity with everyday Linux operations such as creating users, granting permissions, and configuring passwordless SSH login.

    1. Configure the Hadoop environment variables (append to /etc/profile):

      # Hadoop Environment Variables
      export HADOOP_HOME=/home/bigData/soft/hadoop
      export HADOOP_INSTALL=$HADOOP_HOME
      export HADOOP_MAPRED_HOME=$HADOOP_HOME
      export HADOOP_COMMON_HOME=$HADOOP_HOME
      export HADOOP_HDFS_HOME=$HADOOP_HOME
      export YARN_HOME=$HADOOP_HOME
      export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
      export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
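
    After saving, apply and verify the variables (a quick sanity check, assuming the install path above):

      source /etc/profile
      echo $HADOOP_HOME    # should print /home/bigData/soft/hadoop
      hadoop version       # should now resolve through the updated PATH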
    

    2. Configure the startup user

    In a pseudo-distributed/distributed deployment the master node starts the worker daemons over SSH, and SSH prompts for a password by default, so passwordless SSH login must be configured. To avoid disturbing the SSH setup of the VM's existing users, create a dedicated user, here named hadoop. The commands:

      adduser hadoop            # create the hadoop user
      passwd hadoop             # set its password
      id hadoop                 # inspect the user's groups
      usermod -g root hadoop    # make root the user's primary group
      su hadoop                 # switch to the hadoop account
      sudo chmod 777 -R hadoop  # open up permissions on the hadoop install directory
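
    A tighter alternative to chmod 777 is to give the new user ownership of the install directory (a sketch, assuming the install path used throughout this document):

      chown -R hadoop:hadoop /home/bigData/soft/hadoop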
    

    3. Edit the sudoers file

    Switch to the root account and edit the sudoers file; otherwise the new account cannot use sudo and fails with:

    hadoop is not in the sudoers file.  This incident will be reported.
    

    Make the file writable and edit it (visudo is the safer way to do this):

    [root@localhost hadoop]# chmod u+w /etc/sudoers
    [root@localhost hadoop]# vi /etc/sudoers
    

    Add the following hadoop entry to the sudoers file. When saving, use :wq! (the exclamation mark is required, otherwise vi reports the file as read-only and refuses to write):

    #Allow root to run any commands anywhere
    root    ALL=(ALL)       ALL
    hadoop   ALL=(ALL)       ALL
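
    To confirm the entry took effect, switch back to the hadoop user and run a harmless sudo command:

      su - hadoop
      sudo whoami    # should print: root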
    

    4. Configure passwordless SSH login

    After switching to the hadoop user, test whether passwordless login already works by running ssh localhost:

      [hadoop@localhost soft]$ ssh localhost
      The authenticity of host 'localhost (::1)' can't be established.
      ECDSA key fingerprint is SHA256:cbS92o4o5+EzTyMUh93la2K25R2niIP10hRIMmh/zRA.
      ECDSA key fingerprint is MD5:d6:3b:b0:e7:6d:6f:b8:57:83:6c:db:9e:88:73:a8:e4.
      Are you sure you want to continue connecting (yes/no)? yes
      Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
      hadoop@localhost's password: 
      
      
      Permission denied, please try again.
      hadoop@localhost's password: 
    

    A password prompt like the one above means passwordless login is not yet enabled. Set it up as follows:

      [hadoop@localhost soft]$ cd ~/.ssh/   # if this directory is missing, run ssh localhost once first
      [hadoop@localhost .ssh]$ ssh-keygen -t rsa   # press Enter at every prompt
      Generating public/private rsa key pair.
      Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
      Enter passphrase (empty for no passphrase): 
      Enter same passphrase again: 
      Your identification has been saved in /home/hadoop/.ssh/id_rsa.
      Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
      The key fingerprint is:
      SHA256:DptnAzC7LHbHldsrnr0aQrhIc157McjaLXvuM3D0CxQ hadoop@localhost.localdomain
      The key's randomart image is:
      +---[RSA 2048]----+
      |                 |
      |        E        |
      |    o    .       |
      |     * .o.       |
      |  o + BoS.       |
      | . * O.Xo=.      |
      |  + B XoX...     |
      | . o . B==..     |
      |      .=*=+.     |
      +----[SHA256]-----+
      [hadoop@localhost .ssh]$ ls
      id_rsa  id_rsa.pub  known_hosts
      [hadoop@localhost .ssh]$ cat id_rsa.pub >> authorized_keys   # authorize the key
      [hadoop@localhost .ssh]$ ls
      authorized_keys  id_rsa  id_rsa.pub  known_hosts
      [hadoop@localhost .ssh]$ chmod 600 ./authorized_keys    # fix the file permissions
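
    Where ssh-copy-id is available, the cat and chmod steps can be replaced by a single equivalent command once the key pair exists:

      ssh-copy-id hadoop@localhost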
    

Try ssh localhost again; output like the following means passwordless login is set up:

[hadoop@localhost .ssh]$ ssh localhost
Last failed login: Thu Jul  1 17:32:19 CST 2021 from localhost on ssh:notty
There were 2 failed login attempts since the last successful login.
Last login: Thu Jul  1 17:31:21 2021

5. Edit the four key configuration files

vi etc/hadoop/core-site.xml

      <configuration>
        <!-- Base directory for Hadoop data and temporary files -->
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/home/bigData/soft/hadoop/tmp</value>
          <description>A base for other temporary directories.</description>
        </property>
        <!-- Address of the HDFS master (NameNode) -->
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:39000</value>
        </property>
      </configuration>

vi etc/hadoop/hdfs-site.xml

      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>/home/bigData/soft/hadoop/tmp/dfs/name</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>/home/bigData/soft/hadoop/tmp/dfs/data</value>
        </property>
        <property>
          <name>dfs.namenode.http-address</name>
          <value>0.0.0.0:9870</value>
        </property>
      </configuration>

vi etc/hadoop/mapred-site.xml

      <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
        <property>
          <name>mapreduce.application.classpath</name>
          <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
        </property>
      </configuration>

vi etc/hadoop/yarn-site.xml

      <configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>localhost</value>
        </property>
        <property>
          <name>yarn.nodemanager.env-whitelist</name>
          <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
      </configuration>
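
With all four files in place, it is worth checking that the XML is well formed before moving on (a quick sketch using xmllint, if it is installed):

      xmllint --noout etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml etc/hadoop/mapred-site.xml etc/hadoop/yarn-site.xml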

6. Initialize HDFS

[hadoop@localhost hadoop]# hdfs namenode -format
WARNING: /home/bigData/soft/hadoop/logs does not exist. Creating.
2021-06-25 12:55:08,910 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.2.2
.....
2021-06-25 12:55:10,799 INFO util.GSet: 0.029999999329447746% max memory 4.3 GB = 1.3 MB
2021-06-25 12:55:10,799 INFO util.GSet: capacity      = 2^17 = 131072 entries
2021-06-25 12:55:10,886 INFO namenode.FSImage: Allocated new BlockPoolId: BP-494110815-127.0.0.1-1624596910865
2021-06-25 12:55:10,902 INFO common.Storage: Storage directory /home/bigData/soft/hadoop/datanode/dfs/name has been successfully formatted.
2021-06-25 12:55:10,959 INFO namenode.FSImageFormatProtobuf: Saving image file /home/bigData/soft/hadoop/datanode/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2021-06-25 12:55:11,100 INFO namenode.FSImageFormatProtobuf: Image file /home/bigData/soft/hadoop/datanode/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 399 bytes saved in 0 seconds .
2021-06-25 12:55:11,121 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-06-25 12:55:11,129 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2021-06-25 12:55:11,129 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

7. Edit the HDFS start/stop scripts

Edit the start-dfs.sh and stop-dfs.sh scripts and add the following lines at the top of each file:

    HDFS_DATANODE_USER=hadoop
    HADOOP_SECURE_DN_USER=hadoop
    HDFS_NAMENODE_USER=hadoop
    HDFS_SECONDARYNAMENODE_USER=hadoop
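
These *_USER variables matter chiefly when the scripts are launched as root. If you start YARN the same way, start-yarn.sh and stop-yarn.sh take the analogous Hadoop 3 settings:

    YARN_RESOURCEMANAGER_USER=hadoop
    YARN_NODEMANAGER_USER=hadoop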

8. Start DFS

Run start-dfs.sh, then check the running processes with jps; if the following four (NameNode, DataNode, SecondaryNameNode, plus jps itself) show up, startup succeeded:

[hadoop@localhost hadoop]$ jps
114497 Jps
113914 NameNode
114314 SecondaryNameNode
114044 DataNode
If the NameNode or any other process fails to start, go straight to the logs under logs/.
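
Once DFS is up, a quick smoke test run from the install root confirms HDFS itself works (the paths here are illustrative):

      hdfs dfs -mkdir -p /user/hadoop                       # create a home directory in HDFS
      hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop   # upload a file
      hdfs dfs -ls /user/hadoop                             # list it back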

Open http://ip:9870/ in a browser, where ip is the server's address, to reach the HDFS web UI.

(Screenshot: HDFS web UI)

To shut it down:

stop-dfs.sh

9. Start YARN

Run start-yarn.sh, then check with jps; if the NodeManager and ResourceManager processes show up, startup succeeded:

[hadoop@localhost hadoop]$ jps
113914 NameNode
114314 SecondaryNameNode
121629 NodeManager
114044 DataNode
121516 ResourceManager
121759 Jps

Open http://ip:8088/ in a browser, where ip is the server's address, to reach the YARN web UI.

(Screenshot: YARN web UI)
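
To confirm that jobs really run on YARN, submit the bundled pi example from the install root; it should appear on the 8088 UI while it runs:

      hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 2 10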

To shut it down:

stop-yarn.sh

10. Viewing logs

 cd logs/
 tail -300f hadoop-hadoop-namenode-localhost.localdomain.log

 # enable verbose console logging while debugging:
 #export HADOOP_ROOT_LOGGER=DEBUG,console

11. Common problems

Q1: After following the steps above, jps shows no NameNode process.
A1: Check the NameNode log under logs/; it reveals the following error:

2021-07-05 16:23:01,750 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2021-07-05 16:23:01,750 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2021-07-05 16:23:01,751 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2021-07-05 16:23:01,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.net.BindException: Problem binding to [localhost:9000] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

This is plainly a conflict caused by port 9000 already being in use; changing the port fixes it (which is why core-site.xml above uses 39000). Not thinking to check the logs first cost me an entire afternoon on this problem.
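
To see which process holds a port before choosing a new one (either tool works):

      ss -tlnp | grep 9000    # or: netstat -tlnp | grep 9000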

P.S. The other problems I hit have already been folded back into the installation steps above, so they are not repeated here.

Conclusion

Installing Hadoop demands a fair grounding in Linux, which makes its learning curve relatively steep. I stepped into quite a few pits myself and shored up my fundamentals along the way; the notes above are the record. Questions and discussion are welcome.
