Hadoop Pseudo-Distributed Deployment and Common Operations

Hadoop pseudo-distributed deployment

I deployed Hadoop 2.x before; this time I will walk through a Hadoop 3.x deployment.

Hadoop has three components: HDFS stores the data, MapReduce performs the computation (jobs), and YARN handles resources (CPU, memory) and job scheduling.

The deployment steps are all documented on the official Hadoop website:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

The pseudo-distributed deployment below follows this official guide.

1. Supported platforms

Hadoop supports the GNU/Linux and Windows platforms; on GNU/Linux it has been demonstrated on clusters of up to 2000 nodes.
Here it is deployed on a Linux machine.

2. Required software

Java must be installed. The Hadoop website notes that certain Java versions have bugs and are not recommended, so before deploying it is worth checking which Java versions to avoid and why.
To install Java, download the Java 8 package, extract it to /usr/java/, chown it to the correct owner, configure the environment variables, and make them take effect.
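
A minimal sketch of those steps, assuming the JDK 8u45 tarball (e.g. jdk-8u45-linux-x64.tar.gz, adjust to whatever you actually downloaded) is already in the current directory and the commands are run as root:

# the tarball name is an assumption; use your actual download
mkdir -p /usr/java
tar -xzvf jdk-8u45-linux-x64.tar.gz -C /usr/java/
chown -R root:root /usr/java/jdk1.8.0_45
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_45' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile
source /etc/profile
java -version    # confirm the JDK is picked up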

SSH must also be installed, because some Hadoop scripts manage the remote Hadoop daemons over ssh. Run which ssh to check whether it is available; install it if not.
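
If ssh is missing, a quick way to add it (a sketch assuming a CentOS/RHEL-style system with yum; adjust the package manager for your distribution):

which ssh                                        # check whether the client is already present
yum install -y openssh-clients openssh-server    # run as root if it is not
systemctl start sshd                             # make sure the ssh daemon is running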

3. Create a user for deploying Hadoop

[root@hadoop001 ~]# useradd hadoop
[root@hadoop001 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ mkdir sourcecode software app log data lib tmp

4. Configure passwordless SSH

Per the official docs, first ssh to the local machine to confirm that password-based access is currently in effect (it will prompt for a password), then configure passwordless access:

# press Enter three times
[hadoop@hadoop001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
91:35:29:64:ad:91:28:5d:6b:fa:60:06:b2:90:d0:f5 hadoop@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
|.. ... ++oo.     |
|... ..o.++o.     |
|o . ..E =+       |
| . o . o..       |
|  .   = S        |
|     o o         |
|        .        |
|                 |
|                 |
+-----------------+

[hadoop@hadoop001 ~]$ cd .ssh
[hadoop@hadoop001 .ssh]$ ll -a
total 12
drwx------.  2 hadoop hadoop   36 Dec  2 16:46 .
drwx------. 10 hadoop hadoop 4096 Dec  2 16:46 ..
-rw-------.  1 hadoop hadoop 1675 Dec  2 16:46 id_rsa
-rw-r--r--.  1 hadoop hadoop  397 Dec  2 16:46 id_rsa.pub

[hadoop@hadoop001 .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[hadoop@hadoop001 .ssh]$ ll
total 12
-rw-rw-r--. 1 hadoop hadoop  397 Dec  2 16:46 authorized_keys
-rw-------. 1 hadoop hadoop 1675 Dec  2 16:46 id_rsa
-rw-r--r--. 1 hadoop hadoop  397 Dec  2 16:46 id_rsa.pub

[hadoop@hadoop001 .ssh]$ chmod 0600 ~/.ssh/authorized_keys

[hadoop@hadoop001 .ssh]$ ll
total 12
-rw-------. 1 hadoop hadoop  397 Dec  2 16:46 authorized_keys
-rw-------. 1 hadoop hadoop 1675 Dec  2 16:46 id_rsa
-rw-r--r--. 1 hadoop hadoop  397 Dec  2 16:46 id_rsa.pub
[hadoop@hadoop001 .ssh]$ 
[hadoop@hadoop001 .ssh]$ ssh hadoop001
The authenticity of host 'hadoop001 (192.168.14.128)' can't be established.
ECDSA key fingerprint is f9:95:75:87:df:44:35:2b:8e:0f:dc:eb:87:1b:57:ec.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,192.168.14.128' (ECDSA) to the list of known hosts.
Last login: Thu Dec  2 15:37:15 2021
[hadoop@hadoop001 ~]$  exit
logout
Connection to hadoop001 closed.
[hadoop@hadoop001 .ssh]$ 

5. Download the Hadoop package and configure it

Download and deploy:

[hadoop@hadoop001 ~]$  cd software
[hadoop@hadoop001 software]$  wget https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
[hadoop@hadoop001 software]$ tar -xzvf hadoop-3.2.2.tar.gz -C ../app/
[hadoop@hadoop001 software]$ cd ../app
[hadoop@hadoop001 app]$  ln -s hadoop-3.2.2 hadoop
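
A quick sanity check (a sketch based on the layout created above) that the symlink resolves and the bundled binaries run:

ls -l ~/app/hadoop                 # should point at hadoop-3.2.2
~/app/hadoop/bin/hadoop version    # prints the Hadoop version and build information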

Configuration file: hadoop-env.sh

Next, edit the etc/hadoop/hadoop-env.sh file and add the JAVA_HOME environment variable:

[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ vi hadoop-env.sh
// add:
export JAVA_HOME=/usr/java/jdk1.8.0_45
// then save

A Hadoop cluster can run in one of three modes:
Local (standalone) mode: no daemons are started; rarely used.
Pseudo-distributed mode: the daemons run as single processes on one machine (one master, one worker); used for learning.
Fully distributed (cluster) mode: multiple processes across machines (two masters, many workers); used in production.
Here we use pseudo-distributed mode.

Configuration file: core-site.xml

Pseudo-distributed mode requires editing etc/hadoop/core-site.xml.

[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ vi core-site.xml
// add:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9000</value>
    </property>
</configuration>
// then save

Here hadoop001 is the machine's hostname; this setting makes the NameNode start on hadoop001.
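
As a quick sketch, the effective value can be printed back with the stock hdfs getconf tool to confirm the setting was picked up:

~/app/hadoop/bin/hdfs getconf -confKey fs.defaultFS    # should print hdfs://hadoop001:9000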

Configuration file: hdfs-site.xml

Pseudo-distributed mode requires editing etc/hadoop/hdfs-site.xml.

[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ vi hdfs-site.xml
// add:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
// then save

This sets the replication factor on HDFS to 1.
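
Once HDFS is running and a file has been uploaded (the path below is only an example taken from the verification step later on), the actual replication factor of a file can be checked with hdfs dfs -stat:

~/app/hadoop/bin/hdfs dfs -stat "replication=%r" /user/hadoop/input/core-site.xml    # expect replication=1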

Make the three HDFS daemons all start with the hostname hadoop001
Configuration file: /etc/hosts

In this file, map the machine's IP address to its hostname; the goal is for the NameNode, SecondaryNameNode, and DataNode to all start as hadoop001.
As follows:

[root@hadoop001 ~]# vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.14.128 hadoop001
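
A small check, independent of Hadoop, that the mapping resolves as expected (a sketch):

getent hosts hadoop001    # should return 192.168.14.128
ping -c 1 hadoop001       # should reach 192.168.14.128
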
Configuration file: core-site.xml

This was already configured above: fs.defaultFS is hdfs://hadoop001:9000,
which makes the NameNode start on hadoop001.

Configuration file: workers

etc/hadoop/workers (in Hadoop 2.x this file was called slaves)

[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ vi workers 
hadoop001
// the original content was localhost; change it to the hostname

This makes the DataNode start on hadoop001.

Configuration file: hdfs-site.xml

These settings make the SecondaryNameNode start on hadoop001:

[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ vi hdfs-site.xml
// edit to add:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop001:9868</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>hadoop001:9869</value>
    </property>
</configuration>
// then save

These properties can be found in the default configuration files on the official site.
The point of the settings above is to have the master (NameNode), the second-in-command (SecondaryNameNode), and the worker (DataNode) all start with the same hostname, rather than one starting as hadoop001 and another as localhost.
The benefit is that if the machine's IP address ever changes, you only need to update the IP in the hosts file.

Changing the /tmp directory in the Hadoop configuration files

By default, the HDFS data and the pid files of the three daemons are stored under /tmp/, and files under /tmp/ that have not been accessed for 30 days get cleaned up. If, for example, the three pid files are deleted, then when you try to stop or restart the daemons the scripts cannot find the corresponding pids and cannot stop them; starting again appears to work, but the old processes keep running with the old configuration, so any change to the HDFS configuration never actually takes effect on restart.
Keeping the data storage directories under /tmp is just as dangerous.
So the /tmp/ locations in the relevant configuration files need to be changed to another directory.
Here we change them to /home/hadoop/tmp.

Take a look at the default configuration on the official site:

core-default.xml:
hadoop.tmp.dir    /tmp/hadoop-${user.name}

hdfs-default.xml:
dfs.namenode.name.dir    file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir    file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.dir    file://${hadoop.tmp.dir}/dfs/namesecondary

As these defaults show, the data directories of the NameNode, SecondaryNameNode, and DataNode are all derived from hadoop.tmp.dir and therefore live under /tmp. So configure core-site.xml:
/home/hadoop/app/hadoop/etc/hadoop/core-site.xml
Edit it and add the following:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp/hadoop-${user.name}</value>
    </property>
  # note: the value is not just /home/hadoop/tmp; hadoop-${user.name} is appended to it

Where is the pid file directory changed? In the hadoop-env.sh configuration file.
Edit /home/hadoop/app/hadoop/etc/hadoop/hadoop-env.sh and find the following two lines:

# Where pid files are stored.  /tmp by default.
# export HADOOP_PID_DIR=/tmp

Change them to:

# Where pid files are stored.  /tmp by default.
 export HADOOP_PID_DIR=/home/hadoop/tmp
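
After the daemons are started again (see section 6 below), the pid files should show up under the new directory; a quick check (a sketch) is:

ls -l /home/hadoop/tmp/*.pid    # e.g. hadoop-hadoop-namenode.pid, hadoop-hadoop-datanode.pid, ...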

One more note: the above is a pseudo-distributed deployment. In production there is certainly more than one node, so the following hdfs-default.xml properties matter:
dfs.namenode.name.dir    file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir    file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.dir    file://${hadoop.tmp.dir}/dfs/namesecondary
In particular, dfs.datanode.data.dir will certainly have multiple entries. For example, with 10 blade servers each mounting 10 physical disks (5 TB in total), it would look like:
dfs.datanode.data.dir : /data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn…

If a single disk can write at about 30 MB/s, 10 disks give roughly 300 MB/s in aggregate.
Multiple disks mean more storage capacity and higher aggregate read/write I/O, which is certainly faster than a single disk.
So in production the DataNode data directory parameter must not be left at the ${hadoop.tmp.dir} default; spell it out according to your actual disk layout.
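
As a rough sketch (the /data01... mount points are hypothetical), each disk is mounted separately and the DataNode directories are created on it before being listed in dfs.datanode.data.dir:

df -h /data01 /data02 /data03                              # each should be its own mounted filesystem
mkdir -p /data01/dfs/dn /data02/dfs/dn /data03/dfs/dn      # run as root
chown -R hadoop:hadoop /data01/dfs /data02/dfs /data03/dfs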

6. Format the NameNode

Run the command bin/hdfs namenode -format to format the filesystem:

[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs namenode -format
WARNING: /home/hadoop/app/hadoop-3.2.2/logs does not exist. Creating.
2021-12-02 22:59:25,107 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop001/192.168.14.128
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.2.2
.....
2021-12-02 22:59:25,905 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1643070568-192.168.14.128-1638457165899
2021-12-02 22:59:25,916 INFO common.Storage: Storage directory /home/hadoop/tmp/hadoop-hadoop/dfs/name has been successfully formatted.
2021-12-02 22:59:25,933 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2021-12-02 22:59:25,992 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 400 bytes saved in 0 seconds .
2021-12-02 22:59:26,000 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-12-02 22:59:26,006 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2021-12-02 22:59:26,006 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop001/192.168.14.128
************************************************************/
[hadoop@hadoop001 hadoop]$ 

Then start the NameNode daemon and the DataNode daemon by running sbin/start-dfs.sh:

[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop
[hadoop@hadoop001 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop001]
Starting datanodes
Starting secondary namenodes [hadoop001]
[hadoop@hadoop001 hadoop]$

The related logs can be viewed here:

[hadoop@hadoop001 logs]$ pwd
/home/hadoop/app/hadoop/logs
[hadoop@hadoop001 logs]$  ll
total 120
-rw-rw-r--. 1 hadoop hadoop 33392 Dec  2 23:04 hadoop-hadoop-datanode-hadoop001.log
-rw-rw-r--. 1 hadoop hadoop   691 Dec  2 23:04 hadoop-hadoop-datanode-hadoop001.out
-rw-rw-r--. 1 hadoop hadoop 38007 Dec  2 23:04 hadoop-hadoop-namenode-hadoop001.log
-rw-rw-r--. 1 hadoop hadoop   691 Dec  2 23:04 hadoop-hadoop-namenode-hadoop001.out
-rw-rw-r--. 1 hadoop hadoop 30465 Dec  2 23:04 hadoop-hadoop-secondarynamenode-hadoop001.log
-rw-rw-r--. 1 hadoop hadoop   691 Dec  2 23:04 hadoop-hadoop-secondarynamenode-hadoop001.out
-rw-rw-r--. 1 hadoop hadoop     0 Dec  2 22:59 SecurityAuth-hadoop.audit

If startup went well, the HDFS web UI can be accessed in a browser at http://localhost:9870/ (replace localhost with the corresponding IP).
In Hadoop 2.x the HDFS web UI port was 50070; in 3.x it is 9870.
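
A quick command-line probe that the UI is answering (a sketch; 9870 is the default port, adjust if you changed it):

curl -s -o /dev/null -w "%{http_code}\n" http://hadoop001:9870/    # expect 200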

7. Verify HDFS and MapReduce

Verification:

[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -mkdir input
# note: when you write just 'input' without a full HDFS path, the directory is created under /user/<username>
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-02 23:20 /user
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-02 23:20 /user/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-02 23:21 /user/hadoop/input
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/input/
Found 9 items
-rw-r--r--   1 hadoop supergroup       9213 2021-12-02 23:21 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r--   1 hadoop supergroup       1012 2021-12-02 23:21 /user/hadoop/input/core-site.xml
-rw-r--r--   1 hadoop supergroup      11392 2021-12-02 23:21 /user/hadoop/input/hadoop-policy.xml
-rw-r--r--   1 hadoop supergroup       1124 2021-12-02 23:21 /user/hadoop/input/hdfs-site.xml
-rw-r--r--   1 hadoop supergroup        620 2021-12-02 23:21 /user/hadoop/input/httpfs-site.xml
-rw-r--r--   1 hadoop supergroup       3518 2021-12-02 23:21 /user/hadoop/input/kms-acls.xml
-rw-r--r--   1 hadoop supergroup        682 2021-12-02 23:21 /user/hadoop/input/kms-site.xml
-rw-r--r--   1 hadoop supergroup        758 2021-12-02 23:21 /user/hadoop/input/mapred-site.xml
-rw-r--r--   1 hadoop supergroup        690 2021-12-02 23:21 /user/hadoop/input/yarn-site.xml
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'
......
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-02 23:21 /user/hadoop/input
drwxr-xr-x   - hadoop supergroup          0 2021-12-02 23:24 /user/hadoop/output
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/output/
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2021-12-02 23:24 /user/hadoop/output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         90 2021-12-02 23:24 /user/hadoop/output/part-r-00000
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -cat /user/hadoop/output/*
1	dfsadmin
1	dfs.replication
1	dfs.namenode.secondary.https
1	dfs.namenode.secondary.http
[hadoop@hadoop001 hadoop]$ 

Check whether the three HDFS daemons are running, and stop them, with the following commands:

[hadoop@hadoop001 hadoop]$ jps
6017 Jps
4860 SecondaryNameNode
4525 NameNode
4671 DataNode
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ ps -ef|grep hadoop
hadoop      4525      1  0 23:04 ?        00:00:10 /usr/java/jdk1.8.0_45/bin/java -Dproc_namenode -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-hadoop-namenode-hadoop001.log -Dyarn.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/hadoop/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-hadoop-namenode-hadoop001.log -Dhadoop.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.NameNode
hadoop      4671      1  0 23:04 ?        00:00:09 /usr/java/jdk1.8.0_45/bin/java -Dproc_datanode -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-hadoop-datanode-hadoop001.log -Dyarn.home.dir=/home/hadoop/app/hadoo-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/hadoop/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -hadoop.log.file=hadoop-hadoop-datanode-hadoop001.log -Dhadoop.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
hadoop      4860      1  0 23:04 ?        00:00:05 /usr/java/jdk1.8.0_45/bin/java -Dproc_secondarynamenode -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-hadoop-secondarynamenode-hadoop001.log -Dyarn.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/hadoop/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-hadoop-secondarynamenode-hadoop001.log -Dhadoop.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
hadoop      6033   4057  0 23:29 pts/2    00:00:00 grep --color=auto hadoop
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ sbin/stop-dfs.sh
Stopping namenodes on [hadoop001]
Stopping datanodes
Stopping secondary namenodes [hadoop001]
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ jps
6523 Jps
[hadoop@hadoop001 hadoop]$

That completes the HDFS deployment.

8. Deploying YARN

You can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.
The parameters are set as follows.
Edit etc/hadoop/mapred-site.xml and add the following:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

Edit etc/hadoop/yarn-site.xml and add the following:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

Then start the two YARN daemons with sbin/start-yarn.sh, as follows:

[hadoop@hadoop001 hadoop]$ vi etc/hadoop/mapred-site.xml
[hadoop@hadoop001 hadoop]$ vi etc/hadoop/yarn-site.xml
[hadoop@hadoop001 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop001]
Starting datanodes
Starting secondary namenodes [hadoop001]
[hadoop@hadoop001 hadoop]$ jps
6713 NameNode
6861 DataNode
7181 Jps
7054 SecondaryNameNode
[hadoop@hadoop001 hadoop]$ 
[hadoop@hadoop001 hadoop]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hadoop@hadoop001 hadoop]$ jps
7457 NodeManager
7777 Jps
7316 ResourceManager
6713 NameNode
6861 DataNode
7054 SecondaryNameNode
[hadoop@hadoop001 hadoop]$ 
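
Besides jps, you can ask the ResourceManager directly whether the NodeManager has registered (a sketch using the stock yarn CLI):

bin/yarn node -list    # should list hadoop001 with a NodeManager in RUNNING state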

You can then access the YARN ResourceManager through its web UI:
The address is http://localhost:8088/ (replace localhost with the corresponding IP address).
Mine is a local virtual machine, so the page is directly reachable; on a cloud host you need to open the corresponding port in your provider's console.
On cloud hosts, port 8088 is a popular target for crypto-mining attacks, see for example: https://segmentfault.com/a/1190000015264170
You can change the YARN web UI port by editing yarn-site.xml and adding the following (or modifying it if already present):

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop001:8123</value>
    </property>

These parameters can all be found in the default configuration files on the official site.
Then restart the YARN daemons.
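
A sketch of the restart, plus a check that the ResourceManager web server has moved to the new port (8123 as configured above):

cd /home/hadoop/app/hadoop
sbin/stop-yarn.sh
sbin/start-yarn.sh
ss -lnt | grep 8123    # the web UI listener should now be on 8123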

Environment variable configuration

Configure the Hadoop environment variables:

# add the following to ~/.bashrc:
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

[hadoop@hadoop001 ~]$ vi .bashrc
[hadoop@hadoop001 ~]$ 
[hadoop@hadoop001 ~]$ source .bashrc 
[hadoop@hadoop001 ~]$ which hdfs
~/app/hadoop/bin/hdfs
[hadoop@hadoop001 ~]$ which yarn
~/app/hadoop/bin/yarn
[hadoop@hadoop001 ~]$ which hadoop
~/app/hadoop/bin/hadoop
[hadoop@hadoop001 ~]$ 

Verifying YARN

Run a MapReduce job.
You could verify with the same method used for HDFS above; here we use the wordcount example instead.


[hadoop@hadoop001 ~]$ vi wordcount.txt
word count
zhangsan lisi
word
word zhangsan happy hadppy
[hadoop@hadoop001 ~]$ 
[hadoop@hadoop001 ~]$ pwd
/home/hadoop
[hadoop@hadoop001 ~]$ hdfs dfs -mkdir /input
[hadoop@hadoop001 ~]$ hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-03 00:29 /input
drwxr-xr-x   - hadoop supergroup          0 2021-12-02 23:20 /user
[hadoop@hadoop001 ~]$ hdfs dfs -put wordcount.txt /input
[hadoop@hadoop001 ~]$ hdfs dfs -ls /input/
Found 1 items
-rw-r--r--   1 hadoop supergroup         58 2021-12-03 00:30 /input/wordcount.txt
[hadoop@hadoop001 ~]$ 
[hadoop@hadoop001 ~]$ hdfs dfs -cat /input/wordcount.txt
word count
zhangsan lisi
word
word zhangsan happy hadppy

[hadoop@hadoop001 ~]$ find ./ -name *example*.jar
./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar
./app/hadoop-3.2.2/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-3.2.2-test-sources.jar
./app/hadoop-3.2.2/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-3.2.2-sources.jar
[hadoop@hadoop001 ~]$ 
[hadoop@hadoop001 ~]$ 
[hadoop@hadoop001 ~]$ yarn jar ./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output
2021-12-03 00:33:19,888 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
...
2021-12-03 00:33:28,218 INFO mapreduce.Job:  map 0% reduce 0%
2021-12-03 00:33:33,293 INFO mapreduce.Job:  map 100% reduce 0%
2021-12-03 00:33:37,321 INFO mapreduce.Job:  map 100% reduce 100%
2021-12-03 00:33:39,339 INFO mapreduce.Job: Job job_1638461493821_0001 completed successfully
....
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-03 00:30 /input
drwxr-xr-x   - hadoop supergroup          0 2021-12-03 00:33 /output
drwx------   - hadoop supergroup          0 2021-12-03 00:33 /tmp
drwxr-xr-x   - hadoop supergroup          0 2021-12-02 23:20 /user
[hadoop@hadoop001 ~]$ 
[hadoop@hadoop001 ~]$ hdfs dfs -ls /output/
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2021-12-03 00:33 /output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         50 2021-12-03 00:33 /output/part-r-00000
[hadoop@hadoop001 ~]$ hdfs dfs -cat /output/*
count	1
hadppy	1
happy	1
lisi	1
word	3
zhangsan	2
[hadoop@hadoop001 ~]$ 

At this point, both HDFS and YARN are deployed.
