Hadoop pseudo-distributed deployment
I previously deployed Hadoop 2.x; this time the deployment is for Hadoop 3.x.
Hadoop has three components: HDFS stores the data, MapReduce does the computation (jobs), and YARN handles resources (CPU, memory) and job scheduling.
The official Hadoop website documents the deployment steps:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
The pseudo-distributed deployment below follows that guide.
1. Supported platforms
Hadoop supports GNU/Linux and Windows. On GNU/Linux, Hadoop has been demonstrated on clusters of up to 2000 nodes.
Here it is deployed on a Linux machine.
2.Required software
Java must be installed. The Hadoop website notes that certain Java versions have bugs and are not recommended, so before deploying, check which Java versions to avoid and why.
To install Java, download the Java 8 package, extract it to /usr/java/, chown it to the correct owner, configure the environment variables, and source them.
ssh must also be installed, because some Hadoop scripts manage remote Hadoop daemons over ssh. Run `which ssh` to check whether it is present; install it if not.
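The prerequisite check above can be sketched as a small script; `check ssh` and `check java` are the two commands Hadoop needs, and any other tool can be tested the same way:

```shell
# Preflight sketch: report whether required tools are on PATH.
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}
check ssh
check java
```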
3.Create a user for deploying Hadoop
[root@hadoop001 ~]# useradd hadoop
[root@hadoop001 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ mkdir sourcecode software app log data lib tmp
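The working-directory layout above can be sketched as follows, using a temporary base directory for illustration (on the real host the base is /home/hadoop):

```shell
# Create the seven working directories under a throwaway base dir.
BASE=$(mktemp -d)
for d in sourcecode software app log data lib tmp; do
  mkdir -p "$BASE/$d"
done
ls "$BASE"
```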
4.Configure passwordless ssh access
Following the official site:
First ssh to the local machine to verify that password-based access works.
Then configure passwordless access:
# press Enter three times at the prompts
[hadoop@hadoop001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
91:35:29:64:ad:91:28:5d:6b:fa:60:06:b2:90:d0:f5 hadoop@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
|.. ... ++oo. |
|... ..o.++o. |
|o . ..E =+ |
| . o . o.. |
| . = S |
| o o |
| . |
| |
| |
+-----------------+
[hadoop@hadoop001 ~]$ cd .ssh
[hadoop@hadoop001 .ssh]$ ll -a
total 12
drwx------. 2 hadoop hadoop 36 Dec 2 16:46 .
drwx------. 10 hadoop hadoop 4096 Dec 2 16:46 ..
-rw-------. 1 hadoop hadoop 1675 Dec 2 16:46 id_rsa
-rw-r--r--. 1 hadoop hadoop 397 Dec 2 16:46 id_rsa.pub
[hadoop@hadoop001 .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop001 .ssh]$ ll
total 12
-rw-rw-r--. 1 hadoop hadoop 397 Dec 2 16:46 authorized_keys
-rw-------. 1 hadoop hadoop 1675 Dec 2 16:46 id_rsa
-rw-r--r--. 1 hadoop hadoop 397 Dec 2 16:46 id_rsa.pub
[hadoop@hadoop001 .ssh]$ chmod 0600 ~/.ssh/authorized_keys
[hadoop@hadoop001 .ssh]$ ll
total 12
-rw-------. 1 hadoop hadoop 397 Dec 2 16:46 authorized_keys
-rw-------. 1 hadoop hadoop 1675 Dec 2 16:46 id_rsa
-rw-r--r--. 1 hadoop hadoop 397 Dec 2 16:46 id_rsa.pub
[hadoop@hadoop001 .ssh]$
[hadoop@hadoop001 .ssh]$ ssh hadoop001
The authenticity of host 'hadoop001 (192.168.14.128)' can't be established.
ECDSA key fingerprint is f9:95:75:87:df:44:35:2b:8e:0f:dc:eb:87:1b:57:ec.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,192.168.14.128' (ECDSA) to the list of known hosts.
Last login: Thu Dec 2 15:37:15 2021
[hadoop@hadoop001 ~]$ exit
logout
Connection to hadoop001 closed.
[hadoop@hadoop001 .ssh]$
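The chmod 0600 step above matters because sshd silently ignores an authorized_keys file that is group- or world-writable. A sketch demonstrating the permission check on a throwaway file:

```shell
# authorized_keys must be readable/writable by the owner only.
DIR=$(mktemp -d)
touch "$DIR/authorized_keys"
chmod 0600 "$DIR/authorized_keys"
stat -c '%a' "$DIR/authorized_keys"
```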
5.Download the Hadoop tarball and configure it
Download and deploy:
[hadoop@hadoop001 ~]$ cd software
[hadoop@hadoop001 software]$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
[hadoop@hadoop001 software]$ tar -xzvf hadoop-3.2.2.tar.gz -C ../app/
[hadoop@hadoop001 software]$ cd ../app
[hadoop@hadoop001 app]$ ln -s hadoop-3.2.2 hadoop
Configuration file: hadoop-env.sh
Next, edit the etc/hadoop/hadoop-env.sh file and add the JAVA_HOME environment variable:
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ vi hadoop-env.sh
//add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_45
//then save
A Hadoop cluster can run in three modes:
local mode: no daemons are started; rarely used;
pseudo-distributed mode: the daemons run as single processes (master and worker on one machine); used for learning;
cluster mode: multiple processes (2 masters, many workers); used in production.
Pseudo-distributed mode is used here.
Configuration file: core-site.xml
For pseudo-distributed mode, edit the etc/hadoop/core-site.xml file.
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ vi core-site.xml
//add the following:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop001:9000</value>
</property>
</configuration>
//then save
Here hadoop001 is the machine's hostname; this setting makes the NameNode start on hadoop001.
Configuration file: hdfs-site.xml
For pseudo-distributed mode, edit the etc/hadoop/hdfs-site.xml file.
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ vi hdfs-site.xml
//add the following:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
//then save
This sets the HDFS replication factor to 1.
Make the three HDFS daemons all start with the hostname hadoop001
Configuration file: /etc/hosts
Map this machine's IP to its hostname in this file, so that the NameNode, SecondaryNameNode, and DataNode all start with hadoop001.
As follows:
[root@hadoop001 ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.14.128 hadoop001
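The mapping above can be checked with a small sketch that extracts the IP a hosts file assigns to a hostname; the sample file here stands in for the real /etc/hosts:

```shell
# Look up which IP a hosts-format file maps a given hostname to.
HOSTS_FILE=$(mktemp)
cat > "$HOSTS_FILE" <<'EOF'
127.0.0.1 localhost localhost.localdomain
192.168.14.128 hadoop001
EOF
ip_for() {
  awk -v h="$2" '$0 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) print $1 }' "$1"
}
ip_for "$HOSTS_FILE" hadoop001   # -> 192.168.14.128
```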
Configuration file: core-site.xml
Already configured above: fs.defaultFS = hdfs://hadoop001:9000,
which makes the NameNode start with hadoop001.
Configuration file: workers
etc/hadoop/workers (in Hadoop 2.x this file was called slaves)
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ vi workers
hadoop001
//it was localhost originally; change it to the hostname
This makes the DataNode start with hadoop001.
Configuration file: hdfs-site.xml
These settings make the SecondaryNameNode start with hadoop001:
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ vi hdfs-site.xml
//add the following:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop001:9868</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>hadoop001:9869</value>
</property>
</configuration>
//then save
These parameters can be found in the default configuration files on the official website.
All of the above makes the master, the secondary, and the workers start with the same hostname, rather than one starting with hadoop001 and another with localhost.
One benefit: if the machine's IP changes later, only the IP in the hosts file needs to be updated.
Changing the /tmp directories in the Hadoop configuration
By default, both the HDFS data and the pid files of the three daemons live under /tmp/, and files under /tmp/ that go unaccessed for 30 days get cleaned up. If the pid files are deleted, stopping or restarting HDFS fails: the scripts cannot find the pids, so the daemons cannot be stopped. Starting the daemons again then looks like a restart, but the old processes are still running with the old configuration, so any HDFS configuration change will not take effect.
Keeping the data directories in /tmp is equally dangerous.
So the /tmp/ paths in the relevant configuration files need to be changed to other directories.
Here /tmp/ is changed to another directory: /home/hadoop/tmp.
The relevant defaults from the official documentation:
core-default.xml:
hadoop.tmp.dir /tmp/hadoop-${user.name}
hdfs-default.xml:
dfs.namenode.name.dir file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.dir file://${hadoop.tmp.dir}/dfs/namesecondary
As the defaults above show, the NameNode, SecondaryNameNode, and DataNode data directories all sit under ${hadoop.tmp.dir} (i.e. /tmp), so configure core-site.xml:
/home/hadoop/app/hadoop/etc/hadoop/core-site.xml
Add the following:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/hadoop-${user.name}</value>
</property>
# Note: the value is not plain /home/hadoop/tmp; hadoop-${user.name} is appended
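Assuming the daemons run as the hadoop user, Hadoop substitutes ${user.name} with "hadoop", and the default data directories resolve as in this sketch:

```shell
# ${user.name} -> hadoop (assumed), so hadoop.tmp.dir resolves to:
HADOOP_TMP_DIR=/home/hadoop/tmp/hadoop-hadoop
echo "namenode:  $HADOOP_TMP_DIR/dfs/name"
echo "datanode:  $HADOOP_TMP_DIR/dfs/data"
echo "secondary: $HADOOP_TMP_DIR/dfs/namesecondary"
```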
And where is the pid file directory changed?
In the hadoop-env.sh configuration file.
Edit the /home/hadoop/app/hadoop/etc/hadoop/hadoop-env.sh file
and find these two lines:
# Where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/tmp
and change them to:
# Where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/home/hadoop/tmp
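The edit above can also be applied with sed; this sketch runs against a throwaway copy of the two lines (on the real host, point it at etc/hadoop/hadoop-env.sh):

```shell
# Uncomment and set HADOOP_PID_DIR in a sample hadoop-env.sh fragment.
ENV_FILE=$(mktemp)
printf '%s\n' \
  '# Where pid files are stored.  /tmp by default.' \
  '# export HADOOP_PID_DIR=/tmp' > "$ENV_FILE"
sed -i 's|^# export HADOOP_PID_DIR=/tmp|export HADOOP_PID_DIR=/home/hadoop/tmp|' "$ENV_FILE"
grep HADOOP_PID_DIR "$ENV_FILE"
```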
One more note: the above is a pseudo-distributed deployment. In production there is certainly more than one node, so
hdfs-default.xml:
dfs.namenode.name.dir file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir file://${hadoop.tmp.dir}/dfs/data
dfs.namenode.checkpoint.dir file://${hadoop.tmp.dir}/dfs/namesecondary
In particular dfs.datanode.data.dir will certainly list multiple directories. For example, with 10 blade servers each mounting 10 physical disks, 5 TB in total,
it would look like:
dfs.datanode.data.dir : /data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn…
If one disk can write at 30 MB/s, then 10 disks give 300 MB/s.
Multiple disks provide more storage and higher aggregate read/write I/O; certainly faster than a single disk.
So in production, the DataNode data directory parameter must never be left at the ${hadoop.tmp.dir} default; set it explicitly for your actual hardware.
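A production hdfs-site.xml entry would then look something like this sketch; the /dataNN mount points are hypothetical examples, substitute your actual disk mounts:

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn</value>
</property>
```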
6.Format the NameNode
Run the command bin/hdfs namenode -format to format the filesystem:
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs namenode -format
WARNING: /home/hadoop/app/hadoop-3.2.2/logs does not exist. Creating.
2021-12-02 22:59:25,107 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop001/192.168.14.128
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.2
.....
2021-12-02 22:59:25,905 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1643070568-192.168.14.128-1638457165899
2021-12-02 22:59:25,916 INFO common.Storage: Storage directory /home/hadoop/tmp/hadoop-hadoop/dfs/name has been successfully formatted.
2021-12-02 22:59:25,933 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2021-12-02 22:59:25,992 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 400 bytes saved in 0 seconds .
2021-12-02 22:59:26,000 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-12-02 22:59:26,006 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2021-12-02 22:59:26,006 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop001/192.168.14.128
************************************************************/
[hadoop@hadoop001 hadoop]$
Then start the NameNode daemon and the DataNode daemon by running sbin/start-dfs.sh:
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop
[hadoop@hadoop001 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop001]
Starting datanodes
Starting secondary namenodes [hadoop001]
[hadoop@hadoop001 hadoop]$
The related logs can be checked here:
[hadoop@hadoop001 logs]$ pwd
/home/hadoop/app/hadoop/logs
[hadoop@hadoop001 logs]$ ll
total 120
-rw-rw-r--. 1 hadoop hadoop 33392 Dec 2 23:04 hadoop-hadoop-datanode-hadoop001.log
-rw-rw-r--. 1 hadoop hadoop 691 Dec 2 23:04 hadoop-hadoop-datanode-hadoop001.out
-rw-rw-r--. 1 hadoop hadoop 38007 Dec 2 23:04 hadoop-hadoop-namenode-hadoop001.log
-rw-rw-r--. 1 hadoop hadoop 691 Dec 2 23:04 hadoop-hadoop-namenode-hadoop001.out
-rw-rw-r--. 1 hadoop hadoop 30465 Dec 2 23:04 hadoop-hadoop-secondarynamenode-hadoop001.log
-rw-rw-r--. 1 hadoop hadoop 691 Dec 2 23:04 hadoop-hadoop-secondarynamenode-hadoop001.out
-rw-rw-r--. 1 hadoop hadoop 0 Dec 2 22:59 SecurityAuth-hadoop.audit
If startup succeeds, the HDFS web UI can be opened at http://localhost:9870/
(replace localhost with the machine's IP as needed).
In Hadoop 2.x the HDFS web UI port was 50070; in 3.x it is 9870.
7.Verify HDFS and MapReduce
Verification:
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -mkdir input
# Note: writing just 'input' without a full HDFS path creates the directory under /user/<username>
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2021-12-02 23:20 /user
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2021-12-02 23:20 /user/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2021-12-02 23:21 /user/hadoop/input
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/input/
Found 9 items
-rw-r--r-- 1 hadoop supergroup 9213 2021-12-02 23:21 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 1012 2021-12-02 23:21 /user/hadoop/input/core-site.xml
-rw-r--r-- 1 hadoop supergroup 11392 2021-12-02 23:21 /user/hadoop/input/hadoop-policy.xml
-rw-r--r-- 1 hadoop supergroup 1124 2021-12-02 23:21 /user/hadoop/input/hdfs-site.xml
-rw-r--r-- 1 hadoop supergroup 620 2021-12-02 23:21 /user/hadoop/input/httpfs-site.xml
-rw-r--r-- 1 hadoop supergroup 3518 2021-12-02 23:21 /user/hadoop/input/kms-acls.xml
-rw-r--r-- 1 hadoop supergroup 682 2021-12-02 23:21 /user/hadoop/input/kms-site.xml
-rw-r--r-- 1 hadoop supergroup 758 2021-12-02 23:21 /user/hadoop/input/mapred-site.xml
-rw-r--r-- 1 hadoop supergroup 690 2021-12-02 23:21 /user/hadoop/input/yarn-site.xml
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'
......
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2021-12-02 23:21 /user/hadoop/input
drwxr-xr-x - hadoop supergroup 0 2021-12-02 23:24 /user/hadoop/output
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -ls /user/hadoop/output/
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2021-12-02 23:24 /user/hadoop/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 90 2021-12-02 23:24 /user/hadoop/output/part-r-00000
[hadoop@hadoop001 hadoop]$ bin/hdfs dfs -cat /user/hadoop/output/*
1 dfsadmin
1 dfs.replication
1 dfs.namenode.secondary.https
1 dfs.namenode.secondary.http
[hadoop@hadoop001 hadoop]$
Use the following commands to check whether the three HDFS daemons are running, and to stop them:
[hadoop@hadoop001 hadoop]$ jps
6017 Jps
4860 SecondaryNameNode
4525 NameNode
4671 DataNode
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ ps -ef|grep hadoop
hadoop 4525 1 0 23:04 ? 00:00:10 /usr/java/jdk1.8.0_45/bin/java -Dproc_namenode -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-hadoop-namenode-hadoop001.log -Dyarn.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/hadoop/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-hadoop-namenode-hadoop001.log -Dhadoop.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.NameNode
hadoop 4671 1 0 23:04 ? 00:00:09 /usr/java/jdk1.8.0_45/bin/java -Dproc_datanode -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-hadoop-datanode-hadoop001.log -Dyarn.home.dir=/home/hadoop/app/hadoo-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/hadoop/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -hadoop.log.file=hadoop-hadoop-datanode-hadoop001.log -Dhadoop.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
hadoop 4860 1 0 23:04 ? 00:00:05 /usr/java/jdk1.8.0_45/bin/java -Dproc_secondarynamenode -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dyarn.log.file=hadoop-hadoop-secondarynamenode-hadoop001.log -Dyarn.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/hadoop/app/hadoop-3.2.2/lib/native -Dhadoop.log.dir=/home/hadoop/app/hadoop-3.2.2/logs -Dhadoop.log.file=hadoop-hadoop-secondarynamenode-hadoop001.log -Dhadoop.home.dir=/home/hadoop/app/hadoop-3.2.2 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
hadoop 6033 4057 0 23:29 pts/2 00:00:00 grep --color=auto hadoop
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ sbin/stop-dfs.sh
Stopping namenodes on [hadoop001]
Stopping datanodes
Stopping secondary namenodes [hadoop001]
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ jps
6523 Jps
[hadoop@hadoop001 hadoop]$
That completes the HDFS deployment.
8.Deploying YARN
You can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager daemon and the NodeManager daemon.
The parameters are set as follows.
Edit the file etc/hadoop/mapred-site.xml and add the following:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
Edit the file etc/hadoop/yarn-site.xml and add the following:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Then start the two YARN daemons with sbin/start-yarn.sh, as follows:
[hadoop@hadoop001 hadoop]$ vi etc/hadoop/mapred-site.xml
[hadoop@hadoop001 hadoop]$ vi etc/hadoop/yarn-site.xml
[hadoop@hadoop001 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop001]
Starting datanodes
Starting secondary namenodes [hadoop001]
[hadoop@hadoop001 hadoop]$ jps
6713 NameNode
6861 DataNode
7181 Jps
7054 SecondaryNameNode
[hadoop@hadoop001 hadoop]$
[hadoop@hadoop001 hadoop]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hadoop@hadoop001 hadoop]$ jps
7457 NodeManager
7777 Jps
7316 ResourceManager
6713 NameNode
6861 DataNode
7054 SecondaryNameNode
[hadoop@hadoop001 hadoop]$
The YARN ResourceManager web UI can then be opened at:
http://localhost:8088/
(replace localhost with the corresponding IP).
Mine is a virtual machine, so it is reachable directly; on a cloud host you must first open the port in your provider's console.
On cloud hosts, port 8088 is a frequent target of cryptomining attacks, for example: https://segmentfault.com/a/1190000015264170
The YARN web UI port can be changed by editing yarn-site.xml and adding (or modifying) the following:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop001:8123</value>
</property>
These parameters can all be found in the default configuration files on the official website.
Then restart the YARN daemons.
Environment variable configuration
Configure Hadoop's environment variables:
# add the following to ~/.bashrc:
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
[hadoop@hadoop001 ~]$ vi .bashrc
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$ source .bashrc
[hadoop@hadoop001 ~]$ which hdfs
~/app/hadoop/bin/hdfs
[hadoop@hadoop001 ~]$ which yarn
~/app/hadoop/bin/yarn
[hadoop@hadoop001 ~]$ which hadoop
~/app/hadoop/bin/hadoop
[hadoop@hadoop001 ~]$
Verifying YARN
Run a MapReduce job.
The same verification as for HDFS above would work; here the wordcount example is used instead.
[hadoop@hadoop001 ~]$ vi wordcount.txt
word count
zhangsan lisi
word
word zhangsan happy hadppy
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$ pwd
/home/hadoop
[hadoop@hadoop001 ~]$ hdfs dfs -mkdir /input
[hadoop@hadoop001 ~]$ hdfs dfs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2021-12-03 00:29 /input
drwxr-xr-x - hadoop supergroup 0 2021-12-02 23:20 /user
[hadoop@hadoop001 ~]$ hdfs dfs -put wordcount.txt /input
[hadoop@hadoop001 ~]$ hdfs dfs -ls /input/
Found 1 items
-rw-r--r-- 1 hadoop supergroup 58 2021-12-03 00:30 /input/wordcount.txt
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$ hdfs dfs -cat /input/wordcount.txt
word count
zhangsan lisi
word
word zhangsan happy hadppy
[hadoop@hadoop001 ~]$ find ./ -name '*example*.jar'
./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar
./app/hadoop-3.2.2/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-3.2.2-test-sources.jar
./app/hadoop-3.2.2/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-3.2.2-sources.jar
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$ yarn jar ./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output
2021-12-03 00:33:19,888 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
...
2021-12-03 00:33:28,218 INFO mapreduce.Job: map 0% reduce 0%
2021-12-03 00:33:33,293 INFO mapreduce.Job: map 100% reduce 0%
2021-12-03 00:33:37,321 INFO mapreduce.Job: map 100% reduce 100%
2021-12-03 00:33:39,339 INFO mapreduce.Job: Job job_1638461493821_0001 completed successfully
....
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2021-12-03 00:30 /input
drwxr-xr-x - hadoop supergroup 0 2021-12-03 00:33 /output
drwx------ - hadoop supergroup 0 2021-12-03 00:33 /tmp
drwxr-xr-x - hadoop supergroup 0 2021-12-02 23:20 /user
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$ hdfs dfs -ls /output/
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2021-12-03 00:33 /output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 50 2021-12-03 00:33 /output/part-r-00000
[hadoop@hadoop001 ~]$ hdfs dfs -cat /output/*
count 1
hadppy 1
happy 1
lisi 1
word 3
zhangsan 2
[hadoop@hadoop001 ~]$
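The wordcount result above can be reproduced locally with standard tools, which makes it easy to sanity-check the job output:

```shell
# Local equivalent of the wordcount example: split into words, count,
# print "word<TAB>count" sorted by word (same ordering as the job output).
INPUT=$(mktemp)
printf '%s\n' 'word count' 'zhangsan lisi' 'word' 'word zhangsan happy hadppy' > "$INPUT"
tr ' ' '\n' < "$INPUT" | grep -v '^$' | sort | uniq -c | awk '{print $2 "\t" $1}'
```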
With that, both HDFS and YARN are deployed.