Environment: VMware Workstation 15.5, CentOS 8, jdk-8u161-linux-x64.tar.gz, Hadoop 3.2.1
1) Prepare 3 client machines (firewall disabled, static IPs, hostnames set)
2) Configure SSH
3) Install the JDK
4) Install Hadoop
5) Configure environment variables
6) Configure the cluster
7) Start daemons individually
8) Start the whole cluster and test it
1. Install VMware
2. Configure the network environment
The subnet IP can be changed as needed
IP addresses must be set manually
The VMware network environment is now ready
Requirements: 3 VMs, 192.168.148.106/107/108
3. Build one virtual machine
4. Configure the virtual machine environment
Change the IP address
[root@hadoop105 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33 # configure networking
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static" # change to a static IP
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="6e807274-47d5-459d-b513-b6f2ce9b0df3"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.148.105 # add the IP address, gateway, and DNS below
GATEWAY=192.168.148.2
DNS1=192.168.148.2
Change the hostname
[root@hadoop105 ~]# vim /etc/hostname
hadoop105 # set the hostname to hadoop105
Add host mappings
[root@hadoop105 ~]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.148.105 hadoop105 # add these mappings
192.168.148.106 hadoop106
192.168.148.107 hadoop107
192.168.148.108 hadoop108
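The same mappings can be appended from the shell in an idempotent way. A minimal sketch, run against a scratch copy so it is safe anywhere; point HOSTS at /etc/hosts (as root) for real use:

```shell
# Append each cluster mapping only if it is not already present.
# HOSTS points at a scratch file here; use /etc/hosts on a real node.
HOSTS=$(mktemp)
printf '127.0.0.1 localhost\n' > "$HOSTS"
for entry in '192.168.148.105 hadoop105' '192.168.148.106 hadoop106' \
             '192.168.148.107 hadoop107' '192.168.148.108 hadoop108'; do
    grep -qxF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
done
cat "$HOSTS"
```

Because of the grep guard, running the loop a second time adds nothing.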
Reboot the VM: reboot
Open a remote session with Xshell 7 (log in by IP address). To log in by hostname instead, add the same mappings to C:\Windows\System32\drivers\etc\hosts on the Windows host (if the file cannot be saved in place, copy it to the desktop, edit it there, then replace the original).
Install extra packages inside the VM: yum install -y epel-release
Disable the firewall
[root@hadoop105 ~]# systemctl stop firewalld # stop the firewall
[root@hadoop105 ~]# systemctl disable firewalld.service # keep it disabled across reboots
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Add a regular user (already created earlier) # optional
[root@hadoop105 ~]# useradd linux
[root@hadoop105 ~]# passwd linux
Give the regular user linux root privileges so sudo works later
[root@hadoop105 ~]# vim /etc/sudoers
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
linux ALL=(ALL) NOPASSWD:ALL # add this line below the %wheel line
Create directories and change their owner and group
[root@hadoop105 opt]# mkdir module # create the directories
[root@hadoop105 opt]# mkdir software
[root@hadoop105 opt]# ll
total 0
drwxr-xr-x. 2 root root 6 Apr 6 22:54 module
drwxr-xr-x. 2 root root 6 Apr 6 22:54 software
[root@hadoop105 opt]# cd
[root@hadoop105 ~]# chown linux:linux /opt/module # change the owner and group
[root@hadoop105 ~]# chown linux:linux /opt/software
[root@hadoop105 ~]# cd /opt/
[root@hadoop105 opt]# ll
total 0
drwxr-xr-x. 2 linux linux 6 Apr 6 22:54 module
drwxr-xr-x. 2 linux linux 6 Apr 6 22:54 software
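The two mkdir calls and two chown calls above can be collapsed with brace expansion. A sketch against a scratch directory (the chown line needs root and an existing linux user, so it is shown commented out):

```shell
# Create both directories in one pass and hand them to the regular user.
root=$(mktemp -d)          # stands in for /opt
mkdir -p "$root"/{module,software}
# chown linux:linux "$root"/{module,software}   # run as root against the real /opt
ls "$root"
```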
Reboot the VM: reboot
Clone VMs 106, 107, and 108
Shut down VM 105, then make full clones for 106, 107, and 108
Start VMs 106, 107, and 108 one at a time and change each one's IP address and hostname
[root@hadoop105 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoop105 ~]# vim /etc/hostname
[root@hadoop105 ~]# vim /etc/hosts
Set up passwordless SSH: do this once on each of 106, 107, and 108, and then once more as the root user on 106
[linux@hadoop106 ~]$ ll -al
total 28
drwx------. 2 linux linux 25 Apr 7 22:06 .ssh
-rw-------. 1 linux linux 1654 Apr 8 15:02 .viminfo
-rw-------. 1 linux linux 220 Apr 8 14:30 .Xauthority
[linux@hadoop106 ~]$ cd .ssh
[linux@hadoop106 .ssh]$ ll
total 4
-rw-r--r--. 1 linux linux 561 Apr 8 15:06 known_hosts
[linux@hadoop106 .ssh]$ ssh-keygen -t rsa # generate a key pair; press Enter three times
Generating public/private rsa key pair.
Enter file in which to save the key (/home/linux/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/linux/.ssh/id_rsa.
Your public key has been saved in /home/linux/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:82i/VEGhT0YdrcNSwiZ7jcABzcY+gLLeJ23Cl5bChm8 linux@hadoop106
The key's randomart image is:
+---[RSA 2048]----+
| ..*.o+o.o |
| . . . O++ o .|
| o +.=+* . |
| . +++.= |
| . = .So oo. . |
| o O O+ . |
| o Oo o |
| E. o |
| . o. |
+----[SHA256]-----+
[linux@hadoop106 .ssh]$ ll
total 12
-rw-------. 1 linux linux 1823 Apr 8 15:28 id_rsa
-rw-r--r--. 1 linux linux 397 Apr 8 15:28 id_rsa.pub
-rw-r--r--. 1 linux linux 561 Apr 8 15:06 known_hosts
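The three Enter presses above accept the default file location and an empty passphrase. The same thing can be done non-interactively; sketched into a scratch directory so it does not touch ~/.ssh:

```shell
# Generate an RSA key pair with no passphrase, without any prompts.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -b 2048 -N '' -f "$KEYDIR/id_rsa" -q
ls -l "$KEYDIR"
```

This produces the same id_rsa / id_rsa.pub pair the transcript shows.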
[linux@hadoop106 .ssh]$ ssh-copy-id hadoop107 # send the public key to 107
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/linux/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
linux@hadoop107's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop107'"
and check to make sure that only the key(s) you wanted were added.
[linux@hadoop106 .ssh]$ ssh hadoop107
Activate the web console with: systemctl enable --now cockpit.socket
Last failed login: Thu Apr 8 15:16:16 CST 2021 from 192.168.148.106 on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Thu Apr 8 14:30:13 2021 from 192.168.148.1
[linux@hadoop107 ~]$ exit
logout
Connection to hadoop107 closed.
[linux@hadoop106 .ssh]$ ssh-copy-id hadoop108 # send the public key to 108
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/linux/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
linux@hadoop108's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop108'"
and check to make sure that only the key(s) you wanted were added.
[linux@hadoop106 .ssh]$ ssh-copy-id hadoop106 # send the public key to 106 itself
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/linux/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
linux@hadoop106's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop106'"
and check to make sure that only the key(s) you wanted were added.
[linux@hadoop106 .ssh]$ ll
total 16
-rw-------. 1 linux linux 397 Apr 8 15:31 authorized_keys
-rw-------. 1 linux linux 1823 Apr 8 15:28 id_rsa
-rw-r--r--. 1 linux linux 397 Apr 8 15:28 id_rsa.pub
-rw-r--r--. 1 linux linux 561 Apr 8 15:06 known_hosts
Upload method 1:
Open remote sessions to the 3 VMs with Xshell 7 and upload the JDK and Hadoop archives to 106 with Xftp 7.
Upload method 2:
Open remote sessions to the 3 VMs with Xshell 7, install lrzsz on VM 106 with yum -y install lrzsz, then run the rz command in the target directory to upload the JDK and Hadoop archives.
Verify the uploaded archives with ll in that directory.
Extract the archives into another directory.
[linux@hadoop106 software]$ ll
total 536092
-rw-rw-r--. 1 linux linux 359196911 Apr 7 19:51 hadoop-3.2.1.tar.gz
-rw-rw-r--. 1 linux linux 189756259 Apr 7 19:49 jdk-8u161-linux-x64.tar.gz
[linux@hadoop106 software]$ tar -zxvf jdk-8u161-linux-x64.tar.gz -C /opt/module/ # extract the JDK archive
[linux@hadoop106 software]$ tar -zxvf hadoop-3.2.1.tar.gz -C /opt/module/ # extract the Hadoop archive
Three ways to copy the software between nodes:
Push: send the JDK and Hadoop from 106 to 107/108
[linux@hadoop106 module]$ scp -r jdk1.8.0_161/ linux@hadoop107:/opt/module/
Pull: fetch the JDK and Hadoop from 106 onto 107/108
[linux@hadoop107 module]$ scp -r linux@hadoop106:/opt/module/hadoop-3.2.1 ./
[linux@hadoop107 module]$ ll
total 0
drwxr-xr-x. 11 linux linux 180 Apr 7 22:45 hadoop-3.2.1
drwxr-xr-x. 8 linux linux 255 Apr 7 22:09 jdk1.8.0_161
Relay: from 107, send 106's JDK and Hadoop to 108
[linux@hadoop107 ~]$ scp -r linux@hadoop106:/opt/module/* linux@hadoop108:/opt/module/
Configure environment variables
[linux@hadoop106 jdk1.8.0_161]$ cd /etc/profile.d/
[linux@hadoop106 profile.d]$ sudo vim my_env.sh # create my_env.sh and add the variables
[linux@hadoop106 profile.d]$ source /etc/profile # reload the environment variables
[linux@hadoop106 profile.d]$ java -version # verify the Java environment
[linux@hadoop106 module]$ cd hadoop-3.2.1/
[linux@hadoop106 hadoop-3.2.1]$ sudo vim /etc/profile.d/my_env.sh
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
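The export lines compose PATH one directory at a time, so their order in my_env.sh is the lookup order. A self-contained sketch of the same composition, with scratch directories standing in for the real install paths:

```shell
# Each assignment appends one bin/sbin directory to the existing PATH.
JAVA_HOME=$(mktemp -d)
HADOOP_HOME=$(mktemp -d)
mkdir -p "$JAVA_HOME/bin" "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"
PATH=$PATH:$JAVA_HOME/bin
PATH=$PATH:$HADOOP_HOME/bin
PATH=$PATH:$HADOOP_HOME/sbin
# Confirm the sbin directory is now searchable.
case ":$PATH:" in
    *":$HADOOP_HOME/sbin:"*) echo "sbin is on PATH" ;;
esac
```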
Create a cluster distribution script
[linux@hadoop106 ~]$ mkdir bin
[linux@hadoop106 ~]$ ll -al
total 28
drwxrwxr-x. 2 linux linux 6 Apr 8 14:48 bin
drwx------. 3 linux linux 19 Apr 6 22:28 .config
-rw-------. 1 linux linux 16 Apr 6 22:28 .esd_auth
drwx------. 2 linux linux 25 Apr 7 22:06 .ssh # some entries omitted
[linux@hadoop106 ~]$ cd bin/
[linux@hadoop106 bin]$ vim xsync # distribution script contents below
#!/bin/bash
# 1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi
# 2. Loop over every machine in the cluster
for host in hadoop106 hadoop107 hadoop108
do
    echo ==================== $host ====================
    # 3. Loop over every file/directory given and send each one
    for file in $@
    do
        # 4. Check that the file exists
        if [ -e $file ]
        then
            # 5. Get the parent directory (resolving symlinks)
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo "$file does not exist!"
        fi
    done
done
[linux@hadoop106 bin]$ ll
total 4
-rw-rw-r--. 1 linux linux 579 Apr 8 15:02 xsync
[linux@hadoop106 bin]$ chmod 777 xsync # make the script executable
[linux@hadoop106 bin]$ ll
total 4
-rwxrwxrwx. 1 linux linux 579 Apr 8 15:02 xsync
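The core of xsync is the parent-directory resolution in steps 5 and 6: cd -P follows symlinks, so rsync always receives a canonical path. The same two lines, exercised on a scratch path:

```shell
# Resolve a file's canonical parent directory and bare name, as xsync does.
base=$(mktemp -d)
mkdir -p "$base/module"
touch "$base/module/hadoop-3.2.1"
file="$base/module/hadoop-3.2.1"
pdir=$(cd -P "$(dirname "$file")"; pwd)
fname=$(basename "$file")
echo "pdir=$pdir fname=$fname"
```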
Distribute the environment-variable file and source it on 107 and 108
[linux@hadoop106 ~]$ sudo ./bin/xsync /etc/profile.d/my_env.sh
[linux@hadoop107 ~]$ source /etc/profile
[linux@hadoop107 ~]$ java -version # verify Java on 107
Local (standalone) mode test
[linux@hadoop106 hadoop-3.2.1]$ mkdir wcinput # create the wcinput directory
[linux@hadoop106 hadoop-3.2.1]$ ll
total 180
drwxr-xr-x. 2 linux linux 203 Sep 11 2019 bin
drwxr-xr-x. 3 linux linux 20 Sep 10 2019 etc
drwxr-xr-x. 2 linux linux 106 Sep 11 2019 include
drwxr-xr-x. 3 linux linux 20 Sep 11 2019 lib
drwxr-xr-x. 4 linux linux 288 Sep 11 2019 libexec
-rw-rw-r--. 1 linux linux 150569 Sep 10 2019 LICENSE.txt
-rw-rw-r--. 1 linux linux 22125 Sep 10 2019 NOTICE.txt
-rw-rw-r--. 1 linux linux 1361 Sep 10 2019 README.txt
drwxr-xr-x. 3 linux linux 4096 Sep 10 2019 sbin
drwxr-xr-x. 4 linux linux 31 Sep 11 2019 share
drwxrwxr-x. 2 linux linux 6 Apr 7 20:41 wcinput
[linux@hadoop106 hadoop-3.2.1]$ cd wcinput/
[linux@hadoop106 wcinput]$ vim word.txt # create and edit word.txt
[linux@hadoop106 wcinput]$ cd ..
[linux@hadoop106 hadoop-3.2.1]$ pwd
/opt/module/hadoop-3.2.1
[linux@hadoop106 hadoop-3.2.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount wcinput/ ./wcoutput # run the wordcount example from the jar: read from wcinput/, write to ./wcoutput
Inspect the results
[linux@hadoop106 hadoop-3.2.1]$ ll
total 180
drwxr-xr-x. 2 linux linux 203 Sep 11 2019 bin
drwxr-xr-x. 3 linux linux 20 Sep 10 2019 etc
drwxr-xr-x. 2 linux linux 106 Sep 11 2019 include
drwxr-xr-x. 3 linux linux 20 Sep 11 2019 lib
drwxr-xr-x. 4 linux linux 288 Sep 11 2019 libexec
-rw-rw-r--. 1 linux linux 150569 Sep 10 2019 LICENSE.txt
-rw-rw-r--. 1 linux linux 22125 Sep 10 2019 NOTICE.txt
-rw-rw-r--. 1 linux linux 1361 Sep 10 2019 README.txt
drwxr-xr-x. 3 linux linux 4096 Sep 10 2019 sbin
drwxr-xr-x. 4 linux linux 31 Sep 11 2019 share
drwxrwxr-x. 2 linux linux 22 Apr 7 20:43 wcinput
drwxr-xr-x. 2 linux linux 88 Apr 7 20:49 wcoutput
[linux@hadoop106 hadoop-3.2.1]$ cd wcoutput/
[linux@hadoop106 wcoutput]$ ll
total 4
-rw-r--r--. 1 linux linux 63 Apr 7 20:49 part-r-00000 # the data
-rw-r--r--. 1 linux linux 0 Apr 7 20:49 _SUCCESS # success marker
[linux@hadoop106 wcoutput]$ cat part-r-00000 # view the data
lisi 1
qwer 1
ss 1
wangba 1
wangwu 1
wangxiaoqing 1
zhangsan 1
Running the job again fails with org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/opt/module/hadoop-3.2.1/wcoutput already exists. MapReduce never overwrites an existing output directory, so wcoutput must be deleted before rerunning.
[linux@hadoop106 hadoop-3.2.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount wcinput/ ./wcoutput
2021-04-07 20:54:54,758 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-04-07 20:54:55,060 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-04-07 20:54:55,061 INFO impl.MetricsSystemImpl: JobTracker metrics system started
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/opt/module/hadoop-3.2.1/wcoutput already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:164)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:277)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
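The fix is simply to delete the previous output directory before rerunning. Sketched on a scratch path; in local mode the directory lives on the ordinary filesystem, so rm -rf works, and once the job runs on HDFS use hadoop fs -rm -r /wcoutput instead:

```shell
# Remove a stale wordcount output directory so the job can be rerun.
work=$(mktemp -d)            # stands in for /opt/module/hadoop-3.2.1
mkdir -p "$work/wcoutput"
rm -rf "$work/wcoutput"
[ ! -d "$work/wcoutput" ] && echo "output directory removed"
```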
Fully distributed setup
Cluster configuration files (the last four below are specific to Hadoop 3.x):
core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, workers, hadoop-env.sh, yarn-env.sh, mapred-env.sh, start-dfs.sh, stop-dfs.sh, start-yarn.sh, and stop-yarn.sh
[linux@hadoop106 hadoop-3.2.1]$ cd etc/hadoop/
[linux@hadoop106 hadoop]$ ll
total 172
-rw-r--r--. 1 linux linux 774 Sep 10 2019 core-site.xml
-rw-r--r--. 1 linux linux 775 Sep 11 2019 hdfs-site.xml
-rw-r--r--. 1 linux linux 951 Sep 11 2019 mapred-env.cmd
-rw-r--r--. 1 linux linux 1764 Sep 11 2019 mapred-env.sh
-rw-r--r--. 1 linux linux 4113 Sep 11 2019 mapred-queues.xml.template
-rw-r--r--. 1 linux linux 758 Sep 11 2019 mapred-site.xml
-rw-r--r--. 1 linux linux 10 Sep 10 2019 workers
-rw-r--r--. 1 linux linux 2250 Sep 11 2019 yarn-env.cmd
-rw-r--r--. 1 linux linux 6056 Sep 11 2019 yarn-env.sh
-rw-r--r--. 1 linux linux 2591 Sep 11 2019 yarnservice-log4j.properties
-rw-r--r--. 1 linux linux 690 Sep 11 2019 yarn-site.xml
[linux@hadoop106 hadoop]$ vim core-site.xml
[linux@hadoop106 hadoop]$ vim hdfs-site.xml
[linux@hadoop106 hadoop]$ vim yarn-site.xml
[linux@hadoop106 hadoop]$ vim mapred-site.xml
[linux@hadoop106 hadoop]$ vim workers
[linux@hadoop106 hadoop]$ vim hadoop-env.sh # required; without it, daemons may not stop cleanly and have to be killed
[linux@hadoop106 hadoop]$ vim yarn-env.sh # required; same reason as above
[linux@hadoop106 hadoop]$ vim mapred-env.sh # purpose not confirmed; can be skipped for now
[linux@hadoop106 sbin]$ vim start-dfs.sh # specific to Hadoop 3.x
[linux@hadoop106 sbin]$ vim stop-dfs.sh
[linux@hadoop106 sbin]$ vim start-yarn.sh
[linux@hadoop106 sbin]$ vim stop-yarn.sh
[linux@hadoop106 hadoop]$ xsync /opt/module/hadoop-3.2.1/etc/ # distribute the config files
[linux@hadoop106 hadoop-3.2.1]$ xsync /opt/module/hadoop-3.2.1/sbin/
core-site.xml
<configuration>
<!-- Set the NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop106:8020</value>
</property>
<!-- Set the Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.2.1/data</value>
</property>
<!-- Set the static user for HDFS web UI logins to linux -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>linux</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop106:9870</value>
</property>
<!-- SecondaryNameNode (2nn) web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop108:9868</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Have MapReduce use the shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Set the ResourceManager address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop107</value>
</property>
<!-- Environment variable inheritance -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Set the log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop106:19888/jobhistory/logs</value>
</property>
<!-- Retain aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop106:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop106:19888</value>
</property>
</configuration>
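A stray tag in any *-site.xml makes every Hadoop command fail at startup, so it is worth checking well-formedness after editing. A sketch using python3's stdlib parser against a sample fragment (xmllint --noout FILE does the same job if libxml2 is installed):

```shell
# Verify an edited configuration file parses as XML before distributing it.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop106:8020</value>
  </property>
</configuration>
EOF
python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$cfg" \
    && echo "well-formed"
```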
workers (one hostname per line; no blank lines or trailing spaces)
hadoop106
hadoop107
hadoop108
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_161 # append the JAVA_HOME path at the end of the file
yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_161 # append the JAVA_HOME path at the end of the file
mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_161 # append the JAVA_HOME path at the end of the file
start-dfs.sh, stop-dfs.sh (add at the top; set each value to the user that actually runs the cluster, here linux, not root)
HDFS_DATANODE_USER=linux
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=linux
HDFS_SECONDARYNAMENODE_USER=linux
start-yarn.sh and stop-yarn.sh (same rule)
YARN_RESOURCEMANAGER_USER=linux
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=linux
Format and start the cluster
[linux@hadoop106 hadoop-3.2.1]$ hdfs namenode -format # format the NameNode
[linux@hadoop106 hadoop-3.2.1]$ sbin/start-dfs.sh # start HDFS (on 106)
[linux@hadoop107 hadoop-3.2.1]$ sbin/start-yarn.sh # start YARN (on 107, the ResourceManager)
[linux@hadoop106 hadoop-3.2.1]$ mapred --daemon start historyserver # start the JobHistory server
Start/stop the whole cluster:
HDFS: start-dfs.sh / stop-dfs.sh
YARN: start-yarn.sh / stop-yarn.sh
Start/stop a single daemon:
hdfs --daemon start/stop namenode/datanode/secondarynamenode
yarn --daemon start/stop resourcemanager/nodemanager
Create a cluster start/stop script
[linux@hadoop106 ~]$ cd bin/
[linux@hadoop106 bin]$ ll
total 4
-rwxrwxrwx. 1 linux linux 579 Apr 8 15:02 xsync
[linux@hadoop106 bin]$ vim myhadoop.sh # create the script
[linux@hadoop106 bin]$ chmod 777 myhadoop.sh # make it executable
[linux@hadoop106 bin]$ ll
total 8
-rwxrwxrwx. 1 linux linux 1053 Apr 8 23:06 myhadoop.sh
-rwxrwxrwx. 1 linux linux 579 Apr 8 15:02 xsync
[linux@hadoop106 bin]$ myhadoop.sh stop # stop the cluster
[linux@hadoop106 bin]$ myhadoop.sh start # start the cluster
Cluster start/stop script contents
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit
fi
case $1 in
"start")
    echo " =================== Starting the Hadoop cluster ==================="
    echo " --------------- starting hdfs ---------------"
    ssh hadoop106 "/opt/module/hadoop-3.2.1/sbin/start-dfs.sh"
    echo " --------------- starting yarn ---------------"
    ssh hadoop107 "/opt/module/hadoop-3.2.1/sbin/start-yarn.sh"
    echo " --------------- starting historyserver ---------------"
    ssh hadoop106 "/opt/module/hadoop-3.2.1/bin/mapred --daemon start historyserver"
;;
"stop")
    echo " =================== Stopping the Hadoop cluster ==================="
    echo " --------------- stopping historyserver ---------------"
    ssh hadoop106 "/opt/module/hadoop-3.2.1/bin/mapred --daemon stop historyserver"
    echo " --------------- stopping yarn ---------------"
    ssh hadoop107 "/opt/module/hadoop-3.2.1/sbin/stop-yarn.sh"
    echo " --------------- stopping hdfs ---------------"
    ssh hadoop106 "/opt/module/hadoop-3.2.1/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac
Create a script to view the processes on all nodes
[linux@hadoop106 bin]$ vim jpsall # create the script
[linux@hadoop106 bin]$ chmod 777 jpsall # make it executable
[linux@hadoop106 bin]$ ll
total 12
-rwxrwxrwx. 1 linux linux 122 Apr 9 00:05 jpsall
-rwxrwxrwx. 1 linux linux 1025 Apr 8 23:23 myhadoop.sh
-rwxrwxrwx. 1 linux linux 579 Apr 8 15:02 xsync
[linux@hadoop106 bin]$ jpsall # view the processes
jpsall script contents
#!/bin/bash
for host in hadoop106 hadoop107 hadoop108
do
    echo =============== $host ===============
    ssh $host jps
done
Distribute the custom scripts so they work on all three machines
[linux@hadoop106 bin]$ xsync /home/linux/bin/
Upload files or archives to HDFS (visible in the web UI) from the shell
[linux@hadoop106 hadoop-3.2.1]$ hadoop fs -mkdir /input # create the /input directory in HDFS
[linux@hadoop106 hadoop-3.2.1]$ hadoop fs -put wcinput/word.txt /input # upload word.txt from the local wcinput directory into /input on HDFS
From the shell, find where HDFS stores the uploaded files on the local disk
[linux@hadoop106 subdir0]$ pwd
/opt/module/hadoop-3.2.1/data/dfs/data/current/BP-2096428802-192.168.148.106-1617875461452/current/finalized/subdir0/subdir0
[linux@hadoop106 subdir0]$ ll
total 186772
-rw-rw-r--. 1 linux linux 49 Apr 8 18:15 blk_1073741826
-rw-rw-r--. 1 linux linux 11 Apr 8 18:15 blk_1073741826_1002.meta
-rw-rw-r--. 1 linux linux 134217728 Apr 8 18:25 blk_1073741827
-rw-rw-r--. 1 linux linux 1048583 Apr 8 18:25 blk_1073741827_1004.meta
-rw-rw-r--. 1 linux linux 55538531 Apr 8 18:27 blk_1073741828
-rw-rw-r--. 1 linux linux 433903 Apr 8 18:27 blk_1073741828_1005.meta
[linux@hadoop106 subdir0]$ cat blk_1073741826 # view the block contents
ss
qwer
wangxiaoqing
zhangsan
lisi
wangwu
wangba
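The sizes above line up with HDFS's default 128 MiB block size: the 49-byte blk_1073741826 is word.txt, while the two large blocks appear to be the jdk-8u161-linux-x64.tar.gz archive (189756259 bytes) split into one full block plus a remainder. Checking the split:

```shell
# A 189756259-byte file with a 128 MiB (134217728-byte) block size splits
# into one full block and a 55538531-byte remainder, matching the listing.
echo $(( 134217728 + 55538531 ))   # prints 189756259
```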
Concatenate the block files back into one archive
[linux@hadoop106 subdir0]$ cat blk_1073741836>>tmp.tar.gz
[linux@hadoop106 subdir0]$ cat blk_1073741837>>tmp.tar.gz
[linux@hadoop106 subdir0]$ tar -zxvf tmp.tar.gz # extract the reassembled archive
Download the JDK from HDFS to the current directory
[linux@hadoop106 software]$ hadoop fs -get /jdk-8u161-linux-x64.tar.gz ./
Run the jar program on the cluster. The input path must be visible in the HDFS web UI; it does not correspond to any path in the VM's local filesystem. The job completes successfully.
[linux@hadoop106 hadoop-3.2.1]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /input /wcoup
2021-04-15 14:10:10,590 INFO client.RMProxy: Connecting to ResourceManager at hadoop107/192.168.148.107:8032
2021-04-15 14:10:11,060 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/linux/.staging/job_1618466527672_0004
2021-04-15 14:10:11,157 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-15 14:10:11,430 INFO input.FileInputFormat: Total input files to process : 1
2021-04-15 14:10:11,481 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-15 14:10:11,560 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-15 14:10:11,628 INFO mapreduce.JobSubmitter: number of splits:1
2021-04-15 14:10:14,373 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-15 14:10:14,656 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1618466527672_0004
2021-04-15 14:10:14,656 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-04-15 14:10:16,675 INFO conf.Configuration: resource-types.xml not found
2021-04-15 14:10:16,675 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-04-15 14:10:17,526 INFO impl.YarnClientImpl: Submitted application application_1618466527672_0004
2021-04-15 14:10:18,113 INFO mapreduce.Job: The url to track the job: http://hadoop107:8088/proxy/application_1618466527672_0004/
2021-04-15 14:10:18,113 INFO mapreduce.Job: Running job: job_1618466527672_0004
2021-04-15 14:12:17,759 INFO mapreduce.Job: Job job_1618466527672_0004 running in uber mode : false
2021-04-15 14:12:17,762 INFO mapreduce.Job: map 0% reduce 0%
2021-04-15 14:13:46,597 INFO mapreduce.Job: map 100% reduce 0%
2021-04-15 14:14:04,818 INFO mapreduce.Job: map 100% reduce 100%
2021-04-15 14:14:17,575 INFO mapreduce.Job: Job job_1618466527672_0004 completed successfully
2021-04-15 14:14:20,887 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=70
FILE: Number of bytes written=452581
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=136
HDFS: Number of bytes written=44
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=55828
Total time spent by all reduces in occupied slots (ms)=16491
Total time spent by all map tasks (ms)=55828
Total time spent by all reduce tasks (ms)=16491
Total vcore-milliseconds taken by all map tasks=55828
Total vcore-milliseconds taken by all reduce tasks=16491
Total megabyte-milliseconds taken by all map tasks=57167872
Total megabyte-milliseconds taken by all reduce tasks=16886784
Map-Reduce Framework
Map input records=6
Map output records=5
Map output bytes=54
Map output materialized bytes=70
Input split bytes=101
Combine input records=5
Combine output records=5
Reduce input groups=5
Reduce shuffle bytes=70
Reduce input records=5
Reduce output records=5
Spilled Records=10
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=263
CPU time spent (ms)=10930
Physical memory (bytes) snapshot=432349184
Virtual memory (bytes) snapshot=5165584384
Total committed heap usage (bytes)=303562752
Peak Map Physical memory (bytes)=250654720
Peak Map Virtual memory (bytes)=2580398080
Peak Reduce Physical memory (bytes)=181694464
Peak Reduce Virtual memory (bytes)=2585186304
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=35
File Output Format Counters
Bytes Written=44
Container timeout: hadoop107:37539 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch :
[linux@hadoop106 hadoop-3.2.1]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /wcinput /wcoutput
2021-04-08 18:46:21,580 INFO client.RMProxy: Connecting to ResourceManager at hadoop107/192.168.148.107:8032
2021-04-08 18:47:05,188 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/linux/.staging/job_1617875909848_0001
2021-04-08 18:47:10,374 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-08 18:47:29,359 INFO input.FileInputFormat: Total input files to process : 1
2021-04-08 18:47:30,918 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-08 18:47:36,573 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-08 18:47:39,560 INFO mapreduce.JobSubmitter: number of splits:1
2021-04-08 18:48:14,036 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-08 18:48:23,893 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1617875909848_0001
2021-04-08 18:48:23,893 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-04-08 18:48:42,677 INFO conf.Configuration: resource-types.xml not found
2021-04-08 18:48:42,678 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-04-08 18:50:42,095 INFO impl.YarnClientImpl: Submitted application application_1617875909848_0001
2021-04-08 18:50:52,398 INFO mapreduce.Job: The url to track the job: http://hadoop107:8088/proxy/application_1617875909848_0001/
2021-04-08 18:50:52,772 INFO mapreduce.Job: Running job: job_1617875909848_0001
2021-04-08 19:00:44,812 INFO mapreduce.Job: Job job_1617875909848_0001 running in uber mode : false
2021-04-08 19:00:50,447 INFO mapreduce.Job: map 0% reduce 0%
2021-04-08 19:02:12,352 INFO mapreduce.Job: Task Id : attempt_1617875909848_0001_m_000000_0, Status : FAILED
Container launch failed for container_1617875909848_0001_01_000002 : java.net.SocketTimeoutException: Call From hadoop106/192.168.148.106 to hadoop107:37539 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.148.106:48026 remote=hadoop107/192.168.148.107:37539]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:777)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy84.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy85.startContainers(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:160)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.148.106:48026 remote=hadoop107/192.168.148.107:37539]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:567)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1850)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1183)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1079)
2021-04-08 19:03:44,526 INFO mapreduce.Job: Task Id : attempt_1617875909848_0001_m_000000_1, Status : FAILED
Container launch failed for container_1617875909848_0001_01_000003 : java.net.SocketTimeoutException: Call From hadoop106/192.168.148.106 to hadoop108:41507 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.148.106:37410 remote=hadoop108/192.168.148.108:41507]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:777)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy84.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy85.startContainers(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:160)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.148.106:37410 remote=hadoop108/192.168.148.108:41507]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:567)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1850)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1183)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1079)
2021-04-08 19:06:34,231 INFO mapreduce.Job: map 12% reduce 0%
2021-04-08 19:11:30,955 INFO mapreduce.Job: map 100% reduce 0%
2021-04-08 19:28:12,319 INFO mapreduce.Job: map 100% reduce 67%
2021-04-08 19:50:19,276 INFO mapreduce.Job: map 100% reduce 100%
2021-04-08 19:51:17,544 INFO mapreduce.Job: Job job_1617875909848_0001 completed successfully
2021-04-08 19:51:41,259 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2021-04-08 19:57:14,868 INFO mapreduce.Job: Counters: 56
File System Counters
FILE: Number of bytes read=1028866050
FILE: Number of bytes written=1539402871
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=189756380
HDFS: Number of bytes written=488209962
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Failed map tasks=2
Launched map tasks=3
Launched reduce tasks=1
Other local map tasks=2
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=470947
Total time spent by all reduces in occupied slots (ms)=2322365
Total time spent by all map tasks (ms)=470947
Total time spent by all reduce tasks (ms)=2322365
Total vcore-milliseconds taken by all map tasks=470947
Total vcore-milliseconds taken by all reduce tasks=2322365
Total megabyte-milliseconds taken by all map tasks=482249728
Total megabyte-milliseconds taken by all reduce tasks=2378101760
Map-Reduce Framework
Map input records=3337249
Map output records=9030245
Map output bytes=574459507
Map output materialized bytes=510084358
Input split bytes=121
Combine input records=14090733
Combine output records=9826424
Reduce input groups=4765936
Reduce shuffle bytes=510084358
Reduce input records=4765936
Reduce output records=4765936
Spilled Records=14592360
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=72588
CPU time spent (ms)=631840
Physical memory (bytes) snapshot=959533056
Virtual memory (bytes) snapshot=5185523712
Total committed heap usage (bytes)=803209216
Peak Map Physical memory (bytes)=525352960
Peak Map Virtual memory (bytes)=2590576640
Peak Reduce Physical memory (bytes)=434180096
Peak Reduce Virtual memory (bytes)=2606690304
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=189756259
File Output Format Counters
Bytes Written=488209962
Hadoop 3.x common ports
HDFS NameNode internal RPC port: 8020/9000/9820
NameNode web UI (user-facing) port: 9870
YARN ResourceManager web UI (job monitoring) port: 8088
JobHistory Server port: 19888
Hadoop 2.x common ports
HDFS NameNode internal RPC port: 8020/9000
NameNode web UI (user-facing) port: 50070
YARN ResourceManager web UI (job monitoring) port: 8088
JobHistory Server port: 19888
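The port differences above are easy to mix up when writing firewall rules or health checks. A minimal lookup helper, sketched below, encodes the Hadoop 3.x defaults from the list (the function name `hadoop3_port` and the service keys are hypothetical, not part of Hadoop itself):

```shell
#!/bin/sh
# Hypothetical lookup table for the default Hadoop 3.x ports listed above.
hadoop3_port() {
  case "$1" in
    namenode-rpc)  echo 8020 ;;   # internal NameNode RPC (also seen as 9000/9820)
    namenode-web)  echo 9870 ;;   # NameNode web UI (was 50070 in Hadoop 2.x)
    yarn-web)      echo 8088 ;;   # YARN ResourceManager web UI
    jobhistory)    echo 19888 ;;  # JobHistory Server web UI
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

hadoop3_port namenode-web   # prints 9870
```

Such a helper can then feed `firewall-cmd` or `curl` checks against the right port for each daemon.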
Cluster time synchronization (via ntp or chrony): this is only needed on an isolated intranet; if the nodes have internet access, manual time synchronization is unnecessary.
[root@hadoop106 linux]# systemctl status ntpd #or chronyd; check the ntp/chrony service status
[root@hadoop106 linux]# systemctl start ntpd #or chronyd; start the ntp/chrony service
[root@hadoop106 linux]# systemctl is-enabled ntpd #or chronyd; check whether ntp/chrony starts on boot
[root@hadoop106 linux]# vim /etc/ntp.conf #or chrony.conf; edit the configuration
[linux@hadoop107 ~]$ sudo date -s "2021-9-11 11:11:11" #change the time on hadoop107 (to test that sync corrects it)
[linux@hadoop107 ~]$ date #check the time
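On an isolated intranet, one node (say hadoop105) can serve time to the rest. A hedged sketch of the chrony changes follows; the subnet and server name follow this guide's addressing and may differ on your cluster. This is a configuration fragment, not a runnable script:

```shell
# On the time-server node (hadoop105), in /etc/chrony.conf:
#   allow 192.168.148.0/24      # let the cluster subnet query this node
#   local stratum 10            # keep serving time even with no upstream source
#
# On the other nodes, in /etc/chrony.conf, replace the pool/server lines with:
#   server hadoop105 iburst
#
# Then on every node:
#   systemctl restart chronyd
#   chronyc sources             # verify hadoop105 appears as a time source
```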
Common errors
1) Firewall not turned off, or YARN not started
INFO client.RMProxy: Connecting to ResourceManager at hadoop108/192.168.10.108:8032
2) Hostname misconfigured
3) IP address misconfigured
4) ssh not configured correctly
5) Cluster started inconsistently between the root user and the atguigu user
6) Careless edits to the configuration files
7) Hostname cannot be resolved
java.net.UnknownHostException: hadoop102: hadoop102
        at java.net.InetAddress.getLocalHost(InetAddress.java:1475)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
Fix:
(1) Add "192.168.10.102 hadoop102" to the /etc/hosts file
(2) Do not use generic hostnames such as hadoop or hadoop000
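To confirm the mapping fix above took effect, it helps to read the hosts file the way the resolver does. A small sketch (the helper name `hosts_ip` is hypothetical) that prints the first IP mapped to a hostname in a hosts-format file:

```shell
#!/bin/sh
# Hypothetical helper: print the first IP mapped to a hostname in a
# hosts-style file (whitespace-separated "IP name [alias...]" lines,
# '#' lines are comments). Pass /etc/hosts to check the live mapping.
hosts_ip() {
  file="$1"; host="$2"
  awk -v h="$host" '$0 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }' "$file"
}

hosts_ip /etc/hosts hadoop102   # should print 192.168.10.102 after the fix
```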
8) The DataNode and NameNode processes will only work together in one cluster generation.
Why DataNode and NameNode stop working together:
1) When the NameNode is formatted, it generates a clusterID (cluster id).
2) Each DataNode, on first startup, records the same clusterID as the NameNode.
3) If the NameNode is formatted again, it gets a new clusterID, which no longer matches the clusterID still stored on the un-wiped DataNodes.
4) Fix: before re-formatting, delete the DataNode data on every node (by default under /data/tmp; if you configured a different data directory, delete the data there).
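The clusterID mismatch described above can be verified before deleting anything, by comparing the `clusterID=` line in the NameNode's and a DataNode's VERSION files. A sketch (the `cluster_id` helper is hypothetical, and the paths shown are common defaults that depend on your configured data directory):

```shell
#!/bin/sh
# Hypothetical check: extract the clusterID from a Hadoop VERSION file,
# so the NameNode's copy can be compared against a DataNode's copy.
cluster_id() {
  grep '^clusterID=' "$1" | cut -d= -f2
}

# Typical VERSION file locations (adjust to your configured data directory):
#   NameNode: <data dir>/dfs/name/current/VERSION
#   DataNode: <data dir>/dfs/data/current/VERSION
# if [ "$(cluster_id "$nn_version")" != "$(cluster_id "$dn_version")" ]; then
#   echo "clusterID mismatch: wipe the DataNode data dirs before re-formatting"
# fi
```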
9) A command does not take effect: when pasting commands from Word, the short dash (-) and the long dash (–) were not distinguished, so the command fails.
Fix: avoid pasting commands from Word.
10) jps shows the process is gone, but restarting the cluster reports that the process is already running.
Cause: stale temporary process files for the started daemons remain under the /tmp directory in the Linux root filesystem; delete the files for the cluster processes, then restart the cluster.
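For case 10), the stale files can be found before deleting anything. A sketch (the `stale_pid_files` helper is hypothetical; Hadoop keeps pid files under /tmp by default unless HADOOP_PID_DIR is set elsewhere):

```shell
#!/bin/sh
# Hypothetical helper: list *.pid files in a directory whose recorded PID
# no longer belongs to a live process (candidates to delete before restart).
stale_pid_files() {
  dir="$1"
  for f in "$dir"/*.pid; do
    [ -e "$f" ] || continue
    pid=$(cat "$f")
    # kill -0 only checks whether the process exists; it sends no signal
    kill -0 "$pid" 2>/dev/null || echo "$f"
  done
}

stale_pid_files /tmp   # inspect the output, remove the stale files, then restart
```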
11) jps does not work
Cause: the hadoop/java environment variables have not taken effect. Fix: run source /etc/profile.
12) Port 8088 cannot be reached
[atguigu@hadoop102 桌面]$ cat /etc/hosts
Comment out the following lines:
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 hadoop102