Environment: VMware Workstation 15.5, CentOS 8, jdk-8u161-linux-x64.tar.gz, Hadoop 3.2.1
1) Prepare 3 client machines (firewall disabled, static IPs, hostnames set)
2) Configure SSH
3) Install the JDK
4) Install Hadoop
5) Configure environment variables
6) Configure the cluster
7) Start daemons individually
8) Start the whole cluster and test it
1. Install VMware
2. Configure the network environment
The subnet IP can be changed as needed
IP addresses must be set manually
The VMware network environment is now ready
Requirements: 3 VMs, 192.168.148.106/107/108
3. Build one virtual machine
4. Configure the virtual machine environment
Change the IP address
[root@hadoop105 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33 # configure networking
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static" # change to a static IP
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="6e807274-47d5-459d-b513-b6f2ce9b0df3"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.148.105 # add the IP address, gateway, and DNS below
GATEWAY=192.168.148.2
DNS1=192.168.148.2
Change the hostname
[root@hadoop105 ~]# vim /etc/hostname
hadoop105 # set the hostname to hadoop105
Add host mappings
[root@hadoop105 ~]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.148.105 hadoop105 # add these mappings
192.168.148.106 hadoop106
192.168.148.107 hadoop107
192.168.148.108 hadoop108
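The same mappings can be appended from the shell in an idempotent way. A minimal sketch, run against a scratch copy so it is safe anywhere; point HOSTS at /etc/hosts (as root) for real use:

```shell
# Append each cluster mapping only if it is not already present.
# HOSTS points at a scratch file here; use /etc/hosts on a real node.
HOSTS=$(mktemp)
printf '127.0.0.1 localhost\n' > "$HOSTS"
for entry in '192.168.148.105 hadoop105' '192.168.148.106 hadoop106' \
             '192.168.148.107 hadoop107' '192.168.148.108 hadoop108'; do
    grep -qxF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
done
cat "$HOSTS"
```

Because of the grep guard, running the loop a second time adds nothing.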
Reboot the VM: reboot
Open a remote session with Xshell 7 (log in by IP address). To log in by hostname instead, add the same mappings to C:\Windows\System32\drivers\etc\hosts on the Windows host (if the file cannot be saved in place, copy it to the desktop, edit it there, then replace the original).
Install extra packages inside the VM: yum install -y epel-release
Disable the firewall
[root@hadoop105 ~]# systemctl stop firewalld # stop the firewall
[root@hadoop105 ~]# systemctl disable firewalld.service # keep it disabled across reboots
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Add a regular user (already created earlier) # optional
[root@hadoop105 ~]# useradd linux
[root@hadoop105 ~]# passwd linux
Give the regular user linux root privileges so sudo works later
[root@hadoop105 ~]# vim /etc/sudoers
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
linux ALL=(ALL) NOPASSWD:ALL # add this line below the %wheel line
Create directories and change their owner and group
[root@hadoop105 opt]# mkdir module # create the directories
[root@hadoop105 opt]# mkdir software
[root@hadoop105 opt]# ll
total 0
drwxr-xr-x. 2 root root 6 Apr 6 22:54 module
drwxr-xr-x. 2 root root 6 Apr 6 22:54 software
[root@hadoop105 opt]# cd
[root@hadoop105 ~]# chown linux:linux /opt/module # change the owner and group
[root@hadoop105 ~]# chown linux:linux /opt/software
[root@hadoop105 ~]# cd /opt/
[root@hadoop105 opt]# ll
total 0
drwxr-xr-x. 2 linux linux 6 Apr 6 22:54 module
drwxr-xr-x. 2 linux linux 6 Apr 6 22:54 software
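The two mkdir calls and two chown calls above can be collapsed with brace expansion. A sketch against a scratch directory (the chown line needs root and an existing linux user, so it is shown commented out):

```shell
# Create both directories in one pass and hand them to the regular user.
root=$(mktemp -d)          # stands in for /opt
mkdir -p "$root"/{module,software}
# chown linux:linux "$root"/{module,software}   # run as root against the real /opt
ls "$root"
```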
Reboot the VM: reboot
Clone VMs 106, 107, and 108
Shut down VM 105, then make full clones for 106, 107, and 108
Start VMs 106, 107, and 108 one at a time and change each one's IP address and hostname
[root@hadoop105 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoop105 ~]# vim /etc/hostname
[root@hadoop105 ~]# vim /etc/hosts
Set up passwordless SSH: do this once on each of 106, 107, and 108, and then once more as the root user on 106
[linux@hadoop106 ~]$ ll -al
total 28
drwx------. 2 linux linux 25 Apr 7 22:06 .ssh
-rw-------. 1 linux linux 1654 Apr 8 15:02 .viminfo
-rw-------. 1 linux linux 220 Apr 8 14:30 .Xauthority
[linux@hadoop106 ~]$ cd .ssh
[linux@hadoop106 .ssh]$ ll
total 4
-rw-r--r--. 1 linux linux 561 Apr 8 15:06 known_hosts
[linux@hadoop106 .ssh]$ ssh-keygen -t rsa # generate a key pair; press Enter three times
Generating public/private rsa key pair.
Enter file in which to save the key (/home/linux/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/linux/.ssh/id_rsa.
Your public key has been saved in /home/linux/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:82i/VEGhT0YdrcNSwiZ7jcABzcY+gLLeJ23Cl5bChm8 linux@hadoop106
The key's randomart image is:
+---[RSA 2048]----+
| ..*.o+o.o |
| . . . O++ o .|
| o +.=+* . |
| . +++.= |
| . = .So oo. . |
| o O O+ . |
| o Oo o |
| E. o |
| . o. |
+----[SHA256]-----+
[linux@hadoop106 .ssh]$ ll
total 12
-rw-------. 1 linux linux 1823 Apr 8 15:28 id_rsa
-rw-r--r--. 1 linux linux 397 Apr 8 15:28 id_rsa.pub
-rw-r--r--. 1 linux linux 561 Apr 8 15:06 known_hosts
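The three Enter presses above accept the default file location and an empty passphrase. The same thing can be done non-interactively; sketched into a scratch directory so it does not touch ~/.ssh:

```shell
# Generate an RSA key pair with no passphrase, without any prompts.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -b 2048 -N '' -f "$KEYDIR/id_rsa" -q
ls -l "$KEYDIR"
```

This produces the same id_rsa / id_rsa.pub pair the transcript shows.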
[linux@hadoop106 .ssh]$ ssh-copy-id hadoop107 # send the public key to 107
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/linux/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
linux@hadoop107's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop107'"
and check to make sure that only the key(s) you wanted were added.
[linux@hadoop106 .ssh]$ ssh hadoop107
Activate the web console with: systemctl enable --now cockpit.socket
Last failed login: Thu Apr 8 15:16:16 CST 2021 from 192.168.148.106 on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Thu Apr 8 14:30:13 2021 from 192.168.148.1
[linux@hadoop107 ~]$ exit
logout
Connection to hadoop107 closed.
[linux@hadoop106 .ssh]$ ssh-copy-id hadoop108 # send the public key to 108
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/linux/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
linux@hadoop108's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop108'"
and check to make sure that only the key(s) you wanted were added.
[linux@hadoop106 .ssh]$ ssh-copy-id hadoop106 # send the public key to 106 itself
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/linux/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
linux@hadoop106's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop106'"
and check to make sure that only the key(s) you wanted were added.
[linux@hadoop106 .ssh]$ ll
total 16
-rw-------. 1 linux linux 397 Apr 8 15:31 authorized_keys
-rw-------. 1 linux linux 1823 Apr 8 15:28 id_rsa
-rw-r--r--. 1 linux linux 397 Apr 8 15:28 id_rsa.pub
-rw-r--r--. 1 linux linux 561 Apr 8 15:06 known_hosts
Upload method 1:
Open remote sessions to the 3 VMs with Xshell 7 and upload the JDK and Hadoop archives to 106 with Xftp 7.
Upload method 2:
Open remote sessions to the 3 VMs with Xshell 7, install lrzsz on VM 106 with yum -y install lrzsz, then run the rz command in the target directory to upload the JDK and Hadoop archives.
Verify the uploaded archives with ll in that directory.
Extract the archives into another directory.
[linux@hadoop106 software]$ ll
total 536092
-rw-rw-r--. 1 linux linux 359196911 Apr 7 19:51 hadoop-3.2.1.tar.gz
-rw-rw-r--. 1 linux linux 189756259 Apr 7 19:49 jdk-8u161-linux-x64.tar.gz
[linux@hadoop106 software]$ tar -zxvf jdk-8u161-linux-x64.tar.gz -C /opt/module/ # extract the JDK archive
[linux@hadoop106 software]$ tar -zxvf hadoop-3.2.1.tar.gz -C /opt/module/ # extract the Hadoop archive
Three ways to copy the software between nodes:
Push: send the JDK and Hadoop from 106 to 107/108
[linux@hadoop106 module]$ scp -r jdk1.8.0_161/ linux@hadoop107:/opt/module/
Pull: fetch the JDK and Hadoop from 106 onto 107/108
[linux@hadoop107 module]$ scp -r linux@hadoop106:/opt/module/hadoop-3.2.1 ./
[linux@hadoop107 module]$ ll
total 0
drwxr-xr-x. 11 linux linux 180 Apr 7 22:45 hadoop-3.2.1
drwxr-xr-x. 8 linux linux 255 Apr 7 22:09 jdk1.8.0_161
Relay: from 107, send 106's JDK and Hadoop to 108
[linux@hadoop107 ~]$ scp -r linux@hadoop106:/opt/module/* linux@hadoop108:/opt/module/
Configure environment variables
[linux@hadoop106 jdk1.8.0_161]$ cd /etc/profile.d/
[linux@hadoop106 profile.d]$ sudo vim my_env.sh # create my_env.sh and add the variables
[linux@hadoop106 profile.d]$ source /etc/profile # reload the environment variables
[linux@hadoop106 profile.d]$ java -version # verify the Java environment
[linux@hadoop106 module]$ cd hadoop-3.2.1/
[linux@hadoop106 hadoop-3.2.1]$ sudo vim /etc/profile.d/my_env.sh
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
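The export lines compose PATH one directory at a time, so their order in my_env.sh is the lookup order. A self-contained sketch of the same composition, with scratch directories standing in for the real install paths:

```shell
# Each assignment appends one bin/sbin directory to the existing PATH.
JAVA_HOME=$(mktemp -d)
HADOOP_HOME=$(mktemp -d)
mkdir -p "$JAVA_HOME/bin" "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"
PATH=$PATH:$JAVA_HOME/bin
PATH=$PATH:$HADOOP_HOME/bin
PATH=$PATH:$HADOOP_HOME/sbin
# Confirm the sbin directory is now searchable.
case ":$PATH:" in
    *":$HADOOP_HOME/sbin:"*) echo "sbin is on PATH" ;;
esac
```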
Create a cluster distribution script
[linux@hadoop106 ~]$ mkdir bin
[linux@hadoop106 ~]$ ll -al
total 28
drwxrwxr-x. 2 linux linux 6 Apr 8 14:48 bin
drwx------. 3 linux linux 19 Apr 6 22:28 .config
-rw-------. 1 linux linux 16 Apr 6 22:28 .esd_auth
drwx------. 2 linux linux 25 Apr 7 22:06 .ssh # some entries omitted
[linux@hadoop106 ~]$ cd bin/
[linux@hadoop106 bin]$ vim xsync # distribution script contents below
#!/bin/bash
# 1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi
# 2. Loop over every machine in the cluster
for host in hadoop106 hadoop107 hadoop108
do
    echo ==================== $host ====================
    # 3. Loop over every file/directory given and send each one
    for file in $@
    do
        # 4. Check that the file exists
        if [ -e $file ]
        then
            # 5. Get the parent directory (resolving symlinks)
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo "$file does not exist!"
        fi
    done
done
[linux@hadoop106 bin]$ ll
total 4
-rw-rw-r--. 1 linux linux 579 Apr 8 15:02 xsync
[linux@hadoop106 bin]$ chmod 777 xsync # make the script executable
[linux@hadoop106 bin]$ ll
total 4
-rwxrwxrwx. 1 linux linux 579 Apr 8 15:02 xsync
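The core of xsync is the parent-directory resolution in steps 5 and 6: cd -P follows symlinks, so rsync always receives a canonical path. The same two lines, exercised on a scratch path:

```shell
# Resolve a file's canonical parent directory and bare name, as xsync does.
base=$(mktemp -d)
mkdir -p "$base/module"
touch "$base/module/hadoop-3.2.1"
file="$base/module/hadoop-3.2.1"
pdir=$(cd -P "$(dirname "$file")"; pwd)
fname=$(basename "$file")
echo "pdir=$pdir fname=$fname"
```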
Distribute the environment-variable file and source it on 107 and 108
[linux@hadoop106 ~]$ sudo ./bin/xsync /etc/profile.d/my_env.sh
[linux@hadoop107 ~]$ source /etc/profile
[linux@hadoop107 ~]$ java -version # verify Java on 107
Local (standalone) mode test
[linux@hadoop106 hadoop-3.2.1]$ mkdir wcinput # create the wcinput directory
[linux@hadoop106 hadoop-3.2.1]$ ll
total 180
drwxr-xr-x. 2 linux linux 203 Sep 11 2019 bin
drwxr-xr-x. 3 linux linux 20 Sep 10 2019 etc
drwxr-xr-x. 2 linux linux 106 Sep 11 2019 include
drwxr-xr-x. 3 linux linux 20 Sep 11 2019 lib
drwxr-xr-x. 4 linux linux 288 Sep 11 2019 libexec
-rw-rw-r--. 1 linux linux 150569 Sep 10 2019 LICENSE.txt
-rw-rw-r--. 1 linux linux 22125 Sep 10 2019 NOTICE.txt
-rw-rw-r--. 1 linux linux 1361 Sep 10 2019 README.txt
drwxr-xr-x. 3 linux linux 4096 Sep 10 2019 sbin
drwxr-xr-x. 4 linux linux 31 Sep 11 2019 share
drwxrwxr-x. 2 linux linux 6 Apr 7 20:41 wcinput
[linux@hadoop106 hadoop-3.2.1]$ cd wcinput/
[linux@hadoop106 wcinput]$ vim word.txt # create and edit word.txt
[linux@hadoop106 wcinput]$ cd ..
[linux@hadoop106 hadoop-3.2.1]$ pwd
/opt/module/hadoop-3.2.1
[linux@hadoop106 hadoop-3.2.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount wcinput/ ./wcoutput # run the wordcount example from the jar: read from wcinput/, write to ./wcoutput
Inspect the results
[linux@hadoop106 hadoop-3.2.1]$ ll
total 180
drwxr-xr-x. 2 linux linux 203 Sep 11 2019 bin
drwxr-xr-x. 3 linux linux 20 Sep 10 2019 etc
drwxr-xr-x. 2 linux linux 106 Sep 11 2019 include
drwxr-xr-x. 3 linux linux 20 Sep 11 2019 lib
drwxr-xr-x. 4 linux linux 288 Sep 11 2019 libexec
-rw-rw-r--. 1 linux linux 150569 Sep 10 2019 LICENSE.txt
-rw-rw-r--. 1 linux linux 22125 Sep 10 2019 NOTICE.txt
-rw-rw-r--. 1 linux linux 1361 Sep 10 2019 README.txt
drwxr-xr-x. 3 linux linux 4096 Sep 10 2019 sbin
drwxr-xr-x. 4 linux linux 31 Sep 11 2019 share
drwxrwxr-x. 2 linux linux 22 Apr 7 20:43 wcinput
drwxr-xr-x. 2 linux linux 88 Apr 7 20:49 wcoutput
[linux@hadoop106 hadoop-3.2.1]$ cd wcoutput/
[linux@hadoop106 wcoutput]$ ll
total 4
-rw-r--r--. 1 linux linux 63 Apr 7 20:49 part-r-00000 # the data
-rw-r--r--. 1 linux linux 0 Apr 7 20:49 _SUCCESS # success marker
[linux@hadoop106 wcoutput]$ cat part-r-00000 # view the data
lisi 1
qwer 1
ss 1
wangba 1
wangwu 1
wangxiaoqing 1
zhangsan 1
Running the job again fails with org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/opt/module/hadoop-3.2.1/wcoutput already exists. MapReduce never overwrites an existing output directory, so wcoutput must be deleted before rerunning.
[linux@hadoop106 hadoop-3.2.1]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount wcinput/ ./wcoutput
2021-04-07 20:54:54,758 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-04-07 20:54:55,060 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-04-07 20:54:55,061 INFO impl.MetricsSystemImpl: JobTracker metrics system started
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/opt/module/hadoop-3.2.1/wcoutput already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:164)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:277)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
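The fix is simply to delete the previous output directory before rerunning. Sketched on a scratch path; in local mode the directory lives on the ordinary filesystem, so rm -rf works, and once the job runs on HDFS use hadoop fs -rm -r /wcoutput instead:

```shell
# Remove a stale wordcount output directory so the job can be rerun.
work=$(mktemp -d)            # stands in for /opt/module/hadoop-3.2.1
mkdir -p "$work/wcoutput"
rm -rf "$work/wcoutput"
[ ! -d "$work/wcoutput" ] && echo "output directory removed"
```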
Fully distributed setup
Cluster configuration files (the last four below are specific to Hadoop 3.x):
core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, workers, hadoop-env.sh, yarn-env.sh, mapred-env.sh, start-dfs.sh, stop-dfs.sh, start-yarn.sh, and stop-yarn.sh
[linux@hadoop106 hadoop-3.2.1]$ cd etc/hadoop/
[linux@hadoop106 hadoop]$ ll
total 172
-rw-r--r--. 1 linux linux 774 Sep 10 2019 core-site.xml
-rw-r--r--. 1 linux linux 775 Sep 11 2019 hdfs-site.xml
-rw-r--r--. 1 linux linux 951 Sep 11 2019 mapred-env.cmd
-rw-r--r--. 1 linux linux 1764 Sep 11 2019 mapred-env.sh
-rw-r--r--. 1 linux linux 4113 Sep 11 2019 mapred-queues.xml.template
-rw-r--r--. 1 linux linux 758 Sep 11 2019 mapred-site.xml
-rw-r--r--. 1 linux linux 10 Sep 10 2019 workers
-rw-r--r--. 1 linux linux 2250 Sep 11 2019 yarn-env.cmd
-rw-r--r--. 1 linux linux 6056 Sep 11 2019 yarn-env.sh
-rw-r--r--. 1 linux linux 2591 Sep 11 2019 yarnservice-log4j.properties
-rw-r--r--. 1 linux linux 690 Sep 11 2019 yarn-site.xml
[linux@hadoop106 hadoop]$ vim core-site.xml
[linux@hadoop106 hadoop]$ vim hdfs-site.xml
[linux@hadoop106 hadoop]$ vim yarn-site.xml
[linux@hadoop106 hadoop]$ vim mapred-site.xml
[linux@hadoop106 hadoop]$ vim workers
[linux@hadoop106 hadoop]$ vim hadoop-env.sh # required; without it, daemons may not stop cleanly and have to be killed
[linux@hadoop106 hadoop]$ vim yarn-env.sh # required; same reason as above
[linux@hadoop106 hadoop]$ vim mapred-env.sh # purpose not confirmed; can be skipped for now
[linux@hadoop106 sbin]$ vim start-dfs.sh # specific to Hadoop 3.x
[linux@hadoop106 sbin]$ vim stop-dfs.sh
[linux@hadoop106 sbin]$ vim start-yarn.sh
[linux@hadoop106 sbin]$ vim stop-yarn.sh
[linux@hadoop106 hadoop]$ xsync /opt/module/hadoop-3.2.1/etc/ # distribute the config files
[linux@hadoop106 hadoop-3.2.1]$ xsync /opt/module/hadoop-3.2.1/sbin/
core-site.xml
<configuration>
<!-- Set the NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop106:8020</value>
</property>
<!-- Set the Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.2.1/data</value>
</property>
<!-- Set the static user for HDFS web UI logins to linux -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>linux</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop106:9870</value>
</property>
<!-- SecondaryNameNode (2nn) web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop108:9868</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Have MapReduce use the shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Set the ResourceManager address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop107</value>
</property>
<!-- Environment variable inheritance -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Set the log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop106:19888/jobhistory/logs</value>
</property>
<!-- Retain aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop106:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop106:19888</value>
</property>
</configuration>
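A stray tag in any *-site.xml makes every Hadoop command fail at startup, so it is worth checking well-formedness after editing. A sketch using python3's stdlib parser against a sample fragment (xmllint --noout FILE does the same job if libxml2 is installed):

```shell
# Verify an edited configuration file parses as XML before distributing it.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop106:8020</value>
  </property>
</configuration>
EOF
python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$cfg" \
    && echo "well-formed"
```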
workers (one hostname per line; no blank lines or trailing spaces)
hadoop106
hadoop107
hadoop108
hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_161 # append the JAVA_HOME path at the end of the file
yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_161 # append the JAVA_HOME path at the end of the file
mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_161 # append the JAVA_HOME path at the end of the file
start-dfs.sh, stop-dfs.sh (add at the top; set each value to the user that actually runs the cluster, here linux, not root)
HDFS_DATANODE_USER=linux
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=linux
HDFS_SECONDARYNAMENODE_USER=linux
start-yarn.sh and stop-yarn.sh (same rule)
YARN_RESOURCEMANAGER_USER=linux
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=linux
Format and start the cluster
[linux@hadoop106 hadoop-3.2.1]$ hdfs namenode -format # format the NameNode
[linux@hadoop106 hadoop-3.2.1]$ sbin/start-dfs.sh # start HDFS (on 106)
[linux@hadoop107 hadoop-3.2.1]$ sbin/start-yarn.sh # start YARN (on 107, the ResourceManager)
[linux@hadoop106 hadoop-3.2.1]$ mapred --daemon start historyserver # start the JobHistory server
Start/stop the whole cluster:
HDFS: start-dfs.sh / stop-dfs.sh
YARN: start-yarn.sh / stop-yarn.sh
Start/stop a single daemon:
hdfs --daemon start/stop namenode/datanode/secondarynamenode
yarn --daemon start/stop resourcemanager/nodemanager
Create a cluster start/stop script
[linux@hadoop106 ~]$ cd bin/
[linux@hadoop106 bin]$ ll
total 4
-rwxrwxrwx. 1 linux linux 579 Apr 8 15:02 xsync
[linux@hadoop106 bin]$ vim myhadoop.sh # create the script
[linux@hadoop106 bin]$ chmod 777 myhadoop.sh # make it executable
[linux@hadoop106 bin]$ ll
total 8
-rwxrwxrwx. 1 linux linux 1053 Apr 8 23:06 myhadoop.sh
-rwxrwxrwx. 1 linux linux 579 Apr 8 15:02 xsync
[linux@hadoop106 bin]$ myhadoop.sh stop # stop the cluster
[linux@hadoop106 bin]$ myhadoop.sh start # start the cluster
Cluster start/stop script contents
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit
fi
case $1 in
"start")
    echo " =================== Starting the Hadoop cluster ==================="
    echo " --------------- starting hdfs ---------------"
    ssh hadoop106 "/opt/module/hadoop-3.2.1/sbin/start-dfs.sh"
    echo " --------------- starting yarn ---------------"
    ssh hadoop107 "/opt/module/hadoop-3.2.1/sbin/start-yarn.sh"
    echo " --------------- starting historyserver ---------------"
    ssh hadoop106 "/opt/module/hadoop-3.2.1/bin/mapred --daemon start historyserver"
;;
"stop")
    echo " =================== Stopping the Hadoop cluster ==================="
    echo " --------------- stopping historyserver ---------------"
    ssh hadoop106 "/opt/module/hadoop-3.2.1/bin/mapred --daemon stop historyserver"
    echo " --------------- stopping yarn ---------------"
    ssh hadoop107 "/opt/module/hadoop-3.2.1/sbin/stop-yarn.sh"
    echo " --------------- stopping hdfs ---------------"
    ssh hadoop106 "/opt/module/hadoop-3.2.1/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac
Create a script to view the processes on all nodes
[linux@hadoop106 bin]$ vim jpsall # create the script
[linux@hadoop106 bin]$ chmod 777 jpsall # make it executable
[linux@hadoop106 bin]$ ll
total 12
-rwxrwxrwx. 1 linux linux 122 Apr 9 00:05 jpsall
-rwxrwxrwx. 1 linux linux 1025 Apr 8 23:23 myhadoop.sh
-rwxrwxrwx. 1 linux linux 579 Apr 8 15:02 xsync
[linux@hadoop106 bin]$ jpsall # view the processes
jpsall script contents
#!/bin/bash
for host in hadoop106 hadoop107 hadoop108
do
    echo =============== $host ===============
    ssh $host jps
done
Distribute the custom scripts so they work on all three machines
[linux@hadoop106 bin]$ xsync /home/linux/bin/
Upload files or archives to HDFS (visible in the web UI) from the shell
[linux@hadoop106 hadoop-3.2.1]$ hadoop fs -mkdir /input # create the /input directory in HDFS
[linux@hadoop106 hadoop-3.2.1]$ hadoop fs -put wcinput/word.txt /input # upload word.txt from the local wcinput directory into /input on HDFS
From the shell, find where HDFS stores the uploaded files on the local disk
[linux@hadoop106 subdir0]$ pwd
/opt/module/hadoop-3.2.1/data/dfs/data/current/BP-2096428802-192.168.148.106-1617875461452/current/finalized/subdir0/subdir0
[linux@hadoop106 subdir0]$ ll
total 186772
-rw-rw-r--. 1 linux linux 49 Apr 8 18:15 blk_1073741826
-rw-rw-r--. 1 linux linux 11 Apr 8 18:15 blk_1073741826_1002.meta
-rw-rw-r--. 1 linux linux 134217728 Apr 8 18:25 blk_1073741827
-rw-rw-r--. 1 linux linux 1048583 Apr 8 18:25 blk_1073741827_1004.meta
-rw-rw-r--. 1 linux linux 55538531 Apr 8 18:27 blk_1073741828
-rw-rw-r--. 1 linux linux 433903 Apr 8 18:27 blk_1073741828_1005.meta
[linux@hadoop106 subdir0]$ cat blk_1073741826 # view the block contents
ss
qwer
wangxiaoqing
zhangsan
lisi
wangwu
wangba
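The sizes above line up with HDFS's default 128 MiB block size: the 49-byte blk_1073741826 is word.txt, while the two large blocks appear to be the jdk-8u161-linux-x64.tar.gz archive (189756259 bytes) split into one full block plus a remainder. Checking the split:

```shell
# A 189756259-byte file with a 128 MiB (134217728-byte) block size splits
# into one full block and a 55538531-byte remainder, matching the listing.
echo $(( 134217728 + 55538531 ))   # prints 189756259
```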
Concatenate the block files back into one archive
[linux@hadoop106 subdir0]$ cat blk_1073741836>>tmp.tar.gz
[linux@hadoop106 subdir0]$ cat blk_1073741837>>tmp.tar.gz
[linux@hadoop106 subdir0]$ tar -zxvf tmp.tar.gz # extract the reassembled archive
Download the JDK from HDFS to the current directory
[linux@hadoop106 software]$ hadoop fs -get /jdk-8u161-linux-x64.tar.gz ./
Run the jar program on the cluster. The input path must be visible in the HDFS web UI; it does not correspond to any path in the VM's local filesystem. The job completes successfully.
[linux@hadoop106 hadoop-3.2.1]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /input /wcoup
2021-04-15 14:10:10,590 INFO client.RMProxy: Connecting to ResourceManager at hadoop107/192.168.148.107:8032
2021-04-15 14:10:11,060 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/linux/.staging/job_1618466527672_0004
2021-04-15 14:10:11,157 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-15 14:10:11,430 INFO input.FileInputFormat: Total input files to process : 1
2021-04-15 14:10:11,481 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-15 14:10:11,560 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-15 14:10:11,628 INFO mapreduce.JobSubmitter: number of splits:1
2021-04-15 14:10:14,373 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-15 14:10:14,656 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1618466527672_0004
2021-04-15 14:10:14,656 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-04-15 14:10:16,675 INFO conf.Configuration: resource-types.xml not found
2021-04-15 14:10:16,675 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-04-15 14:10:17,526 INFO impl.YarnClientImpl: Submitted application application_1618466527672_0004
2021-04-15 14:10:18,113 INFO mapreduce.Job: The url to track the job: http://hadoop107:8088/proxy/application_1618466527672_0004/
2021-04-15 14:10:18,113 INFO mapreduce.Job: Running job: job_1618466527672_0004
2021-04-15 14:12:17,759 INFO mapreduce.Job: Job job_1618466527672_0004 running in uber mode : false
2021-04-15 14:12:17,762 INFO mapreduce.Job: map 0% reduce 0%
2021-04-15 14:13:46,597 INFO mapreduce.Job: map 100% reduce 0%
2021-04-15 14:14:04,818 INFO mapreduce.Job: map 100% reduce 100%
2021-04-15 14:14:17,575 INFO mapreduce.Job: Job job_1618466527672_0004 completed successfully
2021-04-15 14:14:20,887 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=70
FILE: Number of bytes written=452581
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=136
HDFS: Number of bytes written=44
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=55828
Total time spent by all reduces in occupied slots (ms)=16491
Total time spent by all map tasks (ms)=55828
Total time spent by all reduce tasks (ms)=16491
Total vcore-milliseconds taken by all map tasks=55828
Total vcore-milliseconds taken by all reduce tasks=16491
Total megabyte-milliseconds taken by all map tasks=57167872
Total megabyte-milliseconds taken by all reduce tasks=16886784
Map-Reduce Framework
Map input records=6
Map output records=5
Map output bytes=54
Map output materialized bytes=70
Input split bytes=101
Combine input records=5
Combine output records=5
Reduce input groups=5
Reduce shuffle bytes=70
Reduce input records=5
Reduce output records=5
Spilled Records=10
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=263
CPU time spent (ms)=10930
Physical memory (bytes) snapshot=432349184
Virtual memory (bytes) snapshot=5165584384
Total committed heap usage (bytes)=303562752
Peak Map Physical memory (bytes)=250654720
Peak Map Virtual memory (bytes)=2580398080
Peak Reduce Physical memory (bytes)=181694464
Peak Reduce Virtual memory (bytes)=2585186304
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=35
File Output Format Counters
Bytes Written=44
Container timeout: hadoop107:37539 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch :
[linux@hadoop106 hadoop-3.2.1]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /wcinput /wcoutput
2021-04-08 18:46:21,580 INFO client.RMProxy: Connecting to ResourceManager at hadoop107/192.168.148.107:8032
2021-04-08 18:47:05,188 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/linux/.staging/job_1617875909848_0001
2021-04-08 18:47:10,374 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-08 18:47:29,359 INFO input.FileInputFormat: Total input files to process : 1
2021-04-08 18:47:30,918 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-08 18:47:36,573 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-08 18:47:39,560 INFO mapreduce.JobSubmitter: number of splits:1
2021-04-08 18:48:14,036 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2021-04-08 18:48:23,893 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1617875909848_0001
2021-04-08 18:48:23,893 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-04-08 18:48:42,677 INFO conf.Configuration: resource-types.xml not found
2021-04-08 18:48:42,678 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-04-08 18:50:42,095 INFO impl.YarnClientImpl: Submitted application application_1617875909848_0001
2021-04-08 18:50:52,398 INFO mapreduce.Job: The url to track the job: http://hadoop107:8088/proxy/application_1617875909848_0001/
2021-04-08 18:50:52,772 INFO mapreduce.Job: Running job: job_1617875909848_0001
2021-04-08 19:00:44,812 INFO mapreduce.Job: Job job_1617875909848_0001 running in uber mode : false
2021-04-08 19:00:50,447 INFO mapreduce.Job: map 0% reduce 0%
2021-04-08 19:02:12,352 INFO mapreduce.Job: Task Id : attempt_1617875909848_0001_m_000000_0, Status : FAILED
Container launch failed for container_1617875909848_0001_01_000002 : java.net.SocketTimeoutException: Call From hadoop106/192.168.148.106 to hadoop107:37539 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.148.106:48026 remote=hadoop107/192.168.148.107:37539]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:777)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy84.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy85.startContainers(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:160)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.148.106:48026 remote=hadoop107/192.168.148.107:37539]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:567)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1850)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1183)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1079)
2021-04-08 19:03:44,526 INFO mapreduce.Job: Task Id : attempt_1617875909848_0001_m_000000_1, Status : FAILED
Container launch failed for container_1617875909848_0001_01_000003 : java.net.SocketTimeoutException: Call From hadoop106/192.168.148.106 to hadoop108:41507 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.148.106:37410 remote=hadoop108/192.168.148.108:41507]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:777)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy84.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy85.startContainers(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:160)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.148.106:37410 remote=hadoop108/192.168.148.108:41507]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:567)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1850)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1183)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1079)
2021-04-08 19:06:34,231 INFO mapreduce.Job: map 12% reduce 0%
2021-04-08 19:11:30,955 INFO mapreduce.Job: map 100% reduce 0%
2021-04-08 19:28:12,319 INFO mapreduce.Job: map 100% reduce 67%
2021-04-08 19:50:19,276 INFO mapreduce.Job: map 100% reduce 100%
2021-04-08 19:51:17,544 INFO mapreduce.Job: Job job_1617875909848_0001 completed successfully
2021-04-08 19:51:41,259 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2021-04-08 19:57:14,868 INFO mapreduce.Job: Counters: 56
File System Counters
FILE: Number of bytes read=1028866050
FILE: Number of bytes written=1539402871
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=189756380
HDFS: Number of bytes written=488209962
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Failed map tasks=2
Launched map tasks=3
Launched reduce tasks=1
Other local map tasks=2
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=470947
Total time spent by all reduces in occupied slots (ms)=2322365
Total time spent by all map tasks (ms)=470947
Total time spent by all reduce tasks (ms)=2322365
Total vcore-milliseconds taken by all map tasks=470947
Total vcore-milliseconds taken by all reduce tasks=2322365
Total megabyte-milliseconds taken by all map tasks=482249728
Total megabyte-milliseconds taken by all reduce tasks=2378101760
Map-Reduce Framework
Map input records=3337249
Map output records=9030245
Map output bytes=574459507
Map output materialized bytes=510084358
Input split bytes=121
Combine input records=14090733
Combine output records=9826424
Reduce input groups=4765936
Reduce shuffle bytes=510084358
Reduce input records=4765936
Reduce output records=4765936
Spilled Records=14592360
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=72588
CPU time spent (ms)=631840
Physical memory (bytes) snapshot=959533056
Virtual memory (bytes) snapshot=5185523712
Total committed heap usage (bytes)=803209216
Peak Map Physical memory (bytes)=525352960
Peak Map Virtual memory (bytes)=2590576640
Peak Reduce Physical memory (bytes)=434180096
Peak Reduce Virtual memory (bytes)=2606690304
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=189756259
File Output Format Counters
Bytes Written=488209962
Hadoop 3.x common ports
HDFS NameNode internal RPC port: 8020/9000/9820
NameNode web UI (user-facing) port: 9870
YARN ResourceManager web UI (job monitoring) port: 8088
JobHistory Server port: 19888
Hadoop 2.x common ports
HDFS NameNode internal RPC port: 8020/9000
NameNode web UI (user-facing) port: 50070
YARN ResourceManager web UI (job monitoring) port: 8088
JobHistory Server port: 19888
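The port differences above are easy to mix up when writing firewall rules or health checks. A minimal lookup helper, sketched below, encodes the Hadoop 3.x defaults from the list (the function name `hadoop3_port` and the service keys are hypothetical, not part of Hadoop itself):

```shell
#!/bin/sh
# Hypothetical lookup table for the default Hadoop 3.x ports listed above.
hadoop3_port() {
  case "$1" in
    namenode-rpc)  echo 8020 ;;   # internal NameNode RPC (also seen as 9000/9820)
    namenode-web)  echo 9870 ;;   # NameNode web UI (was 50070 in Hadoop 2.x)
    yarn-web)      echo 8088 ;;   # YARN ResourceManager web UI
    jobhistory)    echo 19888 ;;  # JobHistory Server web UI
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

hadoop3_port namenode-web   # prints 9870
```

Such a helper can then feed `firewall-cmd` or `curl` checks against the right port for each daemon.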
Cluster time synchronization (via ntp or chrony): this is only needed on an isolated intranet; if the nodes have internet access, manual time synchronization is unnecessary.
[root@hadoop106 linux]# systemctl status ntpd #or chronyd; check the ntp/chrony service status
[root@hadoop106 linux]# systemctl start ntpd #or chronyd; start the ntp/chrony service
[root@hadoop106 linux]# systemctl is-enabled ntpd #or chronyd; check whether ntp/chrony starts on boot
[root@hadoop106 linux]# vim /etc/ntp.conf #or chrony.conf; edit the configuration
[linux@hadoop107 ~]$ sudo date -s "2021-9-11 11:11:11" #change the time on hadoop107 (to test that sync corrects it)
[linux@hadoop107 ~]$ date #check the time
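On an isolated intranet, one node (say hadoop105) can serve time to the rest. A hedged sketch of the chrony changes follows; the subnet and server name follow this guide's addressing and may differ on your cluster. This is a configuration fragment, not a runnable script:

```shell
# On the time-server node (hadoop105), in /etc/chrony.conf:
#   allow 192.168.148.0/24      # let the cluster subnet query this node
#   local stratum 10            # keep serving time even with no upstream source
#
# On the other nodes, in /etc/chrony.conf, replace the pool/server lines with:
#   server hadoop105 iburst
#
# Then on every node:
#   systemctl restart chronyd
#   chronyc sources             # verify hadoop105 appears as a time source
```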
Common errors
1) Firewall not turned off, or YARN not started
INFO client.RMProxy: Connecting to ResourceManager at hadoop108/192.168.10.108:8032
2) Hostname misconfigured
3) IP address misconfigured
4) ssh not configured correctly
5) Cluster started inconsistently between the root user and the atguigu user
6) Careless edits to the configuration files
7) Hostname cannot be resolved
java.net.UnknownHostException: hadoop102: hadoop102
        at java.net.InetAddress.getLocalHost(InetAddress.java:1475)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
Fix:
(1) Add "192.168.10.102 hadoop102" to the /etc/hosts file
(2) Do not use generic hostnames such as hadoop or hadoop000
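To confirm the mapping fix above took effect, it helps to read the hosts file the way the resolver does. A small sketch (the helper name `hosts_ip` is hypothetical) that prints the first IP mapped to a hostname in a hosts-format file:

```shell
#!/bin/sh
# Hypothetical helper: print the first IP mapped to a hostname in a
# hosts-style file (whitespace-separated "IP name [alias...]" lines,
# '#' lines are comments). Pass /etc/hosts to check the live mapping.
hosts_ip() {
  file="$1"; host="$2"
  awk -v h="$host" '$0 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }' "$file"
}

hosts_ip /etc/hosts hadoop102   # should print 192.168.10.102 after the fix
```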
8) The DataNode and NameNode processes will only work together in one cluster generation.
Why DataNode and NameNode stop working together:
1) When the NameNode is formatted, it generates a clusterID (cluster id).
2) Each DataNode, on first startup, records the same clusterID as the NameNode.
3) If the NameNode is formatted again, it gets a new clusterID, which no longer matches the clusterID still stored on the un-wiped DataNodes.
4) Fix: before re-formatting, delete the DataNode data on every node (by default under /data/tmp; if you configured a different data directory, delete the data there).
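The clusterID mismatch described above can be verified before deleting anything, by comparing the `clusterID=` line in the NameNode's and a DataNode's VERSION files. A sketch (the `cluster_id` helper is hypothetical, and the paths shown are common defaults that depend on your configured data directory):

```shell
#!/bin/sh
# Hypothetical check: extract the clusterID from a Hadoop VERSION file,
# so the NameNode's copy can be compared against a DataNode's copy.
cluster_id() {
  grep '^clusterID=' "$1" | cut -d= -f2
}

# Typical VERSION file locations (adjust to your configured data directory):
#   NameNode: <data dir>/dfs/name/current/VERSION
#   DataNode: <data dir>/dfs/data/current/VERSION
# if [ "$(cluster_id "$nn_version")" != "$(cluster_id "$dn_version")" ]; then
#   echo "clusterID mismatch: wipe the DataNode data dirs before re-formatting"
# fi
```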
9) A command does not take effect: when pasting commands from Word, the short dash (-) and the long dash (–) were not distinguished, so the command fails.
Fix: avoid pasting commands from Word.
10) jps shows the process is gone, but restarting the cluster reports that the process is already running.
Cause: stale temporary process files for the started daemons remain under the /tmp directory in the Linux root filesystem; delete the files for the cluster processes, then restart the cluster.
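For case 10), the stale files can be found before deleting anything. A sketch (the `stale_pid_files` helper is hypothetical; Hadoop keeps pid files under /tmp by default unless HADOOP_PID_DIR is set elsewhere):

```shell
#!/bin/sh
# Hypothetical helper: list *.pid files in a directory whose recorded PID
# no longer belongs to a live process (candidates to delete before restart).
stale_pid_files() {
  dir="$1"
  for f in "$dir"/*.pid; do
    [ -e "$f" ] || continue
    pid=$(cat "$f")
    # kill -0 only checks whether the process exists; it sends no signal
    kill -0 "$pid" 2>/dev/null || echo "$f"
  done
}

stale_pid_files /tmp   # inspect the output, remove the stale files, then restart
```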
11) jps does not work
Cause: the hadoop/java environment variables have not taken effect. Fix: run source /etc/profile.
12) Port 8088 cannot be reached
[atguigu@hadoop102 桌面]$ cat /etc/hosts
Comment out the following lines:
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 hadoop102