Hadoop Run Modes
1. Hadoop official site: http://hadoop.apache.org/
2. Hadoop supports three run modes: local (standalone), pseudo-distributed, and fully distributed.
Local mode: everything runs in a single JVM on one machine; useful only for demonstrating the bundled examples. Not used in production.
Pseudo-distributed mode: also a single machine, but with every daemon of a real Hadoop cluster, so one server simulates a distributed environment. Used for testing by the occasional cash-strapped company; not used in production.
Fully distributed mode: multiple servers form a distributed environment. This is what production uses.
3. Local mode:
- Upload a test file into the /export/data/ directory
[wangliukun@hadoop01 data]$ sudo rz
rz waiting to receive.
[wangliukun@hadoop01 data]$ ll
总用量 1524
-rw-rw-rw-. 1 wangliukun wangliukun 1558030 3月 17 2024 test.txt
The file's content is arbitrary; this test counts how many times each word appears in it.
- Run the example
Return to the /export/servers/hadoop-3.3.4 directory:
[wangliukun@hadoop01 data]$ cd ..
[wangliukun@hadoop01 export]$ cd servers/hadoop-3.3.4/
[wangliukun@hadoop01 hadoop-3.3.4]$ ll
总用量 92
drwxr-xr-x. 2 1024 1024 203 7月 29 2022 bin
drwxr-xr-x. 3 1024 1024 20 7月 29 2022 etc
drwxr-xr-x. 2 1024 1024 106 7月 29 2022 include
drwxr-xr-x. 3 1024 1024 20 7月 29 2022 lib
drwxr-xr-x. 4 1024 1024 288 7月 29 2022 libexec
-rw-rw-r--. 1 1024 1024 24707 7月 29 2022 LICENSE-binary
drwxr-xr-x. 2 1024 1024 4096 7月 29 2022 licenses-binary
-rw-rw-r--. 1 1024 1024 15217 7月 17 2022 LICENSE.txt
-rw-rw-r--. 1 1024 1024 29473 7月 17 2022 NOTICE-binary
-rw-rw-r--. 1 1024 1024 1541 4月 22 2022 NOTICE.txt
-rw-rw-r--. 1 1024 1024 175 4月 22 2022 README.txt
drwxr-xr-x. 3 1024 1024 4096 7月 29 2022 sbin
drwxr-xr-x. 4 1024 1024 31 7月 29 2022 share
[wangliukun@hadoop01 hadoop-3.3.4]$ pwd
/export/servers/hadoop-3.3.4
Run the program:
[wangliukun@hadoop01 hadoop-3.3.4]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /export/data/ /export/data/wcoutput
2024-03-14 22:17:55,033 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2024-03-14 22:17:55,146 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2024-03-14 22:17:55,146 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2024-03-14 22:17:55,739 INFO input.FileInputFormat: Total input files to process : 1
2024-03-14 22:17:55,776 INFO mapreduce.JobSubmitter: number of splits:1
2024-03-14 22:17:56,005 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local522766157_0001
2024-03-14 22:17:56,005 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-03-14 22:17:56,224 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2024-03-14 22:17:56,225 INFO mapreduce.Job: Running job: job_local522766157_0001
2024-03-14 22:17:56,239 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2024-03-14 22:17:56,253 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2024-03-14 22:17:56,253 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2024-03-14 22:17:56,255 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2024-03-14 22:17:56,317 INFO mapred.LocalJobRunner: Waiting for map tasks
2024-03-14 22:17:56,318 INFO mapred.LocalJobRunner: Starting task: attempt_local522766157_0001_m_000000_0
2024-03-14 22:17:56,363 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2024-03-14 22:17:56,363 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2024-03-14 22:17:56,397 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2024-03-14 22:17:56,407 INFO mapred.MapTask: Processing split: file:/export/data/test.txt:0+1558030
2024-03-14 22:17:56,481 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2024-03-14 22:17:56,481 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2024-03-14 22:17:56,481 INFO mapred.MapTask: soft limit at 83886080
2024-03-14 22:17:56,481 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2024-03-14 22:17:56,481 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2024-03-14 22:17:56,494 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2024-03-14 22:17:56,837 INFO mapred.LocalJobRunner:
2024-03-14 22:17:56,838 INFO mapred.MapTask: Starting flush of map output
2024-03-14 22:17:56,838 INFO mapred.MapTask: Spilling map output
2024-03-14 22:17:56,838 INFO mapred.MapTask: bufstart = 0; bufend = 2626710; bufvoid = 104857600
2024-03-14 22:17:56,838 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25127360(100509440); length = 1087037/6553600
2024-03-14 22:17:57,110 INFO mapred.MapTask: Finished spill 0
2024-03-14 22:17:57,128 INFO mapred.Task: Task:attempt_local522766157_0001_m_000000_0 is done. And is in the process of committing
2024-03-14 22:17:57,131 INFO mapred.LocalJobRunner: map
2024-03-14 22:17:57,131 INFO mapred.Task: Task 'attempt_local522766157_0001_m_000000_0' done.
2024-03-14 22:17:57,137 INFO mapred.Task: Final Counters for attempt_local522766157_0001_m_000000_0: Counters: 18
File System Counters
FILE: Number of bytes read=1839168
FILE: Number of bytes written=924802
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=271760
Map output bytes=2626710
Map output materialized bytes=53
Input split bytes=91
Combine input records=271760
Combine output records=4
Spilled Records=4
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=23
Total committed heap usage (bytes)=279969792
File Input Format Counters
Bytes Read=1558030
2024-03-14 22:17:57,137 INFO mapred.LocalJobRunner: Finishing task: attempt_local522766157_0001_m_000000_0
2024-03-14 22:17:57,144 INFO mapred.LocalJobRunner: map task executor complete.
2024-03-14 22:17:57,146 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2024-03-14 22:17:57,147 INFO mapred.LocalJobRunner: Starting task: attempt_local522766157_0001_r_000000_0
2024-03-14 22:17:57,161 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2024-03-14 22:17:57,162 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2024-03-14 22:17:57,162 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2024-03-14 22:17:57,167 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@79ef199a
2024-03-14 22:17:57,168 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2024-03-14 22:17:57,278 INFO mapreduce.Job: Job job_local522766157_0001 running in uber mode : false
2024-03-14 22:17:57,279 INFO mapreduce.Job: map 100% reduce 0%
2024-03-14 22:17:57,285 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=616195648, maxSingleShuffleLimit=154048912, mergeThreshold=406689152, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2024-03-14 22:17:57,293 INFO reduce.EventFetcher: attempt_local522766157_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2024-03-14 22:17:57,332 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local522766157_0001_m_000000_0 decomp: 49 len: 53 to MEMORY
2024-03-14 22:17:57,337 INFO reduce.InMemoryMapOutput: Read 49 bytes from map-output for attempt_local522766157_0001_m_000000_0
2024-03-14 22:17:57,338 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 49, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->49
2024-03-14 22:17:57,340 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2024-03-14 22:17:57,343 INFO mapred.LocalJobRunner: 1 / 1 copied.
2024-03-14 22:17:57,343 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2024-03-14 22:17:57,360 INFO mapred.Merger: Merging 1 sorted segments
2024-03-14 22:17:57,360 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 42 bytes
2024-03-14 22:17:57,362 INFO reduce.MergeManagerImpl: Merged 1 segments, 49 bytes to disk to satisfy reduce memory limit
2024-03-14 22:17:57,363 INFO reduce.MergeManagerImpl: Merging 1 files, 53 bytes from disk
2024-03-14 22:17:57,363 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2024-03-14 22:17:57,363 INFO mapred.Merger: Merging 1 sorted segments
2024-03-14 22:17:57,365 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 42 bytes
2024-03-14 22:17:57,366 INFO mapred.LocalJobRunner: 1 / 1 copied.
2024-03-14 22:17:57,369 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2024-03-14 22:17:57,372 INFO mapred.Task: Task:attempt_local522766157_0001_r_000000_0 is done. And is in the process of committing
2024-03-14 22:17:57,373 INFO mapred.LocalJobRunner: 1 / 1 copied.
2024-03-14 22:17:57,373 INFO mapred.Task: Task attempt_local522766157_0001_r_000000_0 is allowed to commit now
2024-03-14 22:17:57,374 INFO output.FileOutputCommitter: Saved output of task 'attempt_local522766157_0001_r_000000_0' to file:/export/data/wcoutput
2024-03-14 22:17:57,387 INFO mapred.LocalJobRunner: reduce > reduce
2024-03-14 22:17:57,387 INFO mapred.Task: Task 'attempt_local522766157_0001_r_000000_0' done.
2024-03-14 22:17:57,388 INFO mapred.Task: Final Counters for attempt_local522766157_0001_r_000000_0: Counters: 24
File System Counters
FILE: Number of bytes read=1839306
FILE: Number of bytes written=924914
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=4
Reduce shuffle bytes=53
Reduce input records=4
Reduce output records=4
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=84
Total committed heap usage (bytes)=179306496
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=59
2024-03-14 22:17:57,388 INFO mapred.LocalJobRunner: Finishing task: attempt_local522766157_0001_r_000000_0
2024-03-14 22:17:57,388 INFO mapred.LocalJobRunner: reduce task executor complete.
2024-03-14 22:17:58,284 INFO mapreduce.Job: map 100% reduce 100%
2024-03-14 22:17:58,285 INFO mapreduce.Job: Job job_local522766157_0001 completed successfully
2024-03-14 22:17:58,303 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=3678474
FILE: Number of bytes written=1849716
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=271760
Map output bytes=2626710
Map output materialized bytes=53
Input split bytes=91
Combine input records=271760
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=53
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=107
Total committed heap usage (bytes)=459276288
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1558030
File Output Format Counters
Bytes Written=59
[wangliukun@hadoop01 hadoop-3.3.4]$ cd /export/data/
[wangliukun@hadoop01 data]$ ll
总用量 1524
-rw-rw-rw-. 1 wangliukun wangliukun 1558030 3月 17 2024 test.txt
drwxr-xr-x. 2 wangliukun wangliukun 88 3月 14 22:17 wcoutput
- Check the result
[wangliukun@hadoop01 data]$ cd wcoutput/
[wangliukun@hadoop01 wcoutput]$ ll
总用量 4
-rw-r--r--. 1 wangliukun wangliukun 47 3月 14 22:17 part-r-00000
-rw-r--r--. 1 wangliukun wangliukun 0 3月 14 22:17 _SUCCESS
[wangliukun@hadoop01 wcoutput]$ cat part-r-00000
good 72530
hadoop 54170
hdfs 72530
hello 72530
[wangliukun@hadoop01 wcoutput]$
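The same word counts can be reproduced with plain coreutils, which makes it clear what the example jar is computing. The sample file below is made up for illustration; it is not the test.txt from the transcript above:

```shell
# Local stand-in for the wordcount example: split on whitespace,
# then group and count the words, just like map -> shuffle -> reduce.
printf 'hello hdfs good\nhello hadoop\nhello hdfs good\n' > /tmp/wc_demo.txt
tr -s ' \t' '\n' < /tmp/wc_demo.txt | sort | uniq -c | sort -rn
```

Here `sort` plays the role of the shuffle's grouping and `uniq -c` the role of the combiner/reducer.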
To shorten the path, rename /export/servers/hadoop-3.3.4 to hadoop:
[wangliukun@hadoop01 servers]$ mv hadoop-3.3.4/ ./hadoop
[wangliukun@hadoop01 servers]$ ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 3月 14 22:15 hadoop
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
[wangliukun@hadoop01 servers]$
4. Fully distributed mode
Preparation:
- Prepare 3 machines (firewall off, static IPs, hostnames set)
- Install the JDK
- Configure environment variables
- Install Hadoop
- Configure environment variables
- Configure the cluster
- Start daemons node by node
- Configure SSH
- Start the whole cluster and test it
Since our VMs are clones, only part of this work remains.
1. Extract and install Hadoop, and configure environment variables
Hadoop02:
#extract
[wangliukun@hadoop02 software]# tar -zxvf hadoop-3.3.4.tar.gz -C /export/servers/
[wangliukun@hadoop02 software]# cd ../servers/
[wangliukun@hadoop02 servers]# ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 7月 29 2022 hadoop-3.3.4
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
[wangliukun@hadoop02 servers]# mv hadoop-3.3.4/ ./hadoop
[wangliukun@hadoop02 servers]# ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 7月 29 2022 hadoop
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
#configure environment variables
[wangliukun@hadoop02 servers]$ cd hadoop/
[wangliukun@hadoop02 hadoop]$ pwd
/export/servers/hadoop
[wangliukun@hadoop02 hadoop]$ sudo vim /etc/profile
[sudo] wangliukun 的密码:
#apply the changes
[wangliukun@hadoop02 hadoop]$ source /etc/profile
#verify the hadoop command
[wangliukun@hadoop02 hadoop]$ hadoop version
Hadoop 3.3.4
Source code repository https://github.com/apache/hadoop.git -r a585a73c3e02ac62350c136643a5e7f6095a3dbb
Compiled by stevel on 2022-07-29T12:32Z
Compiled with protoc 3.7.1
From source with checksum fb9dd8918a7b8a5b430d61af858f6ec
This command was run using /export/servers/hadoop/share/hadoop/common/hadoop-common-3.3.4.jar
[wangliukun@hadoop02 hadoop]$
In the step above, the following was appended to /etc/profile:
#HADOOP_HOME
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Hadoop03:
#extract
[wangliukun@hadoop03 software]# tar -zxvf hadoop-3.3.4.tar.gz -C /export/servers/
[wangliukun@hadoop03 software]# cd ../servers/
[wangliukun@hadoop03 servers]# ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 7月 29 2022 hadoop-3.3.4
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
[wangliukun@hadoop03 servers]# mv hadoop-3.3.4/ ./hadoop
[wangliukun@hadoop03 servers]# ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 7月 29 2022 hadoop
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
#configure environment variables
[wangliukun@hadoop03 hadoop]$ pwd
/export/servers/hadoop
[wangliukun@hadoop03 hadoop]$ sudo vim /etc/profile
[sudo] wangliukun 的密码:
#apply the changes
[wangliukun@hadoop03 hadoop]$ source /etc/profile
#verify the hadoop command
[wangliukun@hadoop03 hadoop]$ hadoop version
Hadoop 3.3.4
Source code repository https://github.com/apache/hadoop.git -r a585a73c3e02ac62350c136643a5e7f6095a3dbb
Compiled by stevel on 2022-07-29T12:32Z
Compiled with protoc 3.7.1
From source with checksum fb9dd8918a7b8a5b430d61af858f6ec
This command was run using /export/servers/hadoop/share/hadoop/common/hadoop-common-3.3.4.jar
[wangliukun@hadoop03 hadoop]$
In the step above, the following was appended to /etc/profile:
#HADOOP_HOME
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
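What those /etc/profile lines do can be sanity-checked in a throwaway shell; the HADOOP_HOME value is the install path used throughout this guide:

```shell
# Append Hadoop's bin/ and sbin/ to PATH, as in /etc/profile above.
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# Verify the directory is now on PATH (prints "on PATH").
case ":$PATH:" in
  *:"$HADOOP_HOME/bin":*) echo "on PATH" ;;
  *) echo "missing" ;;
esac
```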
2. The xsync cluster-distribution script
- Requirement: copy a file to the same directory on every node of the cluster.
- Analysis:
A plain rsync copy looks like:
rsync -av /opt/module atguigu@hadoop103:/opt/
Desired usage: xsync <file-to-sync>
The script should be callable from any directory, so it must live in a directory that is already on PATH:
[wangliukun@hadoop01 /]$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/export/servers/jdk/bin:/root/bin:/export/servers/jdk/bin:/export/servers/hadoop-3.3.4/bin:/export/servers/hadoop-3.3.4/sbin
[wangliukun@hadoop01 /]$
- Implementation
Create the xsync file in /home/wangliukun/bin:
[wangliukun@hadoop01 bin]$ cd /home/wangliukun/
[wangliukun@hadoop01 ~]$ mkdir bin
[wangliukun@hadoop01 ~]$ cd bin
[wangliukun@hadoop01 bin]$ vim xsync
The script:
[wangliukun@hadoop01 bin]$ cat xsync
#!/bin/bash

# 1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi

# 2. Loop over every machine in the cluster
for host in hadoop01 hadoop02 hadoop03
do
    echo ==================== $host ====================
    # 3. Send each requested file or directory in turn
    for file in "$@"
    do
        # 4. Check that the file exists
        if [ -e "$file" ]
        then
            # 5. Resolve the absolute parent directory (following symlinks)
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            # 6. Get the bare file name
            fname=$(basename "$file")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
[wangliukun@hadoop01 bin]$
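The pdir/fname logic at the heart of the script can be exercised on its own; the temp file below is just a stand-in:

```shell
# Resolve a file's absolute parent directory and bare name,
# exactly as steps 5 and 6 of the xsync script do.
tmp=$(mktemp -d)
touch "$tmp/demo.txt"
file=$tmp/demo.txt
pdir=$(cd -P "$(dirname "$file")"; pwd)   # -P resolves symlinks in the path
fname=$(basename "$file")
echo "$pdir/$fname"                       # the absolute path rsync would receive
```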
Make xsync executable:
[wangliukun@hadoop01 bin]$ chmod +x xsync
Test the script:
[wangliukun@hadoop01 ~]$ xsync /home/wangliukun/bin
==================== hadoop01 ====================
wangliukun@hadoop01's password:
wangliukun@hadoop01's password:
sending incremental file list
sent 179 bytes received 19 bytes 79.20 bytes/sec
total size is 736 speedup is 3.72
==================== hadoop02 ====================
wangliukun@hadoop02's password:
wangliukun@hadoop02's password:
sending incremental file list
bin/
bin/w/
sent 189 bytes received 25 bytes 85.60 bytes/sec
total size is 736 speedup is 3.44
==================== hadoop03 ====================
wangliukun@hadoop03's password:
wangliukun@hadoop03's password:
sending incremental file list
bin/
bin/w/
sent 185 bytes received 25 bytes 140.00 bytes/sec
total size is 736 speedup is 3.50
[wangliukun@hadoop01 ~]$
Copy the script to /bin so it can be called from anywhere (including under sudo):
[wangliukun@hadoop01 bin]$ sudo cp xsync /bin/
[sudo] wangliukun 的密码:
Sync the environment-variable configuration (owned by root):
[wangliukun@hadoop01 ~]$ sudo /bin/xsync /etc/profile.d/
==================== hadoop01 ====================
root@hadoop01's password:
root@hadoop01's password:
sending incremental file list
sent 288 bytes received 17 bytes 122.00 bytes/sec
total size is 10,877 speedup is 35.66
==================== hadoop02 ====================
root@hadoop02's password:
root@hadoop02's password:
sending incremental file list
sent 284 bytes received 17 bytes 200.67 bytes/sec
total size is 10,877 speedup is 36.14
==================== hadoop03 ====================
root@hadoop03's password:
root@hadoop03's password:
sending incremental file list
sent 284 bytes received 17 bytes 120.40 bytes/sec
total size is 10,877 speedup is 36.14
Reload the environment variables on each node:
[wangliukun@hadoop01 ~]$ source /etc/profile
[wangliukun@hadoop02 bin]$ source /etc/profile
[wangliukun@hadoop03 bin]$ source /etc/profile
3. Passwordless SSH login
Current SSH behavior (a password is required):
[wangliukun@hadoop01 ~]$ ssh hadoop02
wangliukun@hadoop02's password:
Last login: Thu Mar 14 22:47:33 2024
[wangliukun@hadoop02 ~]$ exit
登出
Connection to hadoop02 closed.
[wangliukun@hadoop01 ~]$
Key setup (repeat on each of the three VMs):
#generate a public/private key pair
[wangliukun@hadoop01 ~]$ cd .ssh
[wangliukun@hadoop01 .ssh]$ ll
总用量 4
-rw-r--r--. 1 wangliukun wangliukun 555 3月 14 23:08 known_hosts
[wangliukun@hadoop01 .ssh]$ pwd
/home/wangliukun/.ssh
[wangliukun@hadoop01 .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/wangliukun/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/wangliukun/.ssh/id_rsa.
Your public key has been saved in /home/wangliukun/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:7x6u66p0lO0Aegq8Lx4nJwdwvwXGTuqdetE4LPaH7DI wangliukun@hadoop01
The key's randomart image is:
+---[RSA 2048]----+
| |
| . |
|. . * |
|o. B o o |
|.oo.+o= S |
| o=+==.o . |
| *oB=+. . o |
| .Eo=.. o . |
|..o*oo.o+++ |
+----[SHA256]-----+
[wangliukun@hadoop01 .ssh]$ ll
总用量 12
-rw-------. 1 wangliukun wangliukun 1679 3月 14 23:32 id_rsa      # private key
-rw-r--r--. 1 wangliukun wangliukun 401 3月 14 23:32 id_rsa.pub   # public key
-rw-r--r--. 1 wangliukun wangliukun 555 3月 14 23:08 known_hosts
[wangliukun@hadoop01 .ssh]$
#copy the public key to every machine that should accept passwordless logins
[wangliukun@hadoop01 .ssh]$ ssh-copy-id hadoop01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/wangliukun/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
wangliukun@hadoop01's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop01'"
and check to make sure that only the key(s) you wanted were added.
[wangliukun@hadoop01 .ssh]$ ssh-copy-id hadoop02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/wangliukun/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
wangliukun@hadoop02's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop02'"
and check to make sure that only the key(s) you wanted were added.
[wangliukun@hadoop01 .ssh]$ ssh-copy-id hadoop03
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/wangliukun/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
wangliukun@hadoop03's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop03'"
and check to make sure that only the key(s) you wanted were added.
#test
[wangliukun@hadoop01 .ssh]$ cd /
[wangliukun@hadoop01 /]$ ssh hadoop02
Last login: Fri Mar 15 00:58:05 2024 from hadoop01
[wangliukun@hadoop02 ~]$ exit
登出
Connection to hadoop02 closed.
[wangliukun@hadoop01 /]$
| File | Purpose |
|---|---|
| known_hosts | public keys of hosts this machine has connected to over ssh |
| id_rsa | the generated private key |
| id_rsa.pub | the generated public key |
| authorized_keys | public keys authorized for passwordless login to this machine |
In total 9 public keys are distributed: each of the 3 machines copies its key to all 3 machines.
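The 3 x 3 = 9 arithmetic can be sketched: every host ends up with one authorized key per source host. The file below is a throwaway stand-in for one host's ~/.ssh/authorized_keys, with placeholder key material:

```shell
# One host's authorized_keys after all three ssh-copy-id runs:
# one entry per source host, so 3 lines here; 3 hosts x 3 lines = 9 keys total.
auth=$(mktemp)
for src in hadoop01 hadoop02 hadoop03; do
  echo "ssh-rsa AAAAB3...placeholder wangliukun@$src" >> "$auth"
done
wc -l < "$auth"
```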
4. Cluster configuration
1> Deployment plan
Do not put NameNode and SecondaryNameNode on the same server.
ResourceManager is also memory-hungry; keep it off the machines running NameNode or SecondaryNameNode.
| | hadoop01 | hadoop02 | hadoop03 |
|---|---|---|---|
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
2> About the configuration files
Hadoop has two kinds of configuration files: defaults and site-specific overrides. You only edit a site file when you want to change a default value.
<1> Default configuration files
| Default file | Location inside Hadoop's jars |
|---|---|
| core-default.xml | hadoop-common-3.3.4.jar/core-default.xml |
| hdfs-default.xml | hadoop-hdfs-3.3.4.jar/hdfs-default.xml |
| yarn-default.xml | hadoop-yarn-common-3.3.4.jar/yarn-default.xml |
| mapred-default.xml | hadoop-mapreduce-client-core-3.3.4.jar/mapred-default.xml |
<2> Site-specific configuration files
core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; edit them to override defaults as the project requires.
3> Configure the cluster
- Core configuration file
Edit core-site.xml:
[wangliukun@hadoop01 hadoop]$ vim core-site.xml
Contents:
[wangliukun@hadoop01 hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:8020</value>
</property>
<!-- Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop/data</value>
</property>
</configuration>
- HDFS configuration file
Edit hdfs-site.xml:
[wangliukun@hadoop01 hadoop]$ vim hdfs-site.xml
Contents:
[wangliukun@hadoop01 hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop01:9870</value>
</property>
<!-- SecondaryNameNode web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop03:9868</value>
</property>
</configuration>
- YARN configuration file
Edit yarn-site.xml:
[wangliukun@hadoop01 hadoop]$ vim yarn-site.xml
Contents:
[wangliukun@hadoop01 hadoop]$ cat yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Use the MapReduce shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop02</value>
</property>
<!-- Environment variables inherited by containers -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
- MapReduce configuration file
[wangliukun@hadoop01 hadoop]$ vim mapred-site.xml
Contents:
[wangliukun@hadoop01 hadoop]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce jobs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4> Distribute the finished Hadoop configuration to the whole cluster
[wangliukun@hadoop01 hadoop]$ xsync hadoop/
5> Check that the files arrived on hadoop02/hadoop03:
[root@hadoop02 hadoop]# cat core-site.xml
[root@hadoop03 hadoop]# cat core-site.xml
5. Starting the whole cluster
1. Configure workers:
[wangliukun@hadoop01 hadoop]$ vim workers
[wangliukun@hadoop01 hadoop]$ cat workers
hadoop01
hadoop02
hadoop03
Sync it to all nodes:
[wangliukun@hadoop01 hadoop]$ xsync workers
==================== hadoop01 ====================
sending incremental file list
sent 69 bytes received 12 bytes 162.00 bytes/sec
total size is 27 speedup is 0.33
==================== hadoop02 ====================
sending incremental file list
workers
sent 143 bytes received 41 bytes 368.00 bytes/sec
total size is 27 speedup is 0.15
==================== hadoop03 ====================
sending incremental file list
workers
sent 143 bytes received 41 bytes 368.00 bytes/sec
total size is 27 speedup is 0.15
2. Start the cluster
- If this is the cluster's first start, format the NameNode on the hadoop01 node.
- Note: formatting the NameNode creates a new cluster ID. If the NameNode's and DataNodes' cluster IDs then differ, the cluster cannot find its old data. If the cluster fails while running and the NameNode must be re-formatted, first stop the namenode and datanode processes and delete the data and logs directories on every machine, then format.
[wangliukun@hadoop01 hadoop]$ hdfs namenode -format
The format writes version/metadata files under the NameNode's current/ directory:
[wangliukun@hadoop01 current]$ ll
总用量 16
-rw-rw-r--. 1 wangliukun wangliukun 405 3月 18 17:28 fsimage_0000000000000000000
-rw-rw-r--. 1 wangliukun wangliukun 62 3月 18 17:28 fsimage_0000000000000000000.md5
-rw-rw-r--. 1 wangliukun wangliukun 2 3月 18 17:28 seen_txid
-rw-rw-r--. 1 wangliukun wangliukun 218 3月 18 17:28 VERSION
- Start HDFS
[wangliukun@hadoop01 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
hadoop03: ERROR: JAVA_HOME is not set and could not be found.
hadoop01: ERROR: JAVA_HOME is not set and could not be found.
hadoop02: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [hadoop03]
hadoop03: ERROR: JAVA_HOME is not set and could not be found.
Error:
ERROR: JAVA_HOME is not set and could not be found.
Cause:
If the JDK itself is configured correctly, the likely culprit is an unconfigured hadoop-env.sh. That file holds Hadoop's own environment settings, chiefly the JAVA_HOME path Hadoop should use.
Fix:
- cd into $HADOOP_HOME/etc/hadoop
- Run: vim hadoop-env.sh
- Set JAVA_HOME and HADOOP_CONF_DIR to the actual install paths:
export JAVA_HOME=/export/servers/jdk
export HADOOP_CONF_DIR=/export/servers/hadoop/etc/hadoop
Start HDFS again:
[wangliukun@hadoop01 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: namenode is running as process 2808. Stop it first and ensure /tmp/hadoop-wangliukun-namenode.pid file is empty before retry.
Starting datanodes
hadoop02: WARNING: /export/servers/hadoop/logs does not exist. Creating.
hadoop03: WARNING: /export/servers/hadoop/logs does not exist. Creating.
hadoop01: datanode is running as process 2918. Stop it first and ensure /tmp/hadoop-wangliukun-datanode.pid file is empty before retry.
Starting secondary namenodes [hadoop03]
More warnings:
hadoop02: WARNING: /export/servers/hadoop/logs does not exist. Creating.
hadoop03: WARNING: /export/servers/hadoop/logs does not exist. Creating.
Cause:
Only hadoop01 had been formatted; hadoop02/hadoop03 had not. (The WARNING itself is harmless: Hadoop simply creates the missing logs directory.)
Fix:
Format hadoop02/hadoop03:
[root@hadoop02 hadoop]# hdfs namenode -format
[root@hadoop03 hadoop]# hdfs namenode -format
Start HDFS once more:
[wangliukun@hadoop01 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: namenode is running as process 2808. Stop it first and ensure /tmp/hadoop-wangliukun-namenode.pid file is empty before retry.
Starting datanodes
hadoop02: datanode is running as process 1906. Stop it first and ensure /tmp/hadoop-wangliukun-datanode.pid file is empty before retry.
hadoop03: datanode is running as process 1550. Stop it first and ensure /tmp/hadoop-wangliukun-datanode.pid file is empty before retry.
hadoop01: datanode is running as process 2918. Stop it first and ensure /tmp/hadoop-wangliukun-datanode.pid file is empty before retry.
Starting secondary namenodes [hadoop03]
hadoop03: secondarynamenode is running as process 1604. Stop it first and ensure /tmp/hadoop-wangliukun-secondarynamenode.pid file is empty before retry.
[wangliukun@hadoop01 hadoop]$
HDFS is now up.
Check with jps:
[wangliukun@hadoop01 hadoop]$ jps
2918 DataNode
2808 NameNode
4024 Jps
[root@hadoop02 hadoop]# jps
1906 DataNode
2227 Jps
[root@hadoop03 hadoop]# jps
2097 Jps
1604 SecondaryNameNode
1550 DataNode
- Inspect HDFS's NameNode in a browser
Open http://hadoop01:9870.
Problem:
The page cannot be reached.
Cause:
If pinging hadoop01 from the physical host fails,
the host cannot resolve the name hadoop01.
Fix:
Add IP-plus-hostname entries to the physical host's hosts file.
Entries to add:
192.168.10.140 hadoop01
192.168.10.141 hadoop02
192.168.10.142 hadoop03
Pinging hadoop01 from the host now succeeds.
Open http://hadoop01:9870 again
to browse the data stored on HDFS.
- Start YARN on the node hosting ResourceManager (hadoop02)
[root@hadoop02 hadoop]# sbin/start-yarn.sh
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
Error:
ERROR: Attempting to operate on yarn nodemanager as root
Cause: YARN refuses to start as root here; use the regular user the cluster was set up with.
Fix: switch to the regular user:
[root@hadoop02 hadoop]# su wangliukun
[wangliukun@hadoop02 hadoop]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
YARN started successfully:
[wangliukun@hadoop01 hadoop]$ jps
4116 NodeManager
2918 DataNode
4214 Jps
2808 NameNode
[wangliukun@hadoop02 hadoop]$ jps
1906 DataNode
2740 Jps
2598 NodeManager
2490 ResourceManager
[wangliukun@hadoop03 hadoop]$ jps
1604 SecondaryNameNode
2344 Jps
2235 NodeManager
1550 DataNode
[wangliukun@hadoop03 hadoop]$
- Inspect YARN's ResourceManager in a browser
Open http://hadoop02:8088
to view the jobs running on YARN.
3. Basic cluster tests
Upload files to the cluster:
- A small file
[wangliukun@hadoop01 hadoop]$ hadoop fs -mkdir /input
[wangliukun@hadoop01 hadoop]$
#upload the file
[wangliukun@hadoop01 hadoop]$ hadoop fs -put /export/data/wcinput/test.txt /input
- A large file
[wangliukun@hadoop01 hadoop]$ hadoop fs -put /export/software/jdk-8u144-linux-x64.tar.gz /
[wangliukun@hadoop01 hadoop]$
查看文件位置
- 查看HDFS文件存储路径
[wangliukun@hadoop01 hadoop]$ ll
总用量 96
drwxr-xr-x. 2 wangliukun wangliukun 203 7月 29 2022 bin
drwxrwxr-x. 4 wangliukun wangliukun 37 3月 18 18:50 data
drwxr-xr-x. 3 wangliukun wangliukun 20 7月 29 2022 etc
drwxr-xr-x. 2 wangliukun wangliukun 106 7月 29 2022 include
drwxr-xr-x. 3 wangliukun wangliukun 20 7月 29 2022 lib
drwxr-xr-x. 4 wangliukun wangliukun 288 7月 29 2022 libexec
-rw-rw-r--. 1 wangliukun wangliukun 24707 7月 29 2022 LICENSE-binary
drwxr-xr-x. 2 wangliukun wangliukun 4096 7月 29 2022 licenses-binary
-rw-rw-r--. 1 wangliukun wangliukun 15217 7月 17 2022 LICENSE.txt
drwxrwxr-x. 3 wangliukun wangliukun 4096 3月 25 16:35 logs
-rw-rw-r--. 1 wangliukun wangliukun 29473 7月 17 2022 NOTICE-binary
-rw-rw-r--. 1 wangliukun wangliukun 1541 4月 22 2022 NOTICE.txt
-rw-rw-r--. 1 wangliukun wangliukun 175 4月 22 2022 README.txt
drwxr-xr-x. 3 wangliukun wangliukun 4096 7月 29 2022 sbin
drwxr-xr-x. 4 wangliukun wangliukun 31 7月 29 2022 share
[wangliukun@hadoop01 hadoop]$ cd data/
[wangliukun@hadoop01 data]$ ll
总用量 0
drwxrwxr-x. 4 wangliukun wangliukun 30 3月 18 17:43 dfs
drwxr-xr-x. 5 wangliukun wangliukun 57 3月 25 16:35 nm-local-dir
[wangliukun@hadoop01 data]$ cd dfs/
[wangliukun@hadoop01 dfs]$ ll
总用量 0
drwx------. 3 wangliukun wangliukun 40 3月 25 16:35 data
drwxrwxr-x. 3 wangliukun wangliukun 40 3月 25 16:35 name
[wangliukun@hadoop01 dfs]$ cd data/
[wangliukun@hadoop01 data]$ ll
总用量 4
drwxrwxr-x. 3 wangliukun wangliukun 70 3月 18 17:44 current
-rw-rw-r--. 1 wangliukun wangliukun 13 3月 25 16:35 in_use.lock
[wangliukun@hadoop01 data]$ cd current/
[wangliukun@hadoop01 current]$ ll
总用量 4
drwx------. 4 wangliukun wangliukun 54 3月 25 16:35 BP-380961614-192.168.10.140-1710754117775
-rw-rw-r--. 1 wangliukun wangliukun 229 3月 25 16:35 VERSION
[wangliukun@hadoop01 current]$ cd BP-380961614-192.168.10.140-1710754117775/
[wangliukun@hadoop01 BP-380961614-192.168.10.140-1710754117775]$ ll
总用量 4
drwxrwxr-x. 4 wangliukun wangliukun 64 3月 21 19:32 current
-rw-rw-r--. 1 wangliukun wangliukun 166 3月 18 17:44 scanner.cursor
drwxrwxr-x. 2 wangliukun wangliukun 6 3月 25 16:35 tmp
[wangliukun@hadoop01 BP-380961614-192.168.10.140-1710754117775]$ cd current/
[wangliukun@hadoop01 current]$ ll
总用量 8
-rw-rw-r--. 1 wangliukun wangliukun 18 3月 21 19:32 dfsUsed
drwxrwxr-x. 3 wangliukun wangliukun 21 3月 25 16:39 finalized
drwxrwxr-x. 2 wangliukun wangliukun 6 3月 25 16:45 rbw
-rw-rw-r--. 1 wangliukun wangliukun 145 3月 25 16:35 VERSION
[wangliukun@hadoop01 current]$ cd finalized/
[wangliukun@hadoop01 finalized]$ ll
总用量 0
drwxrwxr-x. 3 wangliukun wangliukun 21 3月 25 16:39 subdir0
[wangliukun@hadoop01 finalized]$ cd subdir0/
[wangliukun@hadoop01 subdir0]$ ll
总用量 0
drwxrwxr-x. 2 wangliukun wangliukun 168 3月 25 16:45 subdir0
[wangliukun@hadoop01 subdir0]$ cd subdir0/
[wangliukun@hadoop01 subdir0]$ ll
总用量 184124
-rw-rw-r--. 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r--. 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r--. 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r--. 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r--. 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r--. 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
#路径
/export/servers/hadoop/data/dfs/data/current/BP-380961614-192.168.10.140-1710754117775/current/finalized/subdir0/subdir0
- 查看HDFS在磁盘上存储的文件内容
[wangliukun@hadoop01 subdir0]$ cat blk_1073741825
拼接
[wangliukun@hadoop01 subdir0]$ cat blk_1073741826 >> tmp.tar.gz
[wangliukun@hadoop01 subdir0]$ cat blk_1073741827 >> tmp.tar.gz
[wangliukun@hadoop01 subdir0]$ ll
总用量 577340
-rw-rw-r--. 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r--. 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r--. 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r--. 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r--. 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r--. 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
-rw-rw-r--. 1 wangliukun wangliukun 185515842 3月 25 17:01 tmp.tar.gz
[wangliukun@hadoop01 subdir0]$ tar -zxvf tmp.tar.gz
解压完发现拼接的文件为传输的jdk
[wangliukun@hadoop01 subdir0]$ ll
总用量 577340
-rw-rw-r--. 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r--. 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r--. 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r--. 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r--. 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r--. 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk1.8.0_144
-rw-rw-r--. 1 wangliukun wangliukun 185515842 3月 25 17:01 tmp.tar.gz
[wangliukun@hadoop01 subdir0]$
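HDFS 按固定大小切块、按序拼接即可还原原文件的原理,可以用 split/cat 在本地模拟验证(示意脚本:块大小用 1KB 代替 HDFS 默认的 128MB,文件名均为演示用):

```shell
#!/bin/bash
# 示意:在本地模拟 HDFS 按固定大小切块与按序拼接还原的过程
set -e
tmpdir=$(mktemp -d)
cd "$tmpdir"
head -c 3000 /dev/urandom > demo.bin   # 构造一个 3000 字节的测试文件
split -b 1024 demo.bin blk_            # 切成 1KB 一块:blk_aa blk_ab blk_ac
cat blk_* > rebuilt.bin                # 按文件名顺序拼接还原
cmp demo.bin rebuilt.bin && echo "拼接结果与原文件一致"
```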
由 Web 页面可知,jdk 的数据块共存了三份(副本数为 3),查询可知分别存储在三台虚拟机节点上:
[wangliukun@hadoop01 subdir0]$ ll
总用量 365292
-rw-rw-r--. 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r--. 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r--. 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r--. 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r--. 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r--. 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk1.8.0_144
-rw-rw-r--. 1 wangliukun wangliukun 185515842 3月 25 17:01 tmp.tar.gz
[wangliukun@hadoop01 subdir0]$
[wangliukun@hadoop02 hadoop]$ cd data/dfs/data/current/BP-380961614-192.168.10.140-1710754117775/current/finalized/subdir0/subdir0/
[wangliukun@hadoop02 subdir0]$ ll
总用量 184124
-rw-rw-r-- 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r-- 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r-- 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r-- 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r-- 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r-- 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
[wangliukun@hadoop02 subdir0]$
[wangliukun@hadoop03 hadoop]$ cd data/dfs/data/current/BP-380961614-192.168.10.140-1710754117775/current/finalized/subdir0/subdir0/
[wangliukun@hadoop03 subdir0]$ ll
总用量 184124
-rw-rw-r-- 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r-- 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r-- 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r-- 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r-- 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r-- 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
[wangliukun@hadoop03 subdir0]$
下载
[wangliukun@hadoop01 software]$ ll
总用量 860328
-rw-rw-rw-. 1 wangliukun wangliukun 695457782 2月 20 19:17 hadoop-3.3.4.tar.gz
-rw-rw-rw-. 1 wangliukun wangliukun 185515842 3月 4 16:40 jdk-8u144-linux-x64.tar.gz
[wangliukun@hadoop01 software]$ hadoop fs -get /jdk-8u144-linux-x64.tar.gz
get: `jdk-8u144-linux-x64.tar.gz': File exists
(报错原因:本地当前目录下已存在同名文件,删除本地文件或换一个目录再下载即可)
执行wordcount程序
[wangliukun@hadoop01 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /input/wcinput/test.txt /input/wcoutput/
2024-03-25 17:47:54,072 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop02/192.168.10.141:8032
2024-03-25 17:47:57,870 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/wangliukun/.staging/job_1711355765914_0003
2024-03-25 17:48:01,086 INFO input.FileInputFormat: Total input files to process : 1
2024-03-25 17:48:02,460 INFO mapreduce.JobSubmitter: number of splits:1
2024-03-25 17:48:05,319 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1711355765914_0003
2024-03-25 17:48:05,320 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-03-25 17:48:08,773 INFO conf.Configuration: resource-types.xml not found
2024-03-25 17:48:08,774 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-03-25 17:48:10,653 INFO impl.YarnClientImpl: Submitted application application_1711355765914_0003
2024-03-25 17:48:10,830 INFO mapreduce.Job: The url to track the job: http://hadoop02:8088/proxy/application_1711355765914_0003/
2024-03-25 17:48:10,831 INFO mapreduce.Job: Running job: job_1711355765914_0003
2024-03-25 17:50:04,301 INFO mapreduce.Job: Job job_1711355765914_0003 running in uber mode : false
2024-03-25 17:50:04,405 INFO mapreduce.Job: map 0% reduce 0%
2024-03-25 17:50:39,577 INFO mapreduce.Job: map 100% reduce 0%
2024-03-25 17:50:48,735 INFO mapreduce.Job: map 100% reduce 100%
2024-03-25 17:50:49,826 INFO mapreduce.Job: Job job_1711355765914_0003 completed successfully
2024-03-25 17:50:50,023 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=53
FILE: Number of bytes written=551097
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1558138
HDFS: Number of bytes written=47
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=29841
Total time spent by all reduces in occupied slots (ms)=7096
Total time spent by all map tasks (ms)=29841
Total time spent by all reduce tasks (ms)=7096
Total vcore-milliseconds taken by all map tasks=29841
Total vcore-milliseconds taken by all reduce tasks=7096
Total megabyte-milliseconds taken by all map tasks=30557184
Total megabyte-milliseconds taken by all reduce tasks=7266304
Map-Reduce Framework
Map input records=1
Map output records=271760
Map output bytes=2626710
Map output materialized bytes=53
Input split bytes=108
Combine input records=271760
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=53
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=4824
CPU time spent (ms)=6700
Physical memory (bytes) snapshot=537280512
Virtual memory (bytes) snapshot=5144932352
Total committed heap usage (bytes)=370671616
Peak Map Physical memory (bytes)=337448960
Peak Map Virtual memory (bytes)=2569183232
Peak Reduce Physical memory (bytes)=199831552
Peak Reduce Virtual memory (bytes)=2575749120
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1558030
File Output Format Counters
Bytes Written=47
[wangliukun@hadoop01 hadoop]$
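wordcount 的 map-shuffle-reduce 统计逻辑,可以用一条 shell 管道在本地直观模拟(示意代码:输入内容为演示用,与本文的 test.txt 无关):

```shell
#!/bin/bash
# 示意:用 shell 管道模拟 wordcount 的 map-shuffle-reduce 过程
printf 'hadoop spark hadoop\nhive hadoop\n' |
  tr -s ' ' '\n' |        # map:切分成一行一个单词
  sort |                  # shuffle:相同单词排到相邻位置
  uniq -c |               # reduce:对相邻的相同单词计数
  awk '{print $2"\t"$1}'  # 输出格式:单词<TAB>次数
# 输出:
# hadoop  3
# hive    1
# spark   1
```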
过程中 YARN 会生成一条历史任务记录
4、配置历史服务器
1>配置mapred-site.xml
[wangliukun@hadoop01 hadoop]$ vim mapred-site.xml
配置文件如下:
<!-- 历史服务器端地址 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!-- 历史服务器web端地址 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
2>分发
[wangliukun@hadoop01 hadoop]$ xsync mapred-site.xml
==================== hadoop01 ====================
sending incremental file list
sent 78 bytes received 12 bytes 60.00 bytes/sec
total size is 1,194 speedup is 13.27
==================== hadoop02 ====================
sending incremental file list
mapred-site.xml
sent 623 bytes received 47 bytes 446.67 bytes/sec
total size is 1,194 speedup is 1.78
==================== hadoop03 ====================
sending incremental file list
mapred-site.xml
sent 623 bytes received 47 bytes 1,340.00 bytes/sec
total size is 1,194 speedup is 1.78
[wangliukun@hadoop01 hadoop]$
3>在hadoop01启动历史服务器
[wangliukun@hadoop01 hadoop]$ mapred --daemon start historyserver
4>查看是否启动
[wangliukun@hadoop01 hadoop]$ jps
4291 JobHistoryServer
1944 DataNode
1836 NameNode
4349 Jps
2270 NodeManager
[wangliukun@hadoop01 hadoop]$
5>查看JobHistory
浏览器输入:http://hadoop01:19888/
5、配置日志的聚集
日志聚集概念:应用运行完成以后,将程序运行日志信息上传到HDFS系统上。
日志聚集功能好处:可以方便的查看到程序运行详情,方便开发调试。
注意:开启日志聚集功能,需要重新启动NodeManager 、ResourceManager和HistoryServer。
开启日志聚集功能具体步骤如下:
1>配置yarn-site.xml
[wangliukun@hadoop01 hadoop]$ vim yarn-site.xml
配置文件添加:
<!-- 开启日志聚集功能 -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- 设置日志聚集服务器地址 -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop01:19888/jobhistory/logs</value>
</property>
<!-- 设置日志保留时间为7天 -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
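其中 604800 即 7 天对应的秒数,可以直接换算验证:

```shell
# 7 天换算成秒,对应 yarn.log-aggregation.retain-seconds 的取值
echo $((7 * 24 * 3600))   # 输出 604800
```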
2>分发:
[wangliukun@hadoop01 hadoop]$ xsync yarn-site.xml
==================== hadoop01 ====================
sending incremental file list
sent 76 bytes received 12 bytes 58.67 bytes/sec
total size is 1,647 speedup is 18.72
==================== hadoop02 ====================
sending incremental file list
yarn-site.xml
sent 1,074 bytes received 47 bytes 2,242.00 bytes/sec
total size is 1,647 speedup is 1.47
==================== hadoop03 ====================
sending incremental file list
yarn-site.xml
sent 1,074 bytes received 47 bytes 747.33 bytes/sec
total size is 1,647 speedup is 1.47
[wangliukun@hadoop01 hadoop]$
3>关闭NodeManager、ResourceManager和HistoryServer
[wangliukun@hadoop02 hadoop]$ sbin/stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
4>启动NodeManager、ResourceManager和HistoryServer
[wangliukun@hadoop02 hadoop]$ sbin/start-yarn.sh
[wangliukun@hadoop01 ~]$ mapred --daemon start historyserver
[wangliukun@hadoop01 ~]$ jps
5395 NodeManager
5574 Jps
1944 DataNode
5545 JobHistoryServer
1836 NameNode
5>删除HDFS上已经存在的输出文件
[wangliukun@hadoop01 hadoop]$ hadoop fs -rm -R /input/wcoutput
Deleted /input/wcoutput
6>执行wordcount
[wangliukun@hadoop01 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /input/wcinput/ /input/wcoutput/
2024-03-25 19:18:05,472 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop02/192.168.10.141:8032
2024-03-25 19:18:06,632 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/wangliukun/.staging/job_1711364965360_0001
2024-03-25 19:18:07,249 INFO input.FileInputFormat: Total input files to process : 1
2024-03-25 19:18:07,648 INFO mapreduce.JobSubmitter: number of splits:1
2024-03-25 19:18:08,069 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1711364965360_0001
2024-03-25 19:18:08,069 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-03-25 19:18:08,465 INFO conf.Configuration: resource-types.xml not found
2024-03-25 19:18:08,465 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-03-25 19:18:09,311 INFO impl.YarnClientImpl: Submitted application application_1711364965360_0001
2024-03-25 19:18:09,412 INFO mapreduce.Job: The url to track the job: http://hadoop02:8088/proxy/application_1711364965360_0001/
2024-03-25 19:18:09,413 INFO mapreduce.Job: Running job: job_1711364965360_0001
2024-03-25 19:18:29,859 INFO mapreduce.Job: Job job_1711364965360_0001 running in uber mode : false
2024-03-25 19:18:29,861 INFO mapreduce.Job: map 0% reduce 0%
2024-03-25 19:18:42,422 INFO mapreduce.Job: map 100% reduce 0%
2024-03-25 19:18:49,564 INFO mapreduce.Job: map 100% reduce 100%
2024-03-25 19:18:50,602 INFO mapreduce.Job: Job job_1711364965360_0001 completed successfully
2024-03-25 19:18:50,787 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=53
FILE: Number of bytes written=551433
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1558138
HDFS: Number of bytes written=47
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8361
Total time spent by all reduces in occupied slots (ms)=5166
Total time spent by all map tasks (ms)=8361
Total time spent by all reduce tasks (ms)=5166
Total vcore-milliseconds taken by all map tasks=8361
Total vcore-milliseconds taken by all reduce tasks=5166
Total megabyte-milliseconds taken by all map tasks=8561664
Total megabyte-milliseconds taken by all reduce tasks=5289984
Map-Reduce Framework
Map input records=1
Map output records=271760
Map output bytes=2626710
Map output materialized bytes=53
Input split bytes=108
Combine input records=271760
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=53
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=224
CPU time spent (ms)=4030
Physical memory (bytes) snapshot=536236032
Virtual memory (bytes) snapshot=5138722816
Total committed heap usage (bytes)=403177472
Peak Map Physical memory (bytes)=337158144
Peak Map Virtual memory (bytes)=2567864320
Peak Reduce Physical memory (bytes)=199077888
Peak Reduce Virtual memory (bytes)=2570858496
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1558030
File Output Format Counters
Bytes Written=47
[wangliukun@hadoop01 hadoop]$
7>查看日志
- 浏览器输入:http://hadoop01:19888/jobhistory
- 查看历史任务列表
- 查看任务运行日志
- 查看运行详情
6、集群启动/停止方式总结
1>各个模块分开启动/停止(配置 ssh 免密登录是前提),常用
- 整体启动/停止HDFS
start-dfs.sh/stop-dfs.sh
- 整体启动/停止YARN
start-yarn.sh/stop-yarn.sh
2>各个服务组件逐一启动停止
- 分别启动/停止HDFS组件
hdfs --daemon start/stop namenode/datanode/secondarynamenode
- 启动/停止YARN
yarn --daemon start/stop resourcemanager/nodemanager
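上述单组件启停命令的组合形式,可以用一个小函数演示(示意脚本:仅拼接并打印将要执行的命令,不实际调用 hdfs/yarn;函数名 daemon_cmd 为演示用):

```shell
#!/bin/bash
# 示意:拼接 "工具 --daemon 动作 组件" 形式的命令并打印
daemon_cmd() {                 # 用法: daemon_cmd hdfs start namenode
  local tool=$1 action=$2 component=$3
  echo "$tool --daemon $action $component"
}

daemon_cmd hdfs start namenode      # 输出 hdfs --daemon start namenode
daemon_cmd yarn stop nodemanager    # 输出 yarn --daemon stop nodemanager
```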
7、编写常用脚本
1、集群启停脚本(包含 HDFS、YARN、HistoryServer):myhadoop.sh
- 内容如下:
[wangliukun@hadoop01 bin]$ vim myhadoop.sh
[wangliukun@hadoop01 bin]$ pwd
/home/wangliukun/bin
[wangliukun@hadoop01 bin]$ cat myhadoop.sh
#!/bin/bash
if [ $# -lt 1 ]
then
echo "No Args Input..."
exit ;
fi
case $1 in
"start")
echo " =================== 启动 hadoop集群 ==================="
echo " --------------- 启动 hdfs ---------------"
ssh hadoop01 "/export/servers/hadoop/sbin/start-dfs.sh"
echo " --------------- 启动 yarn ---------------"
ssh hadoop02 "/export/servers/hadoop/sbin/start-yarn.sh"
echo " --------------- 启动 historyserver ---------------"
ssh hadoop01 "/export/servers/hadoop/bin/mapred --daemon start historyserver"
;;
"stop")
echo " =================== 关闭 hadoop集群 ==================="
echo " --------------- 关闭 historyserver ---------------"
ssh hadoop01 "/export/servers/hadoop/bin/mapred --daemon stop historyserver"
echo " --------------- 关闭 yarn ---------------"
ssh hadoop02 "/export/servers/hadoop/sbin/stop-yarn.sh"
echo " --------------- 关闭 hdfs ---------------"
ssh hadoop01 "/export/servers/hadoop/sbin/stop-dfs.sh"
;;
*)
echo "Input Args Error..."
;;
esac
[wangliukun@hadoop01 bin]$
- 赋予脚本执行权限
[wangliukun@hadoop01 bin]$ chmod +x myhadoop.sh
- 将脚本复制到/bin中,以便全局调用
[wangliukun@hadoop01 bin]$ sudo cp myhadoop.sh /bin/
- 测试
[wangliukun@hadoop01 bin]$ myhadoop.sh stop
=================== 关闭 hadoop集群 ===================
--------------- 关闭 historyserver ---------------
--------------- 关闭 yarn ---------------
Stopping nodemanagers
Stopping resourcemanager
--------------- 关闭 hdfs ---------------
Stopping namenodes on [hadoop01]
Stopping datanodes
Stopping secondary namenodes [hadoop03]
[wangliukun@hadoop01 bin]$ jps
4962 Jps
[wangliukun@hadoop01 bin]$ myhadoop.sh start
=================== 启动 hadoop集群 ===================
--------------- 启动 hdfs ---------------
Starting namenodes on [hadoop01]
Starting datanodes
Starting secondary namenodes [hadoop03]
--------------- 启动 yarn ---------------
Starting resourcemanager
Starting nodemanagers
--------------- 启动 historyserver ---------------
[wangliukun@hadoop01 bin]$
2、查看三台服务器java进程脚本
[wangliukun@hadoop01 bin]$ pwd
/home/wangliukun/bin
[wangliukun@hadoop01 bin]$ ll
#在当前目录下创建一个jpsall文件
总用量 12
-rwxrwxr-x. 1 wangliukun wangliukun 160 3月 28 16:42 jpsall
-rwxrwxrwx. 1 wangliukun wangliukun 1112 3月 28 11:32 myhadoop.sh
-rwxr-xr-x. 1 wangliukun wangliukun 736 3月 14 22:37 xsync
#文件内容如下,注(给文件执行权限chmod +x jpsall)
[wangliukun@hadoop01 bin]$ cat jpsall
#!/bin/bash
for host in hadoop01 hadoop02 hadoop03
do
echo =============== $host ===============
ssh $host "/export/servers/jdk/bin/jps"
done
#复制到/bin,使脚本可全局调用
[wangliukun@hadoop01 bin]$ sudo cp ./jpsall /bin/
[wangliukun@hadoop01 bin]$ jpsall
=============== hadoop01 ===============
1712 NameNode
1824 DataNode
2164 NodeManager
2429 Jps
2319 JobHistoryServer
=============== hadoop02 ===============
1632 ResourceManager
1744 NodeManager
1492 DataNode
2166 Jps
=============== hadoop03 ===============
1408 NodeManager
1601 Jps
1331 SecondaryNameNode
1263 DataNode
发布给其余两节点
[wangliukun@hadoop01 ~]$ xsync /home/wangliukun/bin/jpsall
==================== hadoop01 ====================
sending incremental file list
sent 69 bytes received 12 bytes 54.00 bytes/sec
total size is 160 speedup is 1.98
==================== hadoop02 ====================
sending incremental file list
jpsall
sent 276 bytes received 35 bytes 207.33 bytes/sec
total size is 160 speedup is 0.51
==================== hadoop03 ====================
sending incremental file list
jpsall
sent 276 bytes received 35 bytes 622.00 bytes/sec
total size is 160 speedup is 0.51
[wangliukun@hadoop01 ~]$