Hadoop Run Modes
1. Hadoop official site: http://hadoop.apache.org/
2. Hadoop supports three run modes: local (standalone), pseudo-distributed, and fully distributed.
Local mode: everything runs in a single JVM on one machine; useful only for demonstrating the bundled examples. Not used in production.
Pseudo-distributed mode: also a single machine, but with every daemon of a real Hadoop cluster, so one server simulates a distributed environment. Used for testing by the occasional cash-strapped company; not used in production.
Fully distributed mode: multiple servers form a distributed environment. This is what production uses.
3. Local mode:
- Upload a test file into the /export/data/ directory
[wangliukun@hadoop01 data]$ sudo rz
rz waiting to receive.
[wangliukun@hadoop01 data]$ ll
总用量 1524
-rw-rw-rw-. 1 wangliukun wangliukun 1558030 3月 17 2024 test.txt
The file's content is arbitrary; this test counts how many times each word appears in it.
- Run the example
Return to the /export/servers/hadoop-3.3.4 directory:
[wangliukun@hadoop01 data]$ cd ..
[wangliukun@hadoop01 export]$ cd servers/hadoop-3.3.4/
[wangliukun@hadoop01 hadoop-3.3.4]$ ll
总用量 92
drwxr-xr-x. 2 1024 1024 203 7月 29 2022 bin
drwxr-xr-x. 3 1024 1024 20 7月 29 2022 etc
drwxr-xr-x. 2 1024 1024 106 7月 29 2022 include
drwxr-xr-x. 3 1024 1024 20 7月 29 2022 lib
drwxr-xr-x. 4 1024 1024 288 7月 29 2022 libexec
-rw-rw-r--. 1 1024 1024 24707 7月 29 2022 LICENSE-binary
drwxr-xr-x. 2 1024 1024 4096 7月 29 2022 licenses-binary
-rw-rw-r--. 1 1024 1024 15217 7月 17 2022 LICENSE.txt
-rw-rw-r--. 1 1024 1024 29473 7月 17 2022 NOTICE-binary
-rw-rw-r--. 1 1024 1024 1541 4月 22 2022 NOTICE.txt
-rw-rw-r--. 1 1024 1024 175 4月 22 2022 README.txt
drwxr-xr-x. 3 1024 1024 4096 7月 29 2022 sbin
drwxr-xr-x. 4 1024 1024 31 7月 29 2022 share
[wangliukun@hadoop01 hadoop-3.3.4]$ pwd
/export/servers/hadoop-3.3.4
Run the program:
[wangliukun@hadoop01 hadoop-3.3.4]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /export/data/ /export/data/wcoutput
2024-03-14 22:17:55,033 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2024-03-14 22:17:55,146 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2024-03-14 22:17:55,146 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2024-03-14 22:17:55,739 INFO input.FileInputFormat: Total input files to process : 1
2024-03-14 22:17:55,776 INFO mapreduce.JobSubmitter: number of splits:1
2024-03-14 22:17:56,005 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local522766157_0001
2024-03-14 22:17:56,005 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-03-14 22:17:56,224 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2024-03-14 22:17:56,225 INFO mapreduce.Job: Running job: job_local522766157_0001
2024-03-14 22:17:56,239 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2024-03-14 22:17:56,253 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2024-03-14 22:17:56,253 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2024-03-14 22:17:56,255 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2024-03-14 22:17:56,317 INFO mapred.LocalJobRunner: Waiting for map tasks
2024-03-14 22:17:56,318 INFO mapred.LocalJobRunner: Starting task: attempt_local522766157_0001_m_000000_0
2024-03-14 22:17:56,363 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2024-03-14 22:17:56,363 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2024-03-14 22:17:56,397 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2024-03-14 22:17:56,407 INFO mapred.MapTask: Processing split: file:/export/data/test.txt:0+1558030
2024-03-14 22:17:56,481 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2024-03-14 22:17:56,481 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2024-03-14 22:17:56,481 INFO mapred.MapTask: soft limit at 83886080
2024-03-14 22:17:56,481 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2024-03-14 22:17:56,481 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2024-03-14 22:17:56,494 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2024-03-14 22:17:56,837 INFO mapred.LocalJobRunner:
2024-03-14 22:17:56,838 INFO mapred.MapTask: Starting flush of map output
2024-03-14 22:17:56,838 INFO mapred.MapTask: Spilling map output
2024-03-14 22:17:56,838 INFO mapred.MapTask: bufstart = 0; bufend = 2626710; bufvoid = 104857600
2024-03-14 22:17:56,838 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25127360(100509440); length = 1087037/6553600
2024-03-14 22:17:57,110 INFO mapred.MapTask: Finished spill 0
2024-03-14 22:17:57,128 INFO mapred.Task: Task:attempt_local522766157_0001_m_000000_0 is done. And is in the process of committing
2024-03-14 22:17:57,131 INFO mapred.LocalJobRunner: map
2024-03-14 22:17:57,131 INFO mapred.Task: Task 'attempt_local522766157_0001_m_000000_0' done.
2024-03-14 22:17:57,137 INFO mapred.Task: Final Counters for attempt_local522766157_0001_m_000000_0: Counters: 18
File System Counters
FILE: Number of bytes read=1839168
FILE: Number of bytes written=924802
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=271760
Map output bytes=2626710
Map output materialized bytes=53
Input split bytes=91
Combine input records=271760
Combine output records=4
Spilled Records=4
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=23
Total committed heap usage (bytes)=279969792
File Input Format Counters
Bytes Read=1558030
2024-03-14 22:17:57,137 INFO mapred.LocalJobRunner: Finishing task: attempt_local522766157_0001_m_000000_0
2024-03-14 22:17:57,144 INFO mapred.LocalJobRunner: map task executor complete.
2024-03-14 22:17:57,146 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2024-03-14 22:17:57,147 INFO mapred.LocalJobRunner: Starting task: attempt_local522766157_0001_r_000000_0
2024-03-14 22:17:57,161 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2024-03-14 22:17:57,162 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2024-03-14 22:17:57,162 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2024-03-14 22:17:57,167 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@79ef199a
2024-03-14 22:17:57,168 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2024-03-14 22:17:57,278 INFO mapreduce.Job: Job job_local522766157_0001 running in uber mode : false
2024-03-14 22:17:57,279 INFO mapreduce.Job: map 100% reduce 0%
2024-03-14 22:17:57,285 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=616195648, maxSingleShuffleLimit=154048912, mergeThreshold=406689152, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2024-03-14 22:17:57,293 INFO reduce.EventFetcher: attempt_local522766157_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2024-03-14 22:17:57,332 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local522766157_0001_m_000000_0 decomp: 49 len: 53 to MEMORY
2024-03-14 22:17:57,337 INFO reduce.InMemoryMapOutput: Read 49 bytes from map-output for attempt_local522766157_0001_m_000000_0
2024-03-14 22:17:57,338 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 49, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->49
2024-03-14 22:17:57,340 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2024-03-14 22:17:57,343 INFO mapred.LocalJobRunner: 1 / 1 copied.
2024-03-14 22:17:57,343 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2024-03-14 22:17:57,360 INFO mapred.Merger: Merging 1 sorted segments
2024-03-14 22:17:57,360 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 42 bytes
2024-03-14 22:17:57,362 INFO reduce.MergeManagerImpl: Merged 1 segments, 49 bytes to disk to satisfy reduce memory limit
2024-03-14 22:17:57,363 INFO reduce.MergeManagerImpl: Merging 1 files, 53 bytes from disk
2024-03-14 22:17:57,363 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2024-03-14 22:17:57,363 INFO mapred.Merger: Merging 1 sorted segments
2024-03-14 22:17:57,365 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 42 bytes
2024-03-14 22:17:57,366 INFO mapred.LocalJobRunner: 1 / 1 copied.
2024-03-14 22:17:57,369 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2024-03-14 22:17:57,372 INFO mapred.Task: Task:attempt_local522766157_0001_r_000000_0 is done. And is in the process of committing
2024-03-14 22:17:57,373 INFO mapred.LocalJobRunner: 1 / 1 copied.
2024-03-14 22:17:57,373 INFO mapred.Task: Task attempt_local522766157_0001_r_000000_0 is allowed to commit now
2024-03-14 22:17:57,374 INFO output.FileOutputCommitter: Saved output of task 'attempt_local522766157_0001_r_000000_0' to file:/export/data/wcoutput
2024-03-14 22:17:57,387 INFO mapred.LocalJobRunner: reduce > reduce
2024-03-14 22:17:57,387 INFO mapred.Task: Task 'attempt_local522766157_0001_r_000000_0' done.
2024-03-14 22:17:57,388 INFO mapred.Task: Final Counters for attempt_local522766157_0001_r_000000_0: Counters: 24
File System Counters
FILE: Number of bytes read=1839306
FILE: Number of bytes written=924914
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=4
Reduce shuffle bytes=53
Reduce input records=4
Reduce output records=4
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=84
Total committed heap usage (bytes)=179306496
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=59
2024-03-14 22:17:57,388 INFO mapred.LocalJobRunner: Finishing task: attempt_local522766157_0001_r_000000_0
2024-03-14 22:17:57,388 INFO mapred.LocalJobRunner: reduce task executor complete.
2024-03-14 22:17:58,284 INFO mapreduce.Job: map 100% reduce 100%
2024-03-14 22:17:58,285 INFO mapreduce.Job: Job job_local522766157_0001 completed successfully
2024-03-14 22:17:58,303 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=3678474
FILE: Number of bytes written=1849716
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=271760
Map output bytes=2626710
Map output materialized bytes=53
Input split bytes=91
Combine input records=271760
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=53
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=107
Total committed heap usage (bytes)=459276288
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1558030
File Output Format Counters
Bytes Written=59
[wangliukun@hadoop01 hadoop-3.3.4]$ cd /export/data/
[wangliukun@hadoop01 data]$ ll
总用量 1524
-rw-rw-rw-. 1 wangliukun wangliukun 1558030 3月 17 2024 test.txt
drwxr-xr-x. 2 wangliukun wangliukun 88 3月 14 22:17 wcoutput
- Check the result
[wangliukun@hadoop01 data]$ cd wcoutput/
[wangliukun@hadoop01 wcoutput]$ ll
总用量 4
-rw-r--r--. 1 wangliukun wangliukun 47 3月 14 22:17 part-r-00000
-rw-r--r--. 1 wangliukun wangliukun 0 3月 14 22:17 _SUCCESS
[wangliukun@hadoop01 wcoutput]$ cat part-r-00000
good 72530
hadoop 54170
hdfs 72530
hello 72530
[wangliukun@hadoop01 wcoutput]$
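The same word counts can be reproduced with plain coreutils, which makes it clear what the example jar is computing. The sample file below is made up for illustration; it is not the test.txt from the transcript above:

```shell
# Local stand-in for the wordcount example: split on whitespace,
# then group and count the words, just like map -> shuffle -> reduce.
printf 'hello hdfs good\nhello hadoop\nhello hdfs good\n' > /tmp/wc_demo.txt
tr -s ' \t' '\n' < /tmp/wc_demo.txt | sort | uniq -c | sort -rn
```

Here `sort` plays the role of the shuffle's grouping and `uniq -c` the role of the combiner/reducer.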
To shorten the path, rename /export/servers/hadoop-3.3.4 to hadoop:
[wangliukun@hadoop01 servers]$ mv hadoop-3.3.4/ ./hadoop
[wangliukun@hadoop01 servers]$ ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 3月 14 22:15 hadoop
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
[wangliukun@hadoop01 servers]$
4. Fully distributed mode
Preparation:
- Prepare 3 machines (firewall off, static IPs, hostnames set)
- Install the JDK
- Configure environment variables
- Install Hadoop
- Configure environment variables
- Configure the cluster
- Start daemons node by node
- Configure SSH
- Start the whole cluster and test it
Since our VMs are clones, only part of this work remains.
1. Extract and install Hadoop, and configure environment variables
Hadoop02:
#extract
[wangliukun@hadoop02 software]# tar -zxvf hadoop-3.3.4.tar.gz -C /export/servers/
[wangliukun@hadoop02 software]# cd ../servers/
[wangliukun@hadoop02 servers]# ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 7月 29 2022 hadoop-3.3.4
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
[wangliukun@hadoop02 servers]# mv hadoop-3.3.4/ ./hadoop
[wangliukun@hadoop02 servers]# ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 7月 29 2022 hadoop
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
#configure environment variables
[wangliukun@hadoop02 servers]$ cd hadoop/
[wangliukun@hadoop02 hadoop]$ pwd
/export/servers/hadoop
[wangliukun@hadoop02 hadoop]$ sudo vim /etc/profile
[sudo] wangliukun 的密码:
#apply the changes
[wangliukun@hadoop02 hadoop]$ source /etc/profile
#verify the hadoop command
[wangliukun@hadoop02 hadoop]$ hadoop version
Hadoop 3.3.4
Source code repository https://github.com/apache/hadoop.git -r a585a73c3e02ac62350c136643a5e7f6095a3dbb
Compiled by stevel on 2022-07-29T12:32Z
Compiled with protoc 3.7.1
From source with checksum fb9dd8918a7b8a5b430d61af858f6ec
This command was run using /export/servers/hadoop/share/hadoop/common/hadoop-common-3.3.4.jar
[wangliukun@hadoop02 hadoop]$
In the step above, the following was appended to /etc/profile:
#HADOOP_HOME
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Hadoop03:
#extract
[wangliukun@hadoop03 software]# tar -zxvf hadoop-3.3.4.tar.gz -C /export/servers/
[wangliukun@hadoop03 software]# cd ../servers/
[wangliukun@hadoop03 servers]# ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 7月 29 2022 hadoop-3.3.4
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
[wangliukun@hadoop03 servers]# mv hadoop-3.3.4/ ./hadoop
[wangliukun@hadoop03 servers]# ll
总用量 0
drwxr-xr-x. 10 wangliukun wangliukun 215 7月 29 2022 hadoop
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk
#configure environment variables
[wangliukun@hadoop03 hadoop]$ pwd
/export/servers/hadoop
[wangliukun@hadoop03 hadoop]$ sudo vim /etc/profile
[sudo] wangliukun 的密码:
#apply the changes
[wangliukun@hadoop03 hadoop]$ source /etc/profile
#verify the hadoop command
[wangliukun@hadoop03 hadoop]$ hadoop version
Hadoop 3.3.4
Source code repository https://github.com/apache/hadoop.git -r a585a73c3e02ac62350c136643a5e7f6095a3dbb
Compiled by stevel on 2022-07-29T12:32Z
Compiled with protoc 3.7.1
From source with checksum fb9dd8918a7b8a5b430d61af858f6ec
This command was run using /export/servers/hadoop/share/hadoop/common/hadoop-common-3.3.4.jar
[wangliukun@hadoop03 hadoop]$
In the step above, the following was appended to /etc/profile:
#HADOOP_HOME
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
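What those /etc/profile lines do can be sanity-checked in a throwaway shell; the HADOOP_HOME value is the install path used throughout this guide:

```shell
# Append Hadoop's bin/ and sbin/ to PATH, as in /etc/profile above.
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# Verify the directory is now on PATH (prints "on PATH").
case ":$PATH:" in
  *:"$HADOOP_HOME/bin":*) echo "on PATH" ;;
  *) echo "missing" ;;
esac
```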
2. The xsync cluster-distribution script
- Requirement: copy a file to the same directory on every node of the cluster.
- Analysis:
A plain rsync copy looks like:
rsync -av /opt/module atguigu@hadoop103:/opt/
Desired usage: xsync <file-to-sync>
The script should be callable from any directory, so it must live in a directory that is already on PATH:
[wangliukun@hadoop01 /]$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/export/servers/jdk/bin:/root/bin:/export/servers/jdk/bin:/export/servers/hadoop-3.3.4/bin:/export/servers/hadoop-3.3.4/sbin
[wangliukun@hadoop01 /]$
- Implementation
Create the xsync file in /home/wangliukun/bin:
[wangliukun@hadoop01 bin]$ cd /home/wangliukun/
[wangliukun@hadoop01 ~]$ mkdir bin
[wangliukun@hadoop01 ~]$ cd bin
[wangliukun@hadoop01 bin]$ vim xsync
The script:
[wangliukun@hadoop01 bin]$ cat xsync
#!/bin/bash

# 1. Check the argument count
if [ $# -lt 1 ]
then
    echo "Not Enough Arguments!"
    exit
fi

# 2. Loop over every machine in the cluster
for host in hadoop01 hadoop02 hadoop03
do
    echo ==================== $host ====================
    # 3. Send each requested file or directory in turn
    for file in "$@"
    do
        # 4. Check that the file exists
        if [ -e "$file" ]
        then
            # 5. Resolve the absolute parent directory (following symlinks)
            pdir=$(cd -P "$(dirname "$file")"; pwd)
            # 6. Get the bare file name
            fname=$(basename "$file")
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$file does not exist!"
        fi
    done
done
[wangliukun@hadoop01 bin]$
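The pdir/fname logic at the heart of the script can be exercised on its own; the temp file below is just a stand-in:

```shell
# Resolve a file's absolute parent directory and bare name,
# exactly as steps 5 and 6 of the xsync script do.
tmp=$(mktemp -d)
touch "$tmp/demo.txt"
file=$tmp/demo.txt
pdir=$(cd -P "$(dirname "$file")"; pwd)   # -P resolves symlinks in the path
fname=$(basename "$file")
echo "$pdir/$fname"                       # the absolute path rsync would receive
```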
Make xsync executable:
[wangliukun@hadoop01 bin]$ chmod +x xsync
Test the script:
[wangliukun@hadoop01 ~]$ xsync /home/wangliukun/bin
==================== hadoop01 ====================
wangliukun@hadoop01's password:
wangliukun@hadoop01's password:
sending incremental file list
sent 179 bytes received 19 bytes 79.20 bytes/sec
total size is 736 speedup is 3.72
==================== hadoop02 ====================
wangliukun@hadoop02's password:
wangliukun@hadoop02's password:
sending incremental file list
bin/
bin/w/
sent 189 bytes received 25 bytes 85.60 bytes/sec
total size is 736 speedup is 3.44
==================== hadoop03 ====================
wangliukun@hadoop03's password:
wangliukun@hadoop03's password:
sending incremental file list
bin/
bin/w/
sent 185 bytes received 25 bytes 140.00 bytes/sec
total size is 736 speedup is 3.50
[wangliukun@hadoop01 ~]$
Copy the script to /bin so it can be called from anywhere (including under sudo):
[wangliukun@hadoop01 bin]$ sudo cp xsync /bin/
[sudo] wangliukun 的密码:
Sync the environment-variable configuration (owned by root):
[wangliukun@hadoop01 ~]$ sudo /bin/xsync /etc/profile.d/
==================== hadoop01 ====================
root@hadoop01's password:
root@hadoop01's password:
sending incremental file list
sent 288 bytes received 17 bytes 122.00 bytes/sec
total size is 10,877 speedup is 35.66
==================== hadoop02 ====================
root@hadoop02's password:
root@hadoop02's password:
sending incremental file list
sent 284 bytes received 17 bytes 200.67 bytes/sec
total size is 10,877 speedup is 36.14
==================== hadoop03 ====================
root@hadoop03's password:
root@hadoop03's password:
sending incremental file list
sent 284 bytes received 17 bytes 120.40 bytes/sec
total size is 10,877 speedup is 36.14
Reload the environment variables on each node:
[wangliukun@hadoop01 ~]$ source /etc/profile
[wangliukun@hadoop02 bin]$ source /etc/profile
[wangliukun@hadoop03 bin]$ source /etc/profile
3. Passwordless SSH login
Current SSH behavior (a password is required):
[wangliukun@hadoop01 ~]$ ssh hadoop02
wangliukun@hadoop02's password:
Last login: Thu Mar 14 22:47:33 2024
[wangliukun@hadoop02 ~]$ exit
登出
Connection to hadoop02 closed.
[wangliukun@hadoop01 ~]$
Key setup (repeat on each of the three VMs):
#generate a public/private key pair
[wangliukun@hadoop01 ~]$ cd .ssh
[wangliukun@hadoop01 .ssh]$ ll
总用量 4
-rw-r--r--. 1 wangliukun wangliukun 555 3月 14 23:08 known_hosts
[wangliukun@hadoop01 .ssh]$ pwd
/home/wangliukun/.ssh
[wangliukun@hadoop01 .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/wangliukun/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/wangliukun/.ssh/id_rsa.
Your public key has been saved in /home/wangliukun/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:7x6u66p0lO0Aegq8Lx4nJwdwvwXGTuqdetE4LPaH7DI wangliukun@hadoop01
The key's randomart image is:
+---[RSA 2048]----+
| |
| . |
|. . * |
|o. B o o |
|.oo.+o= S |
| o=+==.o . |
| *oB=+. . o |
| .Eo=.. o . |
|..o*oo.o+++ |
+----[SHA256]-----+
[wangliukun@hadoop01 .ssh]$ ll
总用量 12
-rw-------. 1 wangliukun wangliukun 1679 3月 14 23:32 id_rsa      # private key
-rw-r--r--. 1 wangliukun wangliukun 401 3月 14 23:32 id_rsa.pub   # public key
-rw-r--r--. 1 wangliukun wangliukun 555 3月 14 23:08 known_hosts
[wangliukun@hadoop01 .ssh]$
#copy the public key to every machine that should accept passwordless logins
[wangliukun@hadoop01 .ssh]$ ssh-copy-id hadoop01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/wangliukun/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
wangliukun@hadoop01's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop01'"
and check to make sure that only the key(s) you wanted were added.
[wangliukun@hadoop01 .ssh]$ ssh-copy-id hadoop02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/wangliukun/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
wangliukun@hadoop02's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop02'"
and check to make sure that only the key(s) you wanted were added.
[wangliukun@hadoop01 .ssh]$ ssh-copy-id hadoop03
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/wangliukun/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
wangliukun@hadoop03's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop03'"
and check to make sure that only the key(s) you wanted were added.
#test
[wangliukun@hadoop01 .ssh]$ cd /
[wangliukun@hadoop01 /]$ ssh hadoop02
Last login: Fri Mar 15 00:58:05 2024 from hadoop01
[wangliukun@hadoop02 ~]$ exit
登出
Connection to hadoop02 closed.
[wangliukun@hadoop01 /]$
| File | Purpose |
|---|---|
| known_hosts | public keys of hosts this machine has connected to over ssh |
| id_rsa | the generated private key |
| id_rsa.pub | the generated public key |
| authorized_keys | public keys authorized for passwordless login to this machine |
In total 9 public keys are distributed: each of the 3 machines copies its key to all 3 machines.
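The 3 x 3 = 9 arithmetic can be sketched: every host ends up with one authorized key per source host. The file below is a throwaway stand-in for one host's ~/.ssh/authorized_keys, with placeholder key material:

```shell
# One host's authorized_keys after all three ssh-copy-id runs:
# one entry per source host, so 3 lines here; 3 hosts x 3 lines = 9 keys total.
auth=$(mktemp)
for src in hadoop01 hadoop02 hadoop03; do
  echo "ssh-rsa AAAAB3...placeholder wangliukun@$src" >> "$auth"
done
wc -l < "$auth"
```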
4. Cluster configuration
1> Deployment plan
Do not put NameNode and SecondaryNameNode on the same server.
ResourceManager is also memory-hungry; keep it off the machines running NameNode or SecondaryNameNode.
| | hadoop01 | hadoop02 | hadoop03 |
|---|---|---|---|
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
2> About the configuration files
Hadoop has two kinds of configuration files: defaults and site-specific overrides. You only edit a site file when you want to change a default value.
<1> Default configuration files
| Default file | Location inside Hadoop's jars |
|---|---|
| core-default.xml | hadoop-common-3.3.4.jar/core-default.xml |
| hdfs-default.xml | hadoop-hdfs-3.3.4.jar/hdfs-default.xml |
| yarn-default.xml | hadoop-yarn-common-3.3.4.jar/yarn-default.xml |
| mapred-default.xml | hadoop-mapreduce-client-core-3.3.4.jar/mapred-default.xml |
<2> Site-specific configuration files
core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; edit them to override defaults as the project requires.
3> Configure the cluster
- Core configuration file
Edit core-site.xml:
[wangliukun@hadoop01 hadoop]$ vim core-site.xml
Contents:
[wangliukun@hadoop01 hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:8020</value>
</property>
<!-- Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop/data</value>
</property>
</configuration>
- HDFS configuration file
Edit hdfs-site.xml:
[wangliukun@hadoop01 hadoop]$ vim hdfs-site.xml
Contents:
[wangliukun@hadoop01 hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop01:9870</value>
</property>
<!-- SecondaryNameNode web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop03:9868</value>
</property>
</configuration>
- YARN configuration file
Edit yarn-site.xml:
[wangliukun@hadoop01 hadoop]$ vim yarn-site.xml
Contents:
[wangliukun@hadoop01 hadoop]$ cat yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Use the MapReduce shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop02</value>
</property>
<!-- Environment variables inherited by containers -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
- MapReduce configuration file
[wangliukun@hadoop01 hadoop]$ vim mapred-site.xml
Contents:
[wangliukun@hadoop01 hadoop]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce jobs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4> Distribute the finished Hadoop configuration to the whole cluster
[wangliukun@hadoop01 hadoop]$ xsync hadoop/
5> Check that the files arrived on hadoop02/hadoop03:
[root@hadoop02 hadoop]# cat core-site.xml
[root@hadoop03 hadoop]# cat core-site.xml
5. Starting the whole cluster
1. Configure workers:
[wangliukun@hadoop01 hadoop]$ vim workers
[wangliukun@hadoop01 hadoop]$ cat workers
hadoop01
hadoop02
hadoop03
Sync it to all nodes:
[wangliukun@hadoop01 hadoop]$ xsync workers
==================== hadoop01 ====================
sending incremental file list
sent 69 bytes received 12 bytes 162.00 bytes/sec
total size is 27 speedup is 0.33
==================== hadoop02 ====================
sending incremental file list
workers
sent 143 bytes received 41 bytes 368.00 bytes/sec
total size is 27 speedup is 0.15
==================== hadoop03 ====================
sending incremental file list
workers
sent 143 bytes received 41 bytes 368.00 bytes/sec
total size is 27 speedup is 0.15
2. Start the cluster
- If this is the cluster's first start, format the NameNode on the hadoop01 node.
- Note: formatting the NameNode creates a new cluster ID. If the NameNode's and DataNodes' cluster IDs then differ, the cluster cannot find its old data. If the cluster fails while running and the NameNode must be re-formatted, first stop the namenode and datanode processes and delete the data and logs directories on every machine, then format.
[wangliukun@hadoop01 hadoop]$ hdfs namenode -format
The format writes version/metadata files under the NameNode's current/ directory:
[wangliukun@hadoop01 current]$ ll
总用量 16
-rw-rw-r--. 1 wangliukun wangliukun 405 3月 18 17:28 fsimage_0000000000000000000
-rw-rw-r--. 1 wangliukun wangliukun 62 3月 18 17:28 fsimage_0000000000000000000.md5
-rw-rw-r--. 1 wangliukun wangliukun 2 3月 18 17:28 seen_txid
-rw-rw-r--. 1 wangliukun wangliukun 218 3月 18 17:28 VERSION
- Start HDFS
[wangliukun@hadoop01 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
hadoop03: ERROR: JAVA_HOME is not set and could not be found.
hadoop01: ERROR: JAVA_HOME is not set and could not be found.
hadoop02: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [hadoop03]
hadoop03: ERROR: JAVA_HOME is not set and could not be found.
Error:
ERROR: JAVA_HOME is not set and could not be found.
Cause:
If the JDK itself is configured correctly, the likely culprit is an unconfigured hadoop-env.sh. That file holds Hadoop's own environment settings, chiefly the JAVA_HOME path Hadoop should use.
Fix:
- cd into $HADOOP_HOME/etc/hadoop
- Run: vim hadoop-env.sh
- Set JAVA_HOME and HADOOP_CONF_DIR to the actual install paths:
export JAVA_HOME=/export/servers/jdk
export HADOOP_CONF_DIR=/export/servers/hadoop/etc/hadoop
Start HDFS again:
[wangliukun@hadoop01 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: namenode is running as process 2808. Stop it first and ensure /tmp/hadoop-wangliukun-namenode.pid file is empty before retry.
Starting datanodes
hadoop02: WARNING: /export/servers/hadoop/logs does not exist. Creating.
hadoop03: WARNING: /export/servers/hadoop/logs does not exist. Creating.
hadoop01: datanode is running as process 2918. Stop it first and ensure /tmp/hadoop-wangliukun-datanode.pid file is empty before retry.
Starting secondary namenodes [hadoop03]
More warnings:
hadoop02: WARNING: /export/servers/hadoop/logs does not exist. Creating.
hadoop03: WARNING: /export/servers/hadoop/logs does not exist. Creating.
Cause:
Only hadoop01 had been formatted; hadoop02/hadoop03 had not. (The WARNING itself is harmless: Hadoop simply creates the missing logs directory.)
Fix:
Format hadoop02/hadoop03:
[root@hadoop02 hadoop]# hdfs namenode -format
[root@hadoop03 hadoop]# hdfs namenode -format
Start HDFS once more:
[wangliukun@hadoop01 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop01]
hadoop01: namenode is running as process 2808. Stop it first and ensure /tmp/hadoop-wangliukun-namenode.pid file is empty before retry.
Starting datanodes
hadoop02: datanode is running as process 1906. Stop it first and ensure /tmp/hadoop-wangliukun-datanode.pid file is empty before retry.
hadoop03: datanode is running as process 1550. Stop it first and ensure /tmp/hadoop-wangliukun-datanode.pid file is empty before retry.
hadoop01: datanode is running as process 2918. Stop it first and ensure /tmp/hadoop-wangliukun-datanode.pid file is empty before retry.
Starting secondary namenodes [hadoop03]
hadoop03: secondarynamenode is running as process 1604. Stop it first and ensure /tmp/hadoop-wangliukun-secondarynamenode.pid file is empty before retry.
[wangliukun@hadoop01 hadoop]$
HDFS is now up.
Check with jps:
[wangliukun@hadoop01 hadoop]$ jps
2918 DataNode
2808 NameNode
4024 Jps
[root@hadoop02 hadoop]# jps
1906 DataNode
2227 Jps
[root@hadoop03 hadoop]# jps
2097 Jps
1604 SecondaryNameNode
1550 DataNode
- Inspect HDFS's NameNode in a browser
Open http://hadoop01:9870.
Problem:
The page cannot be reached.
Cause:
If pinging hadoop01 from the physical host fails,
the host cannot resolve the name hadoop01.
Fix:
Add IP-plus-hostname entries to the physical host's hosts file.
Entries to add:
192.168.10.140 hadoop01
192.168.10.141 hadoop02
192.168.10.142 hadoop03
Pinging hadoop01 from the host now succeeds.
Open http://hadoop01:9870 again
to browse the data stored on HDFS.
- Start YARN on the node hosting ResourceManager (hadoop02)
[root@hadoop02 hadoop]# sbin/start-yarn.sh
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
Error:
ERROR: Attempting to operate on yarn nodemanager as root
Cause: YARN refuses to start as root here; use the regular user the cluster was set up with.
Fix: switch to the regular user:
[root@hadoop02 hadoop]# su wangliukun
[wangliukun@hadoop02 hadoop]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
YARN started successfully:
[wangliukun@hadoop01 hadoop]$ jps
4116 NodeManager
2918 DataNode
4214 Jps
2808 NameNode
[wangliukun@hadoop02 hadoop]$ jps
1906 DataNode
2740 Jps
2598 NodeManager
2490 ResourceManager
[wangliukun@hadoop03 hadoop]$ jps
1604 SecondaryNameNode
2344 Jps
2235 NodeManager
1550 DataNode
[wangliukun@hadoop03 hadoop]$
- Inspect YARN's ResourceManager in a browser
Open http://hadoop02:8088
to view the jobs running on YARN.
3. Basic cluster tests
Upload files to the cluster:
- A small file
[wangliukun@hadoop01 hadoop]$ hadoop fs -mkdir /input
[wangliukun@hadoop01 hadoop]$
#upload the file
[wangliukun@hadoop01 hadoop]$ hadoop fs -put /export/data/wcinput/test.txt /input
- A large file
[wangliukun@hadoop01 hadoop]$ hadoop fs -put /export/software/jdk-8u144-linux-x64.tar.gz /
[wangliukun@hadoop01 hadoop]$
查看文件位置
- 查看HDFS文件存储路径
[wangliukun@hadoop01 hadoop]$ ll
总用量 96
drwxr-xr-x. 2 wangliukun wangliukun 203 7月 29 2022 bin
drwxrwxr-x. 4 wangliukun wangliukun 37 3月 18 18:50 data
drwxr-xr-x. 3 wangliukun wangliukun 20 7月 29 2022 etc
drwxr-xr-x. 2 wangliukun wangliukun 106 7月 29 2022 include
drwxr-xr-x. 3 wangliukun wangliukun 20 7月 29 2022 lib
drwxr-xr-x. 4 wangliukun wangliukun 288 7月 29 2022 libexec
-rw-rw-r--. 1 wangliukun wangliukun 24707 7月 29 2022 LICENSE-binary
drwxr-xr-x. 2 wangliukun wangliukun 4096 7月 29 2022 licenses-binary
-rw-rw-r--. 1 wangliukun wangliukun 15217 7月 17 2022 LICENSE.txt
drwxrwxr-x. 3 wangliukun wangliukun 4096 3月 25 16:35 logs
-rw-rw-r--. 1 wangliukun wangliukun 29473 7月 17 2022 NOTICE-binary
-rw-rw-r--. 1 wangliukun wangliukun 1541 4月 22 2022 NOTICE.txt
-rw-rw-r--. 1 wangliukun wangliukun 175 4月 22 2022 README.txt
drwxr-xr-x. 3 wangliukun wangliukun 4096 7月 29 2022 sbin
drwxr-xr-x. 4 wangliukun wangliukun 31 7月 29 2022 share
[wangliukun@hadoop01 hadoop]$ cd data/
[wangliukun@hadoop01 data]$ ll
总用量 0
drwxrwxr-x. 4 wangliukun wangliukun 30 3月 18 17:43 dfs
drwxr-xr-x. 5 wangliukun wangliukun 57 3月 25 16:35 nm-local-dir
[wangliukun@hadoop01 data]$ cd dfs/
[wangliukun@hadoop01 dfs]$ ll
总用量 0
drwx------. 3 wangliukun wangliukun 40 3月 25 16:35 data
drwxrwxr-x. 3 wangliukun wangliukun 40 3月 25 16:35 name
[wangliukun@hadoop01 dfs]$ cd data/
[wangliukun@hadoop01 data]$ ll
总用量 4
drwxrwxr-x. 3 wangliukun wangliukun 70 3月 18 17:44 current
-rw-rw-r--. 1 wangliukun wangliukun 13 3月 25 16:35 in_use.lock
[wangliukun@hadoop01 data]$ cd current/
[wangliukun@hadoop01 current]$ ll
总用量 4
drwx------. 4 wangliukun wangliukun 54 3月 25 16:35 BP-380961614-192.168.10.140-1710754117775
-rw-rw-r--. 1 wangliukun wangliukun 229 3月 25 16:35 VERSION
[wangliukun@hadoop01 current]$ cd BP-380961614-192.168.10.140-1710754117775/
[wangliukun@hadoop01 BP-380961614-192.168.10.140-1710754117775]$ ll
总用量 4
drwxrwxr-x. 4 wangliukun wangliukun 64 3月 21 19:32 current
-rw-rw-r--. 1 wangliukun wangliukun 166 3月 18 17:44 scanner.cursor
drwxrwxr-x. 2 wangliukun wangliukun 6 3月 25 16:35 tmp
[wangliukun@hadoop01 BP-380961614-192.168.10.140-1710754117775]$ cd current/
[wangliukun@hadoop01 current]$ ll
总用量 8
-rw-rw-r--. 1 wangliukun wangliukun 18 3月 21 19:32 dfsUsed
drwxrwxr-x. 3 wangliukun wangliukun 21 3月 25 16:39 finalized
drwxrwxr-x. 2 wangliukun wangliukun 6 3月 25 16:45 rbw
-rw-rw-r--. 1 wangliukun wangliukun 145 3月 25 16:35 VERSION
[wangliukun@hadoop01 current]$ cd finalized/
[wangliukun@hadoop01 finalized]$ ll
总用量 0
drwxrwxr-x. 3 wangliukun wangliukun 21 3月 25 16:39 subdir0
[wangliukun@hadoop01 finalized]$ cd subdir0/
[wangliukun@hadoop01 subdir0]$ ll
总用量 0
drwxrwxr-x. 2 wangliukun wangliukun 168 3月 25 16:45 subdir0
[wangliukun@hadoop01 subdir0]$ cd subdir0/
[wangliukun@hadoop01 subdir0]$ ll
总用量 184124
-rw-rw-r--. 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r--. 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r--. 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r--. 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r--. 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r--. 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
#路径
/export/servers/hadoop/data/dfs/data/current/BP-380961614-192.168.10.140-1710754117775/current/finalized/subdir0/subdir0
- 查看HDFS在磁盘上存储的文件内容
[wangliukun@hadoop01 subdir0]$ cat blk_1073741825
拼接
[wangliukun@hadoop01 subdir0]$ cat blk_1073741826 >> tmp.tar.gz
[wangliukun@hadoop01 subdir0]$ cat blk_1073741827 >> tmp.tar.gz
[wangliukun@hadoop01 subdir0]$ ll
总用量 577340
-rw-rw-r--. 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r--. 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r--. 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r--. 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r--. 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r--. 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
-rw-rw-r--. 1 wangliukun wangliukun 185515842 3月 25 17:01 tmp.tar.gz
[wangliukun@hadoop01 subdir0]$ tar -zxvf tmp.tar.gz
解压完发现拼接的文件为传输的jdk
[wangliukun@hadoop01 subdir0]$ ll
总用量 577340
-rw-rw-r--. 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r--. 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r--. 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r--. 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r--. 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r--. 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk1.8.0_144
-rw-rw-r--. 1 wangliukun wangliukun 185515842 3月 25 17:01 tmp.tar.gz
[wangliukun@hadoop01 subdir0]$
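HDFS 按固定大小切块、按序拼接即可还原原文件的原理,可以用 split/cat 在本地模拟验证(示意脚本:块大小用 1KB 代替 HDFS 默认的 128MB,文件名均为演示用):

```shell
#!/bin/bash
# 示意:在本地模拟 HDFS 按固定大小切块与按序拼接还原的过程
set -e
tmpdir=$(mktemp -d)
cd "$tmpdir"
head -c 3000 /dev/urandom > demo.bin   # 构造一个 3000 字节的测试文件
split -b 1024 demo.bin blk_            # 切成 1KB 一块:blk_aa blk_ab blk_ac
cat blk_* > rebuilt.bin                # 按文件名顺序拼接还原
cmp demo.bin rebuilt.bin && echo "拼接结果与原文件一致"
```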
由 Web 页面可知,jdk 的数据块共存了三份(副本数为 3),查询可知分别存储在三台虚拟机节点上:
[wangliukun@hadoop01 subdir0]$ ll
总用量 365292
-rw-rw-r--. 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r--. 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r--. 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r--. 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r--. 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r--. 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
drwxr-xr-x. 8 wangliukun wangliukun 255 7月 22 2017 jdk1.8.0_144
-rw-rw-r--. 1 wangliukun wangliukun 185515842 3月 25 17:01 tmp.tar.gz
[wangliukun@hadoop01 subdir0]$
[wangliukun@hadoop02 hadoop]$ cd data/dfs/data/current/BP-380961614-192.168.10.140-1710754117775/current/finalized/subdir0/subdir0/
[wangliukun@hadoop02 subdir0]$ ll
总用量 184124
-rw-rw-r-- 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r-- 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r-- 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r-- 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r-- 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r-- 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
[wangliukun@hadoop02 subdir0]$
[wangliukun@hadoop03 hadoop]$ cd data/dfs/data/current/BP-380961614-192.168.10.140-1710754117775/current/finalized/subdir0/subdir0/
[wangliukun@hadoop03 subdir0]$ ll
总用量 184124
-rw-rw-r-- 1 wangliukun wangliukun 1558030 3月 25 16:39 blk_1073741825
-rw-rw-r-- 1 wangliukun wangliukun 12183 3月 25 16:39 blk_1073741825_1001.meta
-rw-rw-r-- 1 wangliukun wangliukun 134217728 3月 25 16:45 blk_1073741826
-rw-rw-r-- 1 wangliukun wangliukun 1048583 3月 25 16:45 blk_1073741826_1002.meta
-rw-rw-r-- 1 wangliukun wangliukun 51298114 3月 25 16:45 blk_1073741827
-rw-rw-r-- 1 wangliukun wangliukun 400775 3月 25 16:45 blk_1073741827_1003.meta
[wangliukun@hadoop03 subdir0]$
下载
[wangliukun@hadoop01 software]$ ll
总用量 860328
-rw-rw-rw-. 1 wangliukun wangliukun 695457782 2月 20 19:17 hadoop-3.3.4.tar.gz
-rw-rw-rw-. 1 wangliukun wangliukun 185515842 3月 4 16:40 jdk-8u144-linux-x64.tar.gz
[wangliukun@hadoop01 software]$ hadoop fs -get /jdk-8u144-linux-x64.tar.gz
get: `jdk-8u144-linux-x64.tar.gz': File exists
(报错原因:本地当前目录下已存在同名文件,删除本地文件或换一个目录再下载即可)
执行wordcount程序
[wangliukun@hadoop01 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /input/wcinput/test.txt /input/wcoutput/
2024-03-25 17:47:54,072 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop02/192.168.10.141:8032
2024-03-25 17:47:57,870 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/wangliukun/.staging/job_1711355765914_0003
2024-03-25 17:48:01,086 INFO input.FileInputFormat: Total input files to process : 1
2024-03-25 17:48:02,460 INFO mapreduce.JobSubmitter: number of splits:1
2024-03-25 17:48:05,319 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1711355765914_0003
2024-03-25 17:48:05,320 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-03-25 17:48:08,773 INFO conf.Configuration: resource-types.xml not found
2024-03-25 17:48:08,774 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-03-25 17:48:10,653 INFO impl.YarnClientImpl: Submitted application application_1711355765914_0003
2024-03-25 17:48:10,830 INFO mapreduce.Job: The url to track the job: http://hadoop02:8088/proxy/application_1711355765914_0003/
2024-03-25 17:48:10,831 INFO mapreduce.Job: Running job: job_1711355765914_0003
2024-03-25 17:50:04,301 INFO mapreduce.Job: Job job_1711355765914_0003 running in uber mode : false
2024-03-25 17:50:04,405 INFO mapreduce.Job: map 0% reduce 0%
2024-03-25 17:50:39,577 INFO mapreduce.Job: map 100% reduce 0%
2024-03-25 17:50:48,735 INFO mapreduce.Job: map 100% reduce 100%
2024-03-25 17:50:49,826 INFO mapreduce.Job: Job job_1711355765914_0003 completed successfully
2024-03-25 17:50:50,023 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=53
FILE: Number of bytes written=551097
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1558138
HDFS: Number of bytes written=47
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=29841
Total time spent by all reduces in occupied slots (ms)=7096
Total time spent by all map tasks (ms)=29841
Total time spent by all reduce tasks (ms)=7096
Total vcore-milliseconds taken by all map tasks=29841
Total vcore-milliseconds taken by all reduce tasks=7096
Total megabyte-milliseconds taken by all map tasks=30557184
Total megabyte-milliseconds taken by all reduce tasks=7266304
Map-Reduce Framework
Map input records=1
Map output records=271760
Map output bytes=2626710
Map output materialized bytes=53
Input split bytes=108
Combine input records=271760
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=53
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=4824
CPU time spent (ms)=6700
Physical memory (bytes) snapshot=537280512
Virtual memory (bytes) snapshot=5144932352
Total committed heap usage (bytes)=370671616
Peak Map Physical memory (bytes)=337448960
Peak Map Virtual memory (bytes)=2569183232
Peak Reduce Physical memory (bytes)=199831552
Peak Reduce Virtual memory (bytes)=2575749120
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1558030
File Output Format Counters
Bytes Written=47
[wangliukun@hadoop01 hadoop]$
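wordcount 的 map-shuffle-reduce 统计逻辑,可以用一条 shell 管道在本地直观模拟(示意代码:输入内容为演示用,与本文的 test.txt 无关):

```shell
#!/bin/bash
# 示意:用 shell 管道模拟 wordcount 的 map-shuffle-reduce 过程
printf 'hadoop spark hadoop\nhive hadoop\n' |
  tr -s ' ' '\n' |        # map:切分成一行一个单词
  sort |                  # shuffle:相同单词排到相邻位置
  uniq -c |               # reduce:对相邻的相同单词计数
  awk '{print $2"\t"$1}'  # 输出格式:单词<TAB>次数
# 输出:
# hadoop  3
# hive    1
# spark   1
```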
过程中 YARN 会生成一条历史任务记录
4、配置历史服务器
1>配置mapred-site.xml
[wangliukun@hadoop01 hadoop]$ vim mapred-site.xml
配置文件如下:
<!-- 历史服务器端地址 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!-- 历史服务器web端地址 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
2>分发
[wangliukun@hadoop01 hadoop]$ xsync mapred-site.xml
==================== hadoop01 ====================
sending incremental file list
sent 78 bytes received 12 bytes 60.00 bytes/sec
total size is 1,194 speedup is 13.27
==================== hadoop02 ====================
sending incremental file list
mapred-site.xml
sent 623 bytes received 47 bytes 446.67 bytes/sec
total size is 1,194 speedup is 1.78
==================== hadoop03 ====================
sending incremental file list
mapred-site.xml
sent 623 bytes received 47 bytes 1,340.00 bytes/sec
total size is 1,194 speedup is 1.78
[wangliukun@hadoop01 hadoop]$
3>在hadoop01启动历史服务器
[wangliukun@hadoop01 hadoop]$ mapred --daemon start historyserver
4>查看是否启动
[wangliukun@hadoop01 hadoop]$ jps
4291 JobHistoryServer
1944 DataNode
1836 NameNode
4349 Jps
2270 NodeManager
[wangliukun@hadoop01 hadoop]$
5>查看JobHistory
浏览器输入:http://hadoop01:19888/
5、配置日志的聚集
日志聚集概念:应用运行完成以后,将程序运行日志信息上传到HDFS系统上。
日志聚集功能好处:可以方便的查看到程序运行详情,方便开发调试。
注意:开启日志聚集功能,需要重新启动NodeManager 、ResourceManager和HistoryServer。
开启日志聚集功能具体步骤如下:
1>配置yarn-site.xml
[wangliukun@hadoop01 hadoop]$ vim yarn-site.xml
配置文件添加:
<!-- 开启日志聚集功能 -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- 设置日志聚集服务器地址 -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop01:19888/jobhistory/logs</value>
</property>
<!-- 设置日志保留时间为7天 -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
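其中 604800 即 7 天对应的秒数,可以直接换算验证:

```shell
# 7 天换算成秒,对应 yarn.log-aggregation.retain-seconds 的取值
echo $((7 * 24 * 3600))   # 输出 604800
```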
2>分发:
[wangliukun@hadoop01 hadoop]$ xsync yarn-site.xml
==================== hadoop01 ====================
sending incremental file list
sent 76 bytes received 12 bytes 58.67 bytes/sec
total size is 1,647 speedup is 18.72
==================== hadoop02 ====================
sending incremental file list
yarn-site.xml
sent 1,074 bytes received 47 bytes 2,242.00 bytes/sec
total size is 1,647 speedup is 1.47
==================== hadoop03 ====================
sending incremental file list
yarn-site.xml
sent 1,074 bytes received 47 bytes 747.33 bytes/sec
total size is 1,647 speedup is 1.47
[wangliukun@hadoop01 hadoop]$
3>关闭NodeManager、ResourceManager和HistoryServer
[wangliukun@hadoop02 hadoop]$ sbin/stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
4>启动NodeManager、ResourceManager和HistoryServer
[wangliukun@hadoop02 hadoop]$ sbin/start-yarn.sh
[wangliukun@hadoop01 ~]$ mapred --daemon start historyserver
[wangliukun@hadoop01 ~]$ jps
5395 NodeManager
5574 Jps
1944 DataNode
5545 JobHistoryServer
1836 NameNode
5>删除HDFS上已经存在的输出文件
[wangliukun@hadoop01 hadoop]$ hadoop fs -rm -R /input/wcoutput
Deleted /input/wcoutput
6>执行wordcount
[wangliukun@hadoop01 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordcount /input/wcinput/ /input/wcoutput/
2024-03-25 19:18:05,472 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop02/192.168.10.141:8032
2024-03-25 19:18:06,632 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/wangliukun/.staging/job_1711364965360_0001
2024-03-25 19:18:07,249 INFO input.FileInputFormat: Total input files to process : 1
2024-03-25 19:18:07,648 INFO mapreduce.JobSubmitter: number of splits:1
2024-03-25 19:18:08,069 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1711364965360_0001
2024-03-25 19:18:08,069 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-03-25 19:18:08,465 INFO conf.Configuration: resource-types.xml not found
2024-03-25 19:18:08,465 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-03-25 19:18:09,311 INFO impl.YarnClientImpl: Submitted application application_1711364965360_0001
2024-03-25 19:18:09,412 INFO mapreduce.Job: The url to track the job: http://hadoop02:8088/proxy/application_1711364965360_0001/
2024-03-25 19:18:09,413 INFO mapreduce.Job: Running job: job_1711364965360_0001
2024-03-25 19:18:29,859 INFO mapreduce.Job: Job job_1711364965360_0001 running in uber mode : false
2024-03-25 19:18:29,861 INFO mapreduce.Job: map 0% reduce 0%
2024-03-25 19:18:42,422 INFO mapreduce.Job: map 100% reduce 0%
2024-03-25 19:18:49,564 INFO mapreduce.Job: map 100% reduce 100%
2024-03-25 19:18:50,602 INFO mapreduce.Job: Job job_1711364965360_0001 completed successfully
2024-03-25 19:18:50,787 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=53
FILE: Number of bytes written=551433
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1558138
HDFS: Number of bytes written=47
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8361
Total time spent by all reduces in occupied slots (ms)=5166
Total time spent by all map tasks (ms)=8361
Total time spent by all reduce tasks (ms)=5166
Total vcore-milliseconds taken by all map tasks=8361
Total vcore-milliseconds taken by all reduce tasks=5166
Total megabyte-milliseconds taken by all map tasks=8561664
Total megabyte-milliseconds taken by all reduce tasks=5289984
Map-Reduce Framework
Map input records=1
Map output records=271760
Map output bytes=2626710
Map output materialized bytes=53
Input split bytes=108
Combine input records=271760
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=53
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=224
CPU time spent (ms)=4030
Physical memory (bytes) snapshot=536236032
Virtual memory (bytes) snapshot=5138722816
Total committed heap usage (bytes)=403177472
Peak Map Physical memory (bytes)=337158144
Peak Map Virtual memory (bytes)=2567864320
Peak Reduce Physical memory (bytes)=199077888
Peak Reduce Virtual memory (bytes)=2570858496
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1558030
File Output Format Counters
Bytes Written=47
[wangliukun@hadoop01 hadoop]$
7>查看日志
- 浏览器输入:http://hadoop01:19888/jobhistory
- 查看历史任务列表
- 查看任务运行日志
- 查看运行详情
6、集群启动/停止方式总结
1>各个模块分开启动/停止(配置 ssh 免密登录是前提),常用
- 整体启动/停止HDFS
start-dfs.sh/stop-dfs.sh
- 整体启动/停止YARN
start-yarn.sh/stop-yarn.sh
2>各个服务组件逐一启动停止
- 分别启动/停止HDFS组件
hdfs --daemon start/stop namenode/datanode/secondarynamenode
- 启动/停止YARN
yarn --daemon start/stop resourcemanager/nodemanager
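上述单组件启停命令的组合形式,可以用一个小函数演示(示意脚本:仅拼接并打印将要执行的命令,不实际调用 hdfs/yarn;函数名 daemon_cmd 为演示用):

```shell
#!/bin/bash
# 示意:拼接 "工具 --daemon 动作 组件" 形式的命令并打印
daemon_cmd() {                 # 用法: daemon_cmd hdfs start namenode
  local tool=$1 action=$2 component=$3
  echo "$tool --daemon $action $component"
}

daemon_cmd hdfs start namenode      # 输出 hdfs --daemon start namenode
daemon_cmd yarn stop nodemanager    # 输出 yarn --daemon stop nodemanager
```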
7、编写常用脚本
1、集群启停脚本(包含 HDFS、YARN、HistoryServer):myhadoop.sh
- 内容如下:
[wangliukun@hadoop01 bin]$ vim myhadoop.sh
[wangliukun@hadoop01 bin]$ pwd
/home/wangliukun/bin
[wangliukun@hadoop01 bin]$ cat myhadoop.sh
#!/bin/bash
if [ $# -lt 1 ]
then
echo "No Args Input..."
exit ;
fi
case $1 in
"start")
echo " =================== 启动 hadoop集群 ==================="
echo " --------------- 启动 hdfs ---------------"
ssh hadoop01 "/export/servers/hadoop/sbin/start-dfs.sh"
echo " --------------- 启动 yarn ---------------"
ssh hadoop02 "/export/servers/hadoop/sbin/start-yarn.sh"
echo " --------------- 启动 historyserver ---------------"
ssh hadoop01 "/export/servers/hadoop/bin/mapred --daemon start historyserver"
;;
"stop")
echo " =================== 关闭 hadoop集群 ==================="
echo " --------------- 关闭 historyserver ---------------"
ssh hadoop01 "/export/servers/hadoop/bin/mapred --daemon stop historyserver"
echo " --------------- 关闭 yarn ---------------"
ssh hadoop02 "/export/servers/hadoop/sbin/stop-yarn.sh"
echo " --------------- 关闭 hdfs ---------------"
ssh hadoop01 "/export/servers/hadoop/sbin/stop-dfs.sh"
;;
*)
echo "Input Args Error..."
;;
esac
[wangliukun@hadoop01 bin]$
- 赋予脚本执行权限
[wangliukun@hadoop01 bin]$ chmod +x myhadoop.sh
- 将脚本复制到/bin中,以便全局调用
[wangliukun@hadoop01 bin]$ sudo cp myhadoop.sh /bin/
- 测试
[wangliukun@hadoop01 bin]$ myhadoop.sh stop
=================== 关闭 hadoop集群 ===================
--------------- 关闭 historyserver ---------------
--------------- 关闭 yarn ---------------
Stopping nodemanagers
Stopping resourcemanager
--------------- 关闭 hdfs ---------------
Stopping namenodes on [hadoop01]
Stopping datanodes
Stopping secondary namenodes [hadoop03]
[wangliukun@hadoop01 bin]$ jps
4962 Jps
[wangliukun@hadoop01 bin]$ myhadoop.sh start
=================== 启动 hadoop集群 ===================
--------------- 启动 hdfs ---------------
Starting namenodes on [hadoop01]
Starting datanodes
Starting secondary namenodes [hadoop03]
--------------- 启动 yarn ---------------
Starting resourcemanager
Starting nodemanagers
--------------- 启动 historyserver ---------------
[wangliukun@hadoop01 bin]$
2、查看三台服务器java进程脚本
[wangliukun@hadoop01 bin]$ pwd
/home/wangliukun/bin
[wangliukun@hadoop01 bin]$ ll
#在当前目录下创建一个jpsall文件
总用量 12
-rwxrwxr-x. 1 wangliukun wangliukun 160 3月 28 16:42 jpsall
-rwxrwxrwx. 1 wangliukun wangliukun 1112 3月 28 11:32 myhadoop.sh
-rwxr-xr-x. 1 wangliukun wangliukun 736 3月 14 22:37 xsync
#文件内容如下,注(给文件执行权限chmod +x jpsall)
[wangliukun@hadoop01 bin]$ cat jpsall
#!/bin/bash
for host in hadoop01 hadoop02 hadoop03
do
echo =============== $host ===============
ssh $host "/export/servers/jdk/bin/jps"
done
#复制到/bin,使脚本可全局调用
[wangliukun@hadoop01 bin]$ sudo cp ./jpsall /bin/
[wangliukun@hadoop01 bin]$ jpsall
=============== hadoop01 ===============
1712 NameNode
1824 DataNode
2164 NodeManager
2429 Jps
2319 JobHistoryServer
=============== hadoop02 ===============
1632 ResourceManager
1744 NodeManager
1492 DataNode
2166 Jps
=============== hadoop03 ===============
1408 NodeManager
1601 Jps
1331 SecondaryNameNode
1263 DataNode
发布给其余两节点
[wangliukun@hadoop01 ~]$ xsync /home/wangliukun/bin/jpsall
==================== hadoop01 ====================
sending incremental file list
sent 69 bytes received 12 bytes 54.00 bytes/sec
total size is 160 speedup is 1.98
==================== hadoop02 ====================
sending incremental file list
jpsall
sent 276 bytes received 35 bytes 207.33 bytes/sec
total size is 160 speedup is 0.51
==================== hadoop03 ====================
sending incremental file list
jpsall
sent 276 bytes received 35 bytes 622.00 bytes/sec
total size is 160 speedup is 0.51
[wangliukun@hadoop01 ~]$