Basic Hadoop Cluster Operations

Viewing Basic Cluster Information

Viewing Cluster Storage Information

Open the HDFS monitoring web UI to check the cluster's running status and storage information. The default port is 50070; the actual address is determined by the dfs.http.address setting in hdfs-site.xml:

<!-- Access address of the NameNode web UI -->
<property>
	<name>dfs.http.address</name>
	<value>node1:50070</value>
</property>
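
If you are unsure which value is in effect, you can query the client configuration directly (a quick check; dfs.http.address is the older, deprecated name of this key, known as dfs.namenode.http-address in Hadoop 2.x):

# Print the NameNode HTTP address resolved from the configuration
hdfs getconf -confKey dfs.namenode.http-address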

[Screenshot: HDFS NameNode web UI overview page]

You can also check the same information from the command line on the server:

Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
        [-report [-live] [-dead] [-decommissioning] [-enteringmaintenance] [-inmaintenance]]
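
For example, to print overall capacity and per-datanode usage (run as the HDFS superuser, which is the hadoop user in this setup):

# Summarize configured capacity, DFS used/remaining, and per-datanode details
hdfs dfsadmin -report
# Restrict the report to live datanodes
hdfs dfsadmin -report -live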

Viewing Cluster Compute Resources

Visit port 8088 (the default) to view the cluster's compute resource information; the actual address is determined by the configuration in yarn-site.xml:

<!-- Specify the address of the YARN ResourceManager -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>node1</value>
</property>
<!-- Web access address of YARN -->
<property>
	<description>
		The http address of the RM web application.
		If only a host is provided as the value,
		the webapp will be served on a random port.
	</description>
	<name>yarn.resourcemanager.webapp.address</name>
	<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
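
The same information is available from the command line, for example:

# List all NodeManagers with their container counts and memory/vcore usage
yarn node -list -all
# List applications currently submitted to the cluster
yarn application -list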

[Screenshot: YARN ResourceManager web UI showing cluster compute resources]

Visit port 8042 (the default) to view each node's resource information; again, the actual address is determined by the configuration in yarn-site.xml:

<property>
	<description>NM Webapp address.</description>
	<name>yarn.nodemanager.webapp.address</name>
	<value>${yarn.nodemanager.hostname}:8042</value>
</property>
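
A single node can also be inspected from the command line by its node ID; the ID used below (node1:45454) is only illustrative, as the real value appears in the output of yarn node -list:

# Show detailed resource information for one NodeManager
yarn node -status node1:45454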

[Screenshot: NodeManager web UI showing per-node resource information]

HDFS File System Operations

Browsing the HDFS File System

You can browse basic information about the HDFS file system through the web UI on port 50070; the directory structure works much like that of an ordinary Linux system:

[Screenshot: HDFS file browser in the NameNode web UI]
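
The same port also serves the WebHDFS REST API (assuming dfs.webhdfs.enabled retains its default value of true), so directory listings can be fetched programmatically as well:

# List the HDFS root directory over HTTP via WebHDFS
curl "http://node1:50070/webhdfs/v1/?op=LISTSTATUS"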

Basic HDFS Operations

Most management operations on the HDFS file system can be performed through the hdfs command. The available subcommands are listed below:

[hadoop@node1 ~]$ hdfs dfs
Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
        [-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
        [-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] [-x] <path> ...]
        [-expunge]
        [-find <path> ... <expression> ...]
        [-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-truncate [-w] <length> <path> ...]
        [-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]
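
As a quick illustration of the generic options, the following runs a single command against an explicitly named NameNode instead of the configured fs.defaultFS (port 9000 is assumed here as the NameNode RPC port; use whatever fs.defaultFS specifies in core-site.xml):

# Override the default filesystem for one command only
hdfs dfs -fs hdfs://node1:9000 -ls /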

Commonly used commands:

# Create directories
hdfs dfs -mkdir /tmp # create a tmp directory under the HDFS root (/)
hdfs dfs -mkdir -p /test1/test2 # create nested directories by adding the "-p" flag

# List file information
hdfs dfs -ls / # list the contents of the HDFS root directory

# Display file contents
hdfs dfs -cat /tmp/test.txt # print the file's contents

# Upload files to HDFS
hdfs dfs -put /home/hadoop/test.txt /tmp # upload local /home/hadoop/test.txt into the HDFS /tmp directory
hdfs dfs -appendToFile /home/hadoop/test.txt /tmp/test.txt # append the local file's contents to an existing HDFS file
hdfs dfs -copyFromLocal -f /home/hadoop/test.txt /tmp # like -put but restricted to local sources; -f overwrites an existing file

# Download files from HDFS
hdfs dfs -get /tmp/test.txt /home/hadoop # download test.txt from HDFS into the local /home/hadoop directory
hdfs dfs -copyToLocal /tmp/test.txt /home/hadoop/test1.txt # download and save under a different local name

# Move files within HDFS
hdfs dfs -mv /tmp/test.txt /test/ # move test.txt into the /test directory

# Delete a file in HDFS
hdfs dfs -rm /tmp/test.txt
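
Putting these together, a minimal smoke test of the file system might look like the following sketch (all paths are illustrative):

# Create a local file, upload it, read it back, and verify the round trip
echo "hello hdfs" > /home/hadoop/smoke.txt
hdfs dfs -mkdir -p /tmp/smoke
hdfs dfs -put -f /home/hadoop/smoke.txt /tmp/smoke/
hdfs dfs -cat /tmp/smoke/smoke.txt
hdfs dfs -get /tmp/smoke/smoke.txt /home/hadoop/smoke.check
diff /home/hadoop/smoke.txt /home/hadoop/smoke.check && echo "round trip OK"
hdfs dfs -rm -r /tmp/smoke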

File details can also be viewed on the web page:

[Screenshot: file details in the HDFS web UI]

Running MapReduce Jobs

The Official Example Jar

The $HADOOP_HOME/share/hadoop/mapreduce/ directory contains the official example jar hadoop-mapreduce-examples-2.10.1.jar, which bundles a number of commonly used test programs:

aggregatewordcount: an Aggregate-based map/reduce program that counts the words in the input files.
aggregatewordhist: an Aggregate-based map/reduce program that computes the histogram of the words in the input files.
bbp: a map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: an example job that counts pageviews stored in a database.
distbbp: a map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: a map/reduce program that counts the matches of a regex in the input.
join: a job that effects a join over sorted, equally partitioned datasets.
multifilewc: a job that counts words from several files.
pentomino: a map/reduce tile-laying program that finds solutions to pentomino problems.
pi: a map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: a map/reduce program that writes 10GB of random textual data per node.
randomwriter: a map/reduce program that writes 10GB of random data per node.
secondarysort: an example defining a secondary sort to the reduce.
sort: a map/reduce program that sorts the data written by the random writer.
sudoku: a sudoku solver.
teragen: generates data for terasort.
terasort: runs terasort.
teravalidate: checks the results of terasort.
wordcount: a map/reduce program that counts the words in the input files.
wordmean: a map/reduce program that computes the average length of the words in the input files.
wordmedian: a map/reduce program that computes the median length of the words in the input files.
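
This list can always be recovered from the cluster itself: running the jar without arguments prints the valid program names along with a usage message:

# Print the list of bundled example programs
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar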

Submitting a MapReduce Job

Example 1: wordcount

The command and its log output are shown below:

[hadoop@node1 ~]$ hadoop jar /app/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar wordcount /tmp/test.txt /tmp/output
22/05/11 23:19:17 INFO client.RMProxy: Connecting to ResourceManager at node1/199.188.166.111:8032
22/05/11 23:19:19 INFO input.FileInputFormat: Total input files to process : 1
22/05/11 23:19:19 INFO mapreduce.JobSubmitter: number of splits:1
22/05/11 23:19:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1652322858586_0001
22/05/11 23:19:20 INFO conf.Configuration: resource-types.xml not found
22/05/11 23:19:20 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/05/11 23:19:20 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
22/05/11 23:19:20 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
22/05/11 23:19:21 INFO impl.YarnClientImpl: Submitted application application_1652322858586_0001
22/05/11 23:19:21 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1652322858586_0001/
22/05/11 23:19:21 INFO mapreduce.Job: Running job: job_1652322858586_0001
22/05/11 23:19:38 INFO mapreduce.Job: Job job_1652322858586_0001 running in uber mode : false
22/05/11 23:19:38 INFO mapreduce.Job:  map 0% reduce 0%
22/05/11 23:19:50 INFO mapreduce.Job:  map 100% reduce 0%
22/05/11 23:20:02 INFO mapreduce.Job:  map 100% reduce 100%
22/05/11 23:20:03 INFO mapreduce.Job: Job job_1652322858586_0001 completed successfully
22/05/11 23:20:03 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=2274
                FILE: Number of bytes written=421473
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=3213
                HDFS: Number of bytes written=1928
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8053
                Total time spent by all reduces in occupied slots (ms)=8758
                Total time spent by all map tasks (ms)=8053
                Total time spent by all reduce tasks (ms)=8758
                Total vcore-milliseconds taken by all map tasks=8053
                Total vcore-milliseconds taken by all reduce tasks=8758
                Total megabyte-milliseconds taken by all map tasks=8246272
                Total megabyte-milliseconds taken by all reduce tasks=8968192
        Map-Reduce Framework
                Map input records=38
                Map output records=335
                Map output bytes=4379
                Map output materialized bytes=2274
                Input split bytes=95
                Combine input records=335
                Combine output records=87
                Reduce input groups=87
                Reduce shuffle bytes=2274
                Reduce input records=87
                Reduce output records=87
                Spilled Records=174
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=222
                CPU time spent (ms)=2630
                Physical memory (bytes) snapshot=396931072
                Virtual memory (bytes) snapshot=3804737536
                Total committed heap usage (bytes)=194383872
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=3118
        File Output Format Counters
                Bytes Written=1928

Once the job completes, the results can be viewed in the HDFS file system:

[Screenshot: /tmp/output directory in the HDFS web UI]

[Screenshot: output directory listing showing _SUCCESS and part-r-00000]

Two new files appear in the output directory: _SUCCESS, an empty marker file indicating that the job finished successfully, and part-r-00000, the result file produced by the job:

[hadoop@node1 ~]$ hdfs dfs -cat /tmp/output/part-r-00000
-rw-rw-r--.     4
00:15   1
00:20   1
00:26   1
00:29   1
00:39   1
00:47   1
00:51   1
00:54   1
01:00   1
01:04   1
01:07   1
01:09   1
01:16   1
07:55   1
08:04   1
08:15   1
09:32   1
1       4
11      9
16      1
17      5
18:59   6
19:33   4
19:34   4
2       8
20:24   1
20:33   1
20:46   1
21      18
23:05   1
23:06   2
28      1
29      1
3       23
30      2
35      2
4       7
5       17
54      1
6       7
9       6
Apr     4
Jetty_0_0_0_0_50070_hdfs____w2cu08      1
Jetty_0_0_0_0_50090_secondary____y6aanv 1
Jetty_0_0_0_0_8042_node____19tj0x       1
Jetty_localhost_32873_datanode____t7p7lo        1
Jetty_localhost_33735_datanode____jksu74        1
Jetty_localhost_34961_datanode____.fpendy       1
Jetty_localhost_36015_datanode____.lhrbt4       1
Jetty_localhost_38151_datanode____.rhd829       1
Jetty_localhost_39677_datanode____.s4r2y1       1
Jetty_localhost_40461_datanode____.d6iqau       1
Jetty_localhost_40969_datanode____1moe5j        1
Jetty_localhost_41457_datanode____snit9c        1
Jetty_localhost_42109_datanode____.mhhtgd       1
Jetty_localhost_42315_datanode____.wlr1a8       1
Jetty_localhost_42845_datanode____.422dr2       1
Jetty_localhost_43529_datanode____.iybvi4       1
Jetty_localhost_43811_datanode____vzpazk        1
Jetty_localhost_44775_datanode____2kxto 1
Jetty_node1_50070_hdfs____.8fa0c        1
Jetty_node1_8088_cluster____uqk9cr      1
May     33
drwx------.     11
drwxr-xr-x.     2
drwxrwxr-x.     20
hadoop  52
hadoop-hadoop-datanode.pid      1
hadoop-hadoop-namenode.pid      1
hsperfdata_hadoop       1
hsperfdata_root 1
root    22
systemd-private-0abe12489c264785bd8088f6e33eeb83-ModemManager.service-aaM0Jf    1
systemd-private-0abe12489c264785bd8088f6e33eeb83-bluetooth.service-7X05Qi       1
systemd-private-0abe12489c264785bd8088f6e33eeb83-chronyd.service-HmOY5i 1
systemd-private-0abe12489c264785bd8088f6e33eeb83-colord.service-VyCTLg  1
systemd-private-0abe12489c264785bd8088f6e33eeb83-rtkit-daemon.service-3U58pj    1
total   1
tracker-extract-files.1000      1
vmware-root_916-2689078442      1
vmware-root_918-2697532712      1
vmware-root_921-3980298495      1
vmware-root_925-3988621690      1
vmware-root_927-3980167416      1
yarn-hadoop-nodemanager.pid     1
yarn-hadoop-resourcemanager.pid 1

Example 2: Estimating the Value of Pi

The command and its log output are shown below:

[hadoop@node1 ~]$ hadoop jar /app/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar pi 10 100
Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
22/05/11 23:35:11 INFO client.RMProxy: Connecting to ResourceManager at node1/199.188.166.111:8032
22/05/11 23:35:12 INFO input.FileInputFormat: Total input files to process : 10
22/05/11 23:35:12 INFO mapreduce.JobSubmitter: number of splits:10
22/05/11 23:35:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1652322858586_0002
22/05/11 23:35:13 INFO conf.Configuration: resource-types.xml not found
22/05/11 23:35:13 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/05/11 23:35:13 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
22/05/11 23:35:13 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
22/05/11 23:35:13 INFO impl.YarnClientImpl: Submitted application application_1652322858586_0002
22/05/11 23:35:13 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1652322858586_0002/
22/05/11 23:35:13 INFO mapreduce.Job: Running job: job_1652322858586_0002
22/05/11 23:35:24 INFO mapreduce.Job: Job job_1652322858586_0002 running in uber mode : false
22/05/11 23:35:24 INFO mapreduce.Job:  map 0% reduce 0%
22/05/11 23:35:42 INFO mapreduce.Job:  map 20% reduce 0%
22/05/11 23:36:00 INFO mapreduce.Job:  map 20% reduce 7%
22/05/11 23:36:28 INFO mapreduce.Job:  map 30% reduce 7%
22/05/11 23:36:29 INFO mapreduce.Job:  map 50% reduce 7%
22/05/11 23:36:30 INFO mapreduce.Job:  map 70% reduce 7%
22/05/11 23:36:31 INFO mapreduce.Job:  map 100% reduce 7%
22/05/11 23:36:32 INFO mapreduce.Job:  map 100% reduce 100%
22/05/11 23:36:33 INFO mapreduce.Job: Job job_1652322858586_0002 completed successfully
22/05/11 23:36:33 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=226
                FILE: Number of bytes written=2297625
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2620
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=43
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters
                Launched map tasks=10
                Launched reduce tasks=1
                Data-local map tasks=10
                Total time spent by all maps in occupied slots (ms)=528176
                Total time spent by all reduces in occupied slots (ms)=48013
                Total time spent by all map tasks (ms)=528176
                Total time spent by all reduce tasks (ms)=48013
                Total vcore-milliseconds taken by all map tasks=528176
                Total vcore-milliseconds taken by all reduce tasks=48013
                Total megabyte-milliseconds taken by all map tasks=540852224
                Total megabyte-milliseconds taken by all reduce tasks=49165312
        Map-Reduce Framework
                Map input records=10
                Map output records=20
                Map output bytes=180
                Map output materialized bytes=280
                Input split bytes=1440
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=280
                Reduce input records=20
                Reduce output records=0
                Spilled Records=40
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=9894
                CPU time spent (ms)=13350
                Physical memory (bytes) snapshot=2232963072
                Virtual memory (bytes) snapshot=20908384256
                Total committed heap usage (bytes)=1540988928
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1180
        File Output Format Counters
                Bytes Written=97
Job Finished in 82.602 seconds
Estimated value of Pi is 3.14800000000000000000
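
The two arguments are the number of map tasks (10) and the number of random samples per map (100). Each sample is a point in the unit square, and the estimate is 4 × (points falling inside the inscribed circle) / (total points); with 10 × 100 = 1000 samples, the reported 3.148 corresponds to 787 points landing inside. Increasing either argument improves the estimate at the cost of a longer job, for example:

# A larger run: 16 maps with 1000 samples each (16,000 points in total)
hadoop jar /app/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar pi 16 1000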

Viewing MapReduce Job Resource Usage

  • The following page shows the cluster's resource usage in real time (the jobs above have already finished, so the values have returned to their idle baselines)

[Screenshot: ResourceManager cluster metrics page]

  • The MapReduce job list

[Screenshot: application list in the ResourceManager web UI]

  • Detailed information for a single job

[Screenshot: application detail page in the ResourceManager web UI]
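
The same details can be retrieved from the command line after a job finishes; for example, substituting the application ID from the log output above (fetching logs assumes yarn.log-aggregation-enable is set to true):

# List finished applications
yarn application -list -appStates FINISHED
# Fetch the aggregated logs of a specific application
yarn logs -applicationId application_1652322858586_0002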
