User Commands
Overview
All HDFS commands are invoked through the bin/hdfs script. Running the script without any arguments prints the usage message.
Usage: hdfs [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
OPTIONS is none or any of:
--buildpaths attempt to add class files from build tree
--config dir Hadoop config directory
--daemon (start|status|stop) operate on a daemon
--debug turn on shell script debug mode
--help usage information
--hostnames list[,of,host,names] hosts to use in worker mode
--hosts filename list of hosts to use in worker mode
--loglevel level set the log4j level for this command
--workers turn on worker mode
SUBCOMMAND is one of:
Admin Commands:
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
debug run a Debug Admin to execute HDFS debug commands
dfsadmin run a DFS admin client
dfsrouteradmin manage Router-based federation
ec run a HDFS ErasureCoding CLI
fsck run a DFS filesystem checking utility
haadmin run a DFS HA admin client
jmxget get JMX exported values from NameNode or DataNode.
oev apply the offline edits viewer to an edits file
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to a legacy fsimage
storagepolicies list/get/set/satisfyStoragePolicy block storage policies
Client Commands:
classpath prints the class path needed to get the hadoop jar and the required libraries
dfs run a filesystem command on the file system
envvars display computed Hadoop environment variables
fetchdt fetch a delegation token from the NameNode
getconf get config values from configuration
groups get the groups which users belong to
lsSnapshottableDir list all snapshottable dirs owned by the current user
snapshotDiff diff two snapshots of a directory or diff the current directory contents with a snapshot
version print the version
Daemon Commands:
balancer run a cluster balancing utility
datanode run a DFS datanode
dfsrouter run the DFS router
diskbalancer Distributes data evenly among disks on a given node
httpfs run HttpFS server, the HDFS HTTP Gateway
journalnode run the DFS journalnode
mover run a utility to move block replicas across storage types
namenode run the DFS namenode
nfs3 run an NFS version 3 gateway
portmap run a portmap service
secondarynamenode run the DFS secondary namenode
sps run external storagepolicysatisfier
zkfc run the ZK Failover Controller daemon
SUBCOMMAND may print help when invoked w/o parameters or with -h.
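The generic OPTIONS are combined with a SUBCOMMAND. As a brief sketch, the --daemon option listed above can manage any of the daemon subcommands (namenode here is only illustrative):
# Start, query, and stop a daemon through the generic --daemon option
bin/hdfs --daemon start namenode
bin/hdfs --daemon status namenode
bin/hdfs --daemon stop namenode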
Command Details
classpath
This command prints the classpath needed to load the Hadoop jar and its required libraries.
Usage
COMMAND_OPTION | Description |
---|---|
--glob | Expand wildcard entries; by default the classpath contains patterns such as lib/*, and with --glob every jar file under lib/ is listed explicitly |
--jar path | Write the classpath into the manifest of a jar file named path |
-h, --help | Print help |
This command invokes the org.apache.hadoop.util.Classpath class.
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs classpath
/data/apps/hadoop-3.3.1/etc/hadoop:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/*:/data/apps/hadoop-3.3.1/share/hadoop/common/*:/data/apps/hadoop-3.3.1/share/hadoop/hdfs:/data/apps/hadoop-3.3.1/share/hadoop/hdfs/lib/*:/data/apps/hadoop-3.3.1/share/hadoop/hdfs/*:/data/apps/hadoop-3.3.1/share/hadoop/mapreduce/*:/data/apps/hadoop-3.3.1/share/hadoop/yarn:/data/apps/hadoop-3.3.1/share/hadoop/yarn/lib/*:/data/apps/hadoop-3.3.1/share/hadoop/yarn/*
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs classpath --glob
/data/apps/hadoop-3.3.1/etc/hadoop:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/commons-io-2.8.0.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/commons-daemon-1.0.13.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/hadoop-annotations-3.3.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerb-server-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jsr305-3.0.2.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerb-client-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/asm-5.0.4.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jetty-xml-9.4.40.v20210413.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerb-core-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/metrics-core-3.2.4.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/json-smart-2.4.2.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jetty-io-9.4.40.v20210413.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jetty-servlet-9.4.40.v20210413.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/paranamer-2.3.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerb-admin-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/log4j-1.2.17.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jsp-api-2.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/curator-client-4.2.0.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/commons-compress-1.19.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/commons-collections-3.2.2.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/zookeeper-jute-3.5.6.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jetty-server-9.4.40.v20210413.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerby-config-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jackson-annotations-2.10.5.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jersey-core-1.19.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jetty-util-ajax-9.4.40.v20210413.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/avro-1.7.7.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/commons-codec-1.11.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jetty-util-9.4.40.v20210413.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jettison-1.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/commons-text-1.4.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/dnsjava-2.1.7.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/re2j-1.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/kerb-util-1.0.1.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/gson-2.2.4.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/jersey-json-1.19.jar:/data/apps/hadoop-3.3.1/share/hadoop/common/lib/audience-annotations-0.5.0.jar
envvars
Usage: hdfs envvars
Displays the computed Hadoop environment variables. This command was added in Hadoop 3.x.
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs envvars
JAVA_HOME='/data/jdk1.8.0_311'
HADOOP_HDFS_HOME='/data/apps/hadoop-3.3.1'
HDFS_DIR='share/hadoop/hdfs'
HDFS_LIB_JARS_DIR='share/hadoop/hdfs/lib'
HADOOP_CONF_DIR='/data/apps/hadoop-3.3.1/etc/hadoop'
HADOOP_TOOLS_HOME='/data/apps/hadoop-3.3.1'
HADOOP_TOOLS_DIR='share/hadoop/tools'
HADOOP_TOOLS_LIB_JARS_DIR='share/hadoop/tools/lib'
fetchdt
Usage: hdfs fetchdt <opts> <token_file_path>
This command invokes the org.apache.hadoop.hdfs.tools.DelegationTokenFetcher class to fetch a delegation token from the NameNode.
COMMAND_OPTION | Description |
---|---|
--webservice NN_URL | URL (starting with http or https) of the NameNode to contact |
--renewer name | Name of the delegation token renewer |
--cancel | Cancel the delegation token |
--renew | Renew the delegation token; the token must have been fetched with the --renewer name option |
--print | Print the delegation token |
token_file_path | File path where the token is stored |
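A minimal sketch of the token lifecycle; the token file path /tmp/my.token and the renewer name root are illustrative:
# Fetch a delegation token from the NameNode and store it in a local file
bin/hdfs fetchdt --renewer root /tmp/my.token
# Inspect the stored token
bin/hdfs fetchdt --print /tmp/my.token
# Renew the token (works because it was fetched with --renewer)
bin/hdfs fetchdt --renew /tmp/my.token
# Cancel the token once it is no longer needed
bin/hdfs fetchdt --cancel /tmp/my.token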
fsck
This command is very important: it is commonly used to check the health of the filesystem, view block information, and obtain block locations. Unlike a traditional fsck, it reports problems rather than repairing them; corrupted files can be moved to /lost+found or deleted via the options below.
This command invokes the org.apache.hadoop.hdfs.tools.DFSck class.
Usage
hdfs fsck <path>
[-list-corruptfileblocks |
[-move | -delete | -openforwrite]
[-files [-blocks [-locations | -racks | -replicaDetails | -upgradedomains]]]
[-includeSnapshots] [-showprogress]
[-storagepolicies] [-maintenance]
[-blockId <blk_Id>]
COMMAND_OPTION | Description |
---|---|
path | Start checking from this path |
-delete | Delete corrupted files |
-files | Print out the files being checked |
-files -blocks | Print out the block report |
-files -blocks -locations | Print out locations for every block |
-files -blocks -racks | Print out the network topology for data-node locations |
-files -blocks -replicaDetails | Print out details for each replica |
-files -blocks -upgradedomains | Print out the upgrade domain for every block |
-includeSnapshots | Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it |
-list-corruptfileblocks | Print out the list of missing blocks and the files they belong to |
-move | Move corrupted files to /lost+found |
-openforwrite | Print out files opened for write |
-showprogress | Print progress in the output; off by default |
-storagepolicies | Print out a storage policy summary for the blocks |
-maintenance | Print out details of maintenance-state nodes |
-blockId | Print out basic information about the given block |
Viewing file, block, and location information for a path
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs fsck / -files -blocks -locations
Connecting to namenode via http://192.168.1.1:9870/fsck?ugi=root&files=1&path=%2F
FSCK started by root (auth:SIMPLE) from /192.168.1.1 for path / at Thu Mar 09 21:19:43 CST 2023
/ <dir>
/history <dir>
/history/done_intermediate <dir>
/history/done_intermediate/root <dir>
/history/done_intermediate/root/job_1676356354068_0001-1676361683509-root-QuasiMonteCarlo-1676361715080-5-1-SUCCEEDED-default-1676361695608.jhist 37479 bytes, replicated: replication=2, 1 block(s): OK
0. BP-1397494263-192.168.1.1-1676343701890:blk_1073741856_1032 len=37479 Live_repl=2 [DatanodeInfoWithStorage[192.168.1.2:9866,DS-2103ba93-bc7c-45a4-b55d-841907750a90,DISK], DatanodeInfoWithStorage[192.168.1.3:9866,DS-ab85c804-d5b8-45be-b40f-0374ac15f323,DISK]]
/history/done_intermediate/root/job_1676356354068_0001.summary 445 bytes, replicated: replication=2, 1 block(s): OK
1. BP-1397494263-192.168.1.1-1676343701890:blk_1073741855_1031 len=445 Live_repl=2 [DatanodeInfoWithStorage[192.168.1.3:9866,DS-ab85c804-d5b8-45be-b40f-0374ac15f323,DISK], DatanodeInfoWithStorage[192.168.1.2:9866,DS-2103ba93-bc7c-45a4-b55d-841907750a90,DISK]]
/history/done_intermediate/root/job_1676356354068_0001_conf.xml 278110 bytes, replicated: replication=2, 1 block(s): OK
2. BP-1397494263-192.168.1.1-1676343701890:blk_1073741857_1033 len=278110 Live_repl=2 [DatanodeInfoWithStorage[192.168.1.3:9866,DS-ab85c804-d5b8-45be-b40f-0374ac15f323,DISK], DatanodeInfoWithStorage[192.168.1.2:9866,DS-2103ba93-bc7c-45a4-b55d-841907750a90,DISK]]
......
Status: HEALTHY
Number of data-nodes: 2
Number of racks: 1
Total dirs: 132
Total symlinks: 0
Replicated Blocks:
Total size: 732857783 B
Total files: 197
Total blocks (validated): 198 (avg. block size 3701301 B)
Minimally replicated blocks: 198 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Blocks queued for replication: 0
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
Blocks queued for replication: 0
FSCK ended at Thu Mar 09 21:19:43 CST 2023 in 38 milliseconds
The filesystem under path '/' is HEALTHY
Listing corrupt files
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://192.168.1.1:9870/fsck?ugi=root&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files
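A single block can also be checked by ID; as a sketch, using a block ID taken from the -locations output above:
# Print basic information and location details for one block
bin/hdfs fsck -blockId blk_1073741856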
getconf
Retrieves configuration information from the cluster.
This command invokes the org.apache.hadoop.hdfs.tools.GetConf class.
Usage
hdfs getconf -namenodes # list the NameNodes in the cluster
hdfs getconf -secondaryNameNodes # list the Secondary NameNodes in the cluster
hdfs getconf -backupNodes # list the Backup Nodes in the cluster
hdfs getconf -journalNodes # list the Journal Nodes in the cluster
hdfs getconf -includeFile # print the path of the include file, which lists the datanodes allowed to join the cluster
hdfs getconf -excludeFile # print the path of the exclude file, which lists the datanodes to be decommissioned
hdfs getconf -nnRpcAddresses # print the NameNode RPC addresses
hdfs getconf -confKey [key] # print the value of the given configuration key
Examples
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs getconf -namenodes
hadoop-1 hadoop-3
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs getconf -journalnodes
hadoop-1 hadoop-2 hadoop-3
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs getconf -confKey 'fs.defaultFS'
hdfs://cdp-cluster
groups
Usage: hdfs groups [username ...]
This command invokes the org.apache.hadoop.hdfs.tools.GetGroups class.
Returns the group information for one or more given usernames.
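A minimal sketch; the username and the group list in the output are illustrative:
# Look up the groups for one user; the output format is "user : group1 group2 ..."
bin/hdfs groups root
root : root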
httpfs
Usage: hdfs httpfs
Runs the HttpFS server, the HDFS HTTP gateway.
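HttpFS is typically run in the background via the generic --daemon option from the overview; a brief sketch:
# Start and stop the HttpFS gateway as a background daemon
bin/hdfs --daemon start httpfs
bin/hdfs --daemon stop httpfs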
lsSnapshottableDir
Usage: hdfs lsSnapshottableDir [-help]
This command invokes the org.apache.hadoop.hdfs.tools.snapshot.LsSnapshottableDir class.
Lists snapshottable directories. When run as a superuser it returns all snapshottable directories; otherwise it returns only the directories owned by the current user.
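A minimal sketch, assuming an illustrative directory /data that is first made snapshottable:
# Allow snapshots on a directory (requires superuser privileges)
bin/hdfs dfsadmin -allowSnapshot /data
# List the snapshottable directories visible to the current user
bin/hdfs lsSnapshottableDir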
jmxget
Usage: hdfs jmxget [-localVM ConnectorURL | -port port | -server mbeanserver | -service service]
This command invokes the org.apache.hadoop.hdfs.tools.JMXGet class.
COMMAND_OPTION | Description |
---|---|
-help | Print usage information |
-localVM ConnectorURL | Connect to a VM on the same machine |
-port mbean server port | Specify the mbean server port; if missing, it will try to connect to an mbean server in the same VM |
-server | Specify the mbean server (localhost by default) |
-service NameNode | Specify the JMX service; NameNode by default |
Dumps JMX information from a service.
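A minimal sketch, run on the NameNode host with default options:
# Dump the JMX values exported by the local NameNode service
bin/hdfs jmxget -service NameNode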
oev (Offline Edits Viewer)
Usage: hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
This command invokes the org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer class.
Required arguments:
COMMAND_OPTION | Description |
---|---|
-i,--inputFile arg | The edits log file to process; an xml extension (case insensitive) means XML format, any other filename means binary format |
-o,--outputFile arg | Name of the output file. If the specified file exists, it will be overwritten; the format of the file is determined by the -p option |
Optional arguments:
COMMAND_OPTION | Description |
---|---|
-f,--fix-txids | Renumber the transaction IDs in the input, so that there are no gaps or invalid transaction IDs |
-h,--help | Display usage information and exit |
-r,--recover | When reading binary edit logs, use recovery mode; this gives you the chance to skip corrupt parts of the edit log |
-p,--processor arg | Select which type of processor to apply to the file. Currently supported processors: binary (native binary format that Hadoop uses), xml (default, XML format), stats (prints statistics about the edits file) |
-v,--verbose | More verbose output: prints the input and output filenames, and for processors that write to a file, also outputs to screen. On large files this will dramatically increase processing time (default is false) |
Example
# hdfs oev -p <output format> -i <edits log file> -o <output file path>
hdfs oev -p XML -i edits_0000000000000000012-0000000000000000013 -o /data/apps/hadoop-3.3.1/edits.xml
oiv (Offline Image Viewer)
Usage: hdfs oiv [OPTIONS] -i INPUT_FILE
This command invokes the org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB class.
Required arguments:
COMMAND_OPTION | Description |
---|---|
-i,--inputFile input file | The fsimage file to process (or an XML file, if the ReverseXML processor has already been used) |
Optional arguments:
COMMAND_OPTION | Description |
---|---|
-o,--outputFile output file | Name of the output file, if the specified processor generates one; if the file already exists it is silently overwritten (stdout by default) |
-p,--processor processor | The image processor to apply to the image file; currently valid options are Web (default), XML, Delimited, FileDistribution and ReverseXML |
-addr address | The address (host:port) to listen on (localhost:5978 by default); used with the Web processor |
-maxSize size | The range [0, maxSize] of file sizes to analyze, in bytes (128GB by default); used with the FileDistribution processor |
-step size | The granularity of the distribution, in bytes (2MB by default); used with the FileDistribution processor |
-format | Format the output in a human-readable fashion rather than a number of bytes (false by default); used with the FileDistribution processor |
-delimiter arg | The delimiting string to use with the Delimited processor |
-t,--temp temporary dir | Use a temporary dir to cache intermediate results when generating Delimited output; if not set, the Delimited processor constructs the namespace in memory before outputting text |
-h,--help | Display the tool usage and help information and exit |
Example
# hdfs oiv -p <output format> -i <fsimage file> -o <output file path>
hdfs oiv -p XML -i fsimage_0000000000000000025 -o /data/apps/hadoop-3.3.1/fsimage.xml
oiv_legacy
Usage: hdfs oiv_legacy [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
Applies the offline fsimage viewer to an fsimage in the legacy (pre-Hadoop 2.4) format.
snapshotDiff
Usage: hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
This command invokes the org.apache.hadoop.hdfs.tools.snapshot.SnapshotDiff class.
Compares the differences between two snapshots of an HDFS directory, or between the current directory contents and a snapshot.
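A minimal sketch, assuming a snapshottable directory /data and illustrative snapshot names s1 and s2:
# Create two snapshots and report the changes between them
bin/hdfs dfs -createSnapshot /data s1
# ... modify files under /data ...
bin/hdfs dfs -createSnapshot /data s2
bin/hdfs snapshotDiff /data s1 s2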
version
Usage: hdfs version
This command invokes the org.apache.hadoop.util.VersionInfo class.
Prints the Hadoop version.
[root@hadoop-1 hadoop-3.3.1]# bin/hdfs version
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T10:51Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /data/apps/hadoop-3.3.1/share/hadoop/common/hadoop-common-3.3.1.jar