Summary of Common HDFS Shell Commands

Biubiubiu!!

Original article: https://blog.csdn.net/qq_40246175/article/details/104304193

Common HDFS Shell Commands

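Basic syntax:

hdfs dfs <command>: the HDFS entry point to the file system shell, and the form used in most examples below.
hadoop fs <command>: the generic file system shell. It is not deprecated; the deprecated form is hadoop dfs, which was replaced by hdfs dfs. Both forms run the same commands and accept paths on any file system Hadoop supports, such as HDFS and the local file system.

The commands mirror their Linux counterparts: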
-ls
-mkdir
-put
-rm
-help
...

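I. General shell commands

1. Directory operations

1.1 Listing directories

# Show the contents of a directory
hdfs dfs -ls <path>
# Recursively list the directory tree, with human-readable file sizes
hdfs dfs -ls -R -h <path>
# Show the contents of the root directory
hdfs dfs -ls /

# List the contents of the HDFS directory "/tmp/{test}/hdfs_data"
hadoop fs -ls /tmp/{test}/hdfs_data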

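1.2 Creating directories

# Create a directory
hdfs dfs -mkdir <path>
# Create a directory, including any missing parent directories
hdfs dfs -mkdir -p <path>

# Create the directory "/tmp/{test}/hdfs_data" on HDFS
hadoop fs -mkdir -p /tmp/{test}/hdfs_data
# Directory that usually holds the data to be processed
hdfs dfs -mkdir /input
# Directory that usually holds the processing results
hdfs dfs -mkdir /output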

1.3 Deleting directories

# Delete an empty directory
hdfs dfs -rmdir <path>
# Recursively delete a directory and everything in it
hdfs dfs -rm -r <path>

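2. File operations

2.1 Viewing file contents

# Either of the two commands below prints a file
hdfs dfs -cat <path>
# Print a file as text; it also decodes gzip-compressed files and SequenceFiles, which -cat prints as raw bytes
hdfs dfs -text <path>

# Print the last kilobyte of a file
hdfs dfs -tail <path>
# Like tail -f on Unix: the output keeps updating as data is appended to the file
hdfs dfs -tail -f <path>

# Examples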
[biubiubiu@hadoop01 ~]$ hdfs dfs -cat /wordcount/input/aaa.txt
[biubiubiu@hadoop01 ~]$ hdfs dfs -text /wordcount/input/aaa.txt
[biubiubiu@hadoop01 ~]$ hdfs dfs -tail  /input/hello.txt
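Because -cat writes the file to standard output, it combines with ordinary Linux tools. This makes it easy to peek at a large file without downloading it. The following is a minimal sketch that assumes an hdfs client on the PATH. The names hdfs_head and hdfs_line_count are hypothetical helpers, not Hadoop commands.

```shell
# Hypothetical helpers (not Hadoop commands): they only pipe the output of
# "hdfs dfs -cat" into standard Linux tools.

# Print the first N lines (default 10) of an HDFS file.
# When head stops reading early, hdfs may report a harmless write error on stderr.
hdfs_head() {
  hdfs dfs -cat "$1" | head -n "${2:-10}"
}

# Count the lines of an HDFS file (tr strips the padding some wc versions add).
hdfs_line_count() {
  hdfs dfs -cat "$1" | wc -l | tr -d ' '
}

# Usage:
#   hdfs_head /input/hello.txt 5
#   hdfs_line_count /wordcount/input/aaa.txt
```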

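2.2 Changing file permissions and owners

# Permission control works the same way as on Linux
# Change the group of files or directories. The user must be the owner of the file or a superuser.
hdfs dfs -chgrp [-R] GROUP URI [URI ...]
# Change the permissions of files or directories. The user must be the owner of the file or a superuser.
hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]
# Change the owner of files or directories. The user must be a superuser.
hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
# Examples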
[biubiubiu@hadoop01 ~]$ hdfs dfs -chmod -R 774 /tmp
[biubiubiu@hadoop01 ~]$ hdfs dfs -chown -R biubiubiu:hadoopenv /tmp
[biubiubiu@hadoop01 ~]$ hdfs dfs -chgrp -R test /tmp

[biubiubiu@hadoop01 ~]$ hdfs dfs -chmod 777 /input/hello.txt
[biubiubiu@hadoop01 ~]$ hdfs dfs -chown 1111:1111  /input/hello.txt
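Octal modes such as the 774 and 777 above decode exactly as they do on Linux. The three digits are owner, group and others, and each digit is a sum of r = 4, w = 2 and x = 1. HDFS differs in one way: files cannot be executed, so x only matters on directories, where it is needed to access their children. The hypothetical helper below prints the rwx string that hdfs dfs -ls would display for a three-digit mode. It is a sketch and ignores the sticky bit.

```shell
# Hypothetical helper: turn a 3-digit octal mode (e.g. 774) into the rwx
# string "hdfs dfs -ls" shows after the d/- type character.
# 4-digit modes with a sticky bit (e.g. 1777) are rejected, not handled.
octal_to_rwx() {
  local mode="$1" out="" digit i
  case "$mode" in
    [0-7][0-7][0-7]) ;;
    *) echo "expected a 3-digit octal mode, got: $mode" >&2; return 1 ;;
  esac
  for i in 1 2 3; do
    # Digits in order: owner, group, others.
    digit=$(printf '%s' "$mode" | cut -c "$i")
    # Each digit is a sum of r = 4, w = 2, x = 1.
    if [ $((digit & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $((digit & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $((digit & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
  done
  printf '%s\n' "$out"
}

# Usage:
#   octal_to_rwx 774    # rwxrwxr--
#   octal_to_rwx 777    # rwxrwxrwx
```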

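2.3 File and space statistics

# Show the size of each file and directory under a path
hdfs dfs -du [-s] [-h] URI [URI ...]
-s : show the total size of all files instead of one line per entry
-h : show sizes in a human-readable form (e.g. 64.0m instead of 67108864)

# Show the capacity and free space of the file system
hdfs dfs -df -h /
-h : show sizes in a human-readable form (e.g. 64.0m instead of 67108864)

# Examples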
[biubiubiu@hadoop01 ~]$ hdfs dfs -du /wordcount
55524  /wordcount/input
0      /wordcount/biubiubiu1
37     /wordcount/biubiubiu_mv
[biubiubiu@hadoop01 ~]$ hdfs dfs -du -h /wordcount
54.2 K  /wordcount/input
0       /wordcount/biubiubiu1
37      /wordcount/biubiubiu_mv
[biubiubiu@hadoop01 ~]$ hdfs dfs -du -h -s /wordcount
54.3 K  /wordcount
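The -s total is the sum of the per-entry sizes: 55524 + 0 + 37 = 55561 bytes. Since 55561 / 1024 ≈ 54.3, this matches the 54.3 K printed above. The sketch below reproduces that arithmetic from -du output read on stdin. The name du_total is hypothetical, and the function assumes the size is the first column. Some Hadoop versions add a second column for the disk space consumed by all replicas, and the function ignores it.

```shell
# Hypothetical helper: read "hdfs dfs -du <dir>" output on stdin, add up the
# first column (size in bytes) and print the total in bytes and in -h style.
du_total() {
  awk '
    { total += $1 }
    END {
      size = total; i = 0
      split("K M G T P", units, " ")
      # Divide by 1024 until the value is below 1024, as -h does.
      while (size >= 1024 && i < 5) { size /= 1024; i++ }
      if (i == 0) printf "%.0f\t%.0f\n", total, total
      else        printf "%.0f\t%.1f %s\n", total, size, units[i]
    }'
}

# Usage:
#   hdfs dfs -du /wordcount | du_total
```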

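2.4 Changing a file's replication factor

# Change the replication factor of a file. If path is a directory, the replication factor of every file under it is changed.
hdfs dfs -setrep [-w] <numReplicas> <path>
-w : wait until the replication has completed. This can take a very long time.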

[biubiubiu@hadoop01 ~]$ hdfs dfs -setrep 2  /input/hello.txt
[biubiubiu@hadoop01 ~]$ hdfs dfs -setrep -w 5 /input/bbb.txt
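Without -w, -setrep returns as soon as the new target is recorded, and the extra copies are made in the background. A file can only have as many replicas as there are DataNodes. The fsck output in section III comes from a single-DataNode cluster, and there /input/hello.txt stays under-replicated after -setrep 2. On such a cluster, -w with 5 replicas could wait indefinitely. The sketch below assumes that hdfs dfs -stat supports the %r (replication) format specifier in your version. The name set_and_check_rep is hypothetical.

```shell
# Hypothetical helper: change a file's replication factor, then read back the
# target recorded by the NameNode with "hdfs dfs -stat %r". This checks the
# target only, not how many replicas exist yet (use hdfs fsck for that).
set_and_check_rep() {
  local file="$1" want="$2" got
  hdfs dfs -setrep "$want" "$file" >/dev/null || return 1
  got=$(hdfs dfs -stat %r "$file") || return 1
  if [ "$got" = "$want" ]; then
    echo "OK: $file replication=$got"
  else
    echo "MISMATCH: $file wanted $want, got $got" >&2
    return 1
  fi
}

# Usage:
#   set_and_check_rep /input/hello.txt 2
```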

2.5 Deleting files

# Delete a file
hdfs dfs -rm <path>

[biubiubiu@hadoop01 ~]$ hdfs dfs -rm  /input/hello.txt

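3. Transfers between the local file system and the cluster

3.1 Upload a local Linux file to the cluster (the local file is kept)

# Either command works
hdfs dfs -put <localsrc> <dst>
hdfs dfs -copyFromLocal <localsrc> <dst>
-f : overwrite the destination if it already exists
-p : preserve access and modification times, ownership and permissions

# Examples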
[biubiubiu@hadoop01 ~]$ hdfs dfs -put ./hello.txt  /input
[biubiubiu@hadoop01 ~]$ hdfs dfs -copyFromLocal ./hi.txt /input
[biubiubiu@hadoop01 ~]$ hdfs dfs -put -f -p ~/bbb.txt /biubiubiu/bbb.txt
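Without -f, -put refuses to overwrite an existing file and fails with a "File exists" error. When a script should upload only missing files, -test -e can check for the target first. The following is a minimal sketch in which put_if_absent is a hypothetical helper. The dst argument must be the full target file path: if it names an existing directory, the check treats the target as already present.

```shell
# Hypothetical helper: upload a local file only if the HDFS target does not
# exist yet. dst must be the full target file path: if it names an existing
# directory, -test -e succeeds and the upload is skipped.
put_if_absent() {
  local src="$1" dst="$2"
  if [ ! -f "$src" ]; then
    echo "local file not found: $src" >&2
    return 1
  fi
  if hdfs dfs -test -e "$dst"; then
    echo "skip: $dst already exists"
  else
    hdfs dfs -put "$src" "$dst" && echo "uploaded: $src -> $dst"
  fi
}

# Usage:
#   put_if_absent ./hello.txt /input/hello.txt
```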

3.2 Move a local Linux file to the cluster (the local file is deleted afterwards)

hdfs dfs -moveFromLocal <localsrc> <dst>

# Examples
[biubiubiu@hadoop01 ~]$ hdfs dfs -moveFromLocal ./hello.txt /input
[biubiubiu@hadoop01 ~]$ hdfs dfs -moveFromLocal ~/biubiubiu.txt /input

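3.3 Append local Linux files to a file on the cluster

Files on the cluster cannot be modified in place; data can only be appended.
Note: this is big data. The volumes are huge, and unlike with traditional files nobody goes in to change a record or two. Think about it.

hdfs dfs -appendToFile <localsrc> ... <dst>

# Examples
[biubiubiu@hadoop01 ~]$ hdfs dfs -appendToFile ./test_1.txt /input/aaa.txt
# Separate multiple source files with spaces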
[biubiubiu@hadoop01 ~]$ hdfs dfs -appendToFile ./test_1.txt ./test_2.txt /input/aaa.txt
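-appendToFile also accepts - as the source, in which case it reads from standard input. This lets generated text be appended without writing a temporary local file first. The sketch below uses append_line, which is a hypothetical helper name.

```shell
# Hypothetical helper: append each argument as one line to an HDFS file.
# "-" as the source makes -appendToFile read from stdin.
append_line() {
  local dst="$1"
  shift
  [ "$#" -gt 0 ] || return 1   # nothing to append
  printf '%s\n' "$@" | hdfs dfs -appendToFile - "$dst"
}

# Usage:
#   append_line /input/aaa.txt "first new line" "second new line"
```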

3.4 Download a file from the cluster to the local Linux file system

# Either command works
hdfs dfs -get <src> <localdst>
hdfs dfs -copyToLocal <src> <localdst>

# Examples
[biubiubiu@hadoop01 ~]$ hdfs dfs -get /input/hello.txt  ./receive/
[biubiubiu@hadoop01 ~]$ hdfs dfs -copyToLocal /input/hi.txt  ./receive/
[biubiubiu@hadoop01 ~]$ hdfs dfs -get /input/biubiubiu.txt ./

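3.5 Merge multiple files and download them to the local Linux file system

hdfs dfs -getmerge [-nl] [-skip-empty-file] <src> <localdst>
-nl : add a newline (line feed) at the end of each file
-skip-empty-file : skip empty files

# Merge all files under /input, then download the result as data.txt (quoted so the local shell does not expand the glob)
[biubiubiu@hadoop01 ~]$ hdfs dfs -getmerge '/input/*' data.txt

# Example: merge wordcount_input.txt and aaa.txt from HDFS and download the result to merge.txt in the current user's home directory
[biubiubiu@hadoop01 ~]$ hdfs dfs -getmerge /input/wordcount_input.txt /wordcount/input/aaa.txt ~/merge.txt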
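-nl matters when the source files do not end with a newline. Without it, the last line of one file and the first line of the next are joined into one line: "a" followed by "b" becomes "ab". The local emulation below does not use HDFS. It only shows what the merged file looks like with -nl. The name merge_nl is hypothetical.

```shell
# Local emulation, no HDFS involved: concatenate files in order and write a
# newline after each one, which is what -getmerge -nl adds. The name
# merge_nl is hypothetical.
merge_nl() {
  local out="$1" f
  shift
  : > "$out"                  # create or truncate the destination
  for f in "$@"; do
    cat "$f" >> "$out"
    printf '\n' >> "$out"     # the newline -nl appends after each file
  done
}

# Usage:
#   printf 'a' > 1.txt; printf 'b' > 2.txt
#   merge_nl merged.txt 1.txt 2.txt    # merged.txt holds "a" and "b" on two lines
```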

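4. Operations within the cluster

4.1 Copying files within the cluster

# Multiple sources are allowed, but then the destination must be a directory
hdfs dfs -cp [-f] [-p | -p[topax]] <src> ... <dst>
-f : overwrite the destination if it already exists
-p : preserve timestamps, ownership and permissions; -p[topax] selects attributes explicitly (t = timestamps, o = ownership, p = permission, a = ACL, x = XAttr)

# Examples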
[biubiubiu@hadoop01 ~]$ hdfs dfs -cp /input/hello.txt /input/cptest
[biubiubiu@hadoop01 ~]$ hdfs dfs -cp -f -p /biubiubiu/bbb.txt /wordcount/input/bbb.txt

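4.2 Moving files within the cluster (move, rename)

# Multiple sources are allowed, but then the destination must be a directory. Moving files across file systems is not allowed.
hdfs dfs -mv <src> <dst>

# Examples
[biubiubiu@hadoop01 ~]$ hdfs dfs -mv /input/hi.txt /input/mvtest
# Move several sources into a directory
[biubiubiu@hadoop01 ~]$ hdfs dfs -mv /biubiubiu /biubiubiu1 /wordcount
# Rename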
[biubiubiu@hadoop01 ~]$ hdfs dfs -mv /input/biubiubiu /input/biubiubiu_mv

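5. File tests

hdfs dfs -test -[defswrz] URI
-d: return 0 if the path is a directory
-e: return 0 if the path exists
-f: return 0 if the path is a file
-s: return 0 if the path is not empty
-r: return 0 if the path exists and read permission is granted
-w: return 0 if the path exists and write permission is granted
-z: return 0 if the file is zero bytes long

# Example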
[biubiubiu@hadoop01 ~]$ hdfs dfs -test -d /input/biubiubiu && echo "true"
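-test prints nothing and only sets the exit code, so it is meant for if statements and && chains like the one above. The sketch below shows a hypothetical helper that classifies a path using -d and -f.

```shell
# Hypothetical helper: classify an HDFS path from the exit codes of
# "hdfs dfs -test" (exit code 0 means the condition holds). Anything that is
# neither a directory nor a file (usually: it does not exist) is "missing".
hdfs_path_type() {
  if hdfs dfs -test -d "$1"; then
    echo dir
  elif hdfs dfs -test -f "$1"; then
    echo file
  else
    echo missing
  fi
}

# Usage:
#   case "$(hdfs_path_type /input)" in
#     missing) hdfs dfs -mkdir -p /input ;;
#     file)    echo "/input is a file, not a directory" >&2 ;;
#   esac
```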

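6. Locating the block files stored on a DataNode

Blocks are stored as ordinary files on the DataNode's local disk, under the directory configured by dfs.datanode.data.dir. In the author's Hadoop 2.7.7 environment they are here: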

/opt/model/hadoop-2.7.7/dfs/datanode/current/BP-1209817470-192.168.159.151-1609489880182/current/finalized/subdir0/subdir0
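Under finalized/, each block is a plain blk_<blockId> file accompanied by a blk_<blockId>_<generationStamp>.meta checksum file. For example, the block listed as blk_1073741830_1007 in the fsck output of section III would be stored on its DataNode as blk_1073741830 plus blk_1073741830_1007.meta. The sketch below lists the block data files under a data directory, assuming you pass the path configured in dfs.datanode.data.dir. The name list_block_files is hypothetical.

```shell
# Hypothetical helper: list the block data files (blk_<blockId>) under a
# DataNode data directory, skipping the blk_<blockId>_<genStamp>.meta
# checksum files. Run it on the DataNode itself.
list_block_files() {
  find "$1" -type f -name 'blk_*' ! -name '*.meta' | sort
}

# Usage:
#   list_block_files /opt/model/hadoop-2.7.7/dfs/datanode
```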

II. The hdfs dfsadmin command

hdfs dfsadmin is used to administer an HDFS cluster.

1. Show basic cluster information and statistics

[biubiubiu@hadoop01 ~]$ hdfs dfsadmin -report

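2. Safe mode operations

When does the NameNode enter safe mode?

At startup it enters safe mode automatically, and the cluster cannot be modified (no uploads or deletions) until the DataNodes have reported enough blocks (see dfs.namenode.safemode.threshold-pct).
If too many DataNodes are dead, too few blocks are reported at startup, so the NameNode stays in safe mode.
If the disk holding the NameNode's own metadata runs low on free space, the NameNode enters safe mode automatically.
An administrator turns it on manually.

[biubiubiu@hadoop01 ~]$ hdfs dfsadmin -safemode get  # show the current state
[biubiubiu@hadoop01 ~]$ hdfs dfsadmin -safemode enter  # enter safe mode (the cluster cannot be modified)
[biubiubiu@hadoop01 ~]$ hdfs dfsadmin -safemode leave  # leave safe mode (normal operation)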
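Scripts that write to HDFS can check the state before they start. The sketch below relies on the "Safe mode is ON" / "Safe mode is OFF" text printed by -safemode get and matches it as a substring. The name in_safemode is hypothetical. To block until safe mode ends instead of checking once, dfsadmin also offers -safemode wait.

```shell
# Hypothetical helper: exit with 0 if the NameNode reports safe mode ON.
# It greps the "Safe mode is ON" text printed by "-safemode get".
in_safemode() {
  hdfs dfsadmin -safemode get | grep -q 'Safe mode is ON'
}

# Usage: skip an upload while the cluster is read-only.
#   if in_safemode; then echo "NameNode is in safe mode, try later" >&2; fi
# To block until safe mode is off instead of checking once:
#   hdfs dfsadmin -safemode wait
```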

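3. Saving the namespace image and resetting the edit log

Saves the current in-memory namespace to a new fsimage file and resets the edit log (edits: the log of every change made to the metadata; a checkpoint normally merges it into the fsimage every hour, per dfs.namenode.checkpoint.period, default 3600 seconds).

Note: the cluster must be in safe mode first. Why?

While the image is being written, the namespace must not change, so modifications have to be blocked by entering safe mode.
Safe mode also checks whether blocks have enough replicas, which helps avoid data loss.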

[biubiubiu@hadoop01 ~]$ hdfs dfsadmin -saveNamespace
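Put together, the steps are: enter safe mode, save the namespace, then leave safe mode. The sketch below runs that sequence and should be run as the HDFS superuser. The name checkpoint_namespace is hypothetical. The function leaves safe mode even when the save fails, so the cluster does not stay read-only.

```shell
# Hypothetical helper: enter safe mode, save the namespace, and always leave
# safe mode again. Returns non-zero if entering or saving failed.
checkpoint_namespace() {
  local rc=0
  hdfs dfsadmin -safemode enter || return 1
  hdfs dfsadmin -saveNamespace || rc=1
  # Leave safe mode even if the save failed, so the cluster is writable again.
  hdfs dfsadmin -safemode leave
  return "$rc"
}

# Usage:
#   checkpoint_namespace && echo "namespace saved"
```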

4. Dumping DataNode and block information to the logs directory

hdfs dfsadmin -metasave <filename>

# (written to the $HADOOP_HOME/logs directory by default)
[biubiubiu@hadoop01 ~]$ hdfs dfsadmin -metasave  mydata.tt

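5. Re-reading the hosts and exclude files to refresh the cluster's nodes

Refreshes the node list: when nodes are added to or removed from a running cluster, this makes the NameNode re-read the hosts and exclude files.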

[biubiubiu@hadoop01 ~]$ hdfs dfsadmin -refreshNodes

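6. Setting a quota on a directory

Note: hidden files also count toward the quota, and so does the directory itself, so a quota of N leaves room for N-1 files and subdirectories beneath it.

hdfs dfsadmin -setQuota <number> <directory>

# Example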
[biubiubiu@hadoop01 ~]$ hdfs dfsadmin -setQuota 10 /input
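hdfs dfs -count -q shows the quota and how much of it is left. Its columns are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. hdfs dfsadmin -clrQuota removes the quota again. The sketch below uses a hypothetical helper name.

```shell
# Hypothetical helper: print the remaining name quota of a directory, i.e.
# the 2nd column (REMAINING_QUOTA) of "hdfs dfs -count -q". Without a
# quota, HDFS prints "inf" in that column.
remaining_name_quota() {
  hdfs dfs -count -q "$1" | awk '{ print $2 }'
}

# Usage:
#   hdfs dfsadmin -setQuota 10 /input
#   remaining_name_quota /input      # shrinks as files/dirs are added
#   hdfs dfsadmin -clrQuota /input   # remove the quota again
```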

III. The hdfs fsck command

hdfs fsck checks files and directories for problems and reports their status and block information.

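hdfs fsck <path> [options]
Options:
 -files: print each file checked, with its status
 -blocks: print the block report for each file
 -locations: print the DataNode address of each block
 -racks: print the rack of each block location

# Example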
[biubiubiu@hadoop01 ~]$ hdfs fsck /input -files -blocks -locations -racks

# Output
Connecting to namenode via http://biubiubiu:50070/fsck?ugi=hadooptest&files=1&blocks=1&locations=1&racks=1&path=%2Finput
FSCK started by hadooptest (auth:SIMPLE) from /192.168.153.131 for path /input at Thu Feb 13 23:39:11 CST 2020
/input <dir>
/input/hello.txt 35 bytes, 1 block(s):  Under replicated BP-1725421460-192.168.153.131-1581405231332:blk_1073741830_1007. Target Replicas is 2 but found 1 replica(s).
0. BP-1725421460-192.168.153.131-1581405231332:blk_1073741830_1007 len=35 repl=1 [/default-rack/192.168.153.131:50010]

/input/hi.txt 7 bytes, 1 block(s):  OK
0. BP-1725421460-192.168.153.131-1581405231332:blk_1073741829_1005 len=7 repl=1 [/default-rack/192.168.153.131:50010]

/input/localtest.txt 90 bytes, 1 block(s):  OK
0. BP-1725421460-192.168.153.131-1581405231332:blk_1073741832_1009 len=90 repl=1 [/default-rack/192.168.153.131:50010]

/input/quota <dir>
/input/quota/1 <dir>
/input/quota/2 <dir>
/input/vipuser <dir>
/input/vipuser/hello.txt 42 bytes, 1 block(s):  OK
0. BP-1725421460-192.168.153.131-1581405231332:blk_1073741828_1006 len=42 repl=1 [/default-rack/192.168.153.131:50010]

/input/vipuser/localtest.txt 90 bytes, 1 block(s):  OK
0. BP-1725421460-192.168.153.131-1581405231332:blk_1073741825_1001 len=90 repl=1 [/default-rack/192.168.153.131:50010]

/input/vipuser2 <dir>
/input/vipuser2/aaa <dir>
/input/vipuser2/aaa/hello.txt 35 bytes, 1 block(s):  OK
0. BP-1725421460-192.168.153.131-1581405231332:blk_1073741826_1002 len=35 repl=1 [/default-rack/192.168.153.131:50010]

/input/vipuser2/aaa/localtest.txt 90 bytes, 1 block(s):  OK
0. BP-1725421460-192.168.153.131-1581405231332:blk_1073741827_1003 len=90 repl=1 [/default-rack/192.168.153.131:50010]

/input/wordcount.txt 91 bytes, 1 block(s):  OK
0. BP-1725421460-192.168.153.131-1581405231332:blk_1073741853_1030 len=91 repl=1 [/default-rack/192.168.153.131:50010]

Status: HEALTHY
 Total size:	480 B
 Total dirs:	7
 Total files:	8
 Total symlinks:		0
 Total blocks (validated):	8 (avg. block size 60 B)
 Minimally replicated blocks:	8 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	1 (12.5 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	1
 Average block replication:	1.0
 Corrupt blocks:		0
 Missing replicas:		1 (11.111111 %)
 Number of data-nodes:		1
 Number of racks:		1
FSCK ended at Thu Feb 13 23:39:11 CST 2020 in 9 milliseconds


The filesystem under path '/input' is HEALTHY
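In larger directory trees the important lines are easy to miss. The sketch below filters fsck output read on stdin. It keys on the exact wording seen above: "Under replicated" on a file line and "Under-replicated blocks:" in the summary. The name fsck_under_replicated is hypothetical, and the wording may differ between Hadoop versions.

```shell
# Hypothetical helper: read "hdfs fsck <path> -files -blocks" output on stdin,
# print every file reported as under-replicated, then the summary count line.
fsck_under_replicated() {
  awk '
    /Under replicated/ { print "under-replicated:", $1 }
    /Under-replicated blocks:/ { sub(/^[ \t]+/, ""); print }
  '
}

# Usage:
#   hdfs fsck /input -files -blocks | fsck_under_replicated
```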