What is HDFS
HDFS is a core subproject of the Hadoop project: a distributed file system for large-scale storage. Its main characteristics:
- High fault tolerance. HDFS automatically keeps multiple replicas of each block; when a replica is lost, it re-creates it from the copies on other machines.
- Suited to big data: it handles datasets at GB, TB, and even PB scale.
- Disk-based, streaming I/O. Files are write-once: once written, they cannot be modified in place.
- Runs on inexpensive commodity machines.
Common HDFS Commands
version
The version command prints the Hadoop version and build information:
[hadoop@hadoop01 bin]$ hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /home/hadoop/BD/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
dfsadmin
The dfsadmin -report command shows the cluster's storage usage and the status of each DataNode. (As the warning in the output notes, the preferred form in Hadoop 3 is hdfs dfsadmin -report.)
[hadoop@hadoop01 bin]$ hadoop dfsadmin -report
WARNING: Use of this script to execute dfsadmin is deprecated.
WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.
Configured Capacity: 115360407552 (107.44 GB)
Present Capacity: 72069640192 (67.12 GB)
DFS Remaining: 70385803264 (65.55 GB)
DFS Used: 1683836928 (1.57 GB)
DFS Used%: 2.34%
Replicated Blocks:
Under replicated blocks: 8
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.229.201:9866 (hadoop03)
Hostname: hadoop03
Decommission Status : Normal
Configured Capacity: 38453469184 (35.81 GB)
DFS Used: 561278976 (535.28 MB)
Non DFS Used: 14261157888 (13.28 GB)
DFS Remaining: 23631032320 (22.01 GB)
DFS Used%: 1.46%
DFS Remaining%: 61.45%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Dec 25 01:22:47 PST 2020
Last Block Report: Thu Dec 24 23:20:39 PST 2020
Num of Blocks: 49
Name: 192.168.229.202:9866 (hadoop04)
Hostname: hadoop04
Decommission Status : Normal
Configured Capacity: 38453469184 (35.81 GB)
DFS Used: 561278976 (535.28 MB)
Non DFS Used: 12875857920 (11.99 GB)
DFS Remaining: 25016332288 (23.30 GB)
DFS Used%: 1.46%
DFS Remaining%: 65.06%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Dec 25 01:22:47 PST 2020
Last Block Report: Thu Dec 24 23:20:39 PST 2020
Num of Blocks: 49
Name: 192.168.229.203:9866 (hadoop05)
Hostname: hadoop05
Decommission Status : Normal
Configured Capacity: 38453469184 (35.81 GB)
DFS Used: 561278976 (535.28 MB)
Non DFS Used: 16153751552 (15.04 GB)
DFS Remaining: 21738438656 (20.25 GB)
DFS Used%: 1.46%
DFS Remaining%: 56.53%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Dec 25 01:22:48 PST 2020
Last Block Report: Thu Dec 24 23:20:39 PST 2020
Num of Blocks: 49
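The percentages in the report can be sanity-checked against the raw byte counts. Using the numbers from this report, the cluster-level DFS Used% (2.34%) matches DFS Used divided by Present Capacity rather than Configured Capacity:

```shell
# Recompute the cluster-level DFS Used% from the dfsadmin report above.
# Byte counts are copied verbatim from the report.
dfs_used=1683836928
present_capacity=72069640192
awk -v u="$dfs_used" -v c="$present_capacity" \
    'BEGIN { printf "%.2f%%\n", 100 * u / c }'   # prints 2.34%
```

The per-DataNode DFS Used% (1.46%) is instead computed against that node's own Configured Capacity (561278976 / 38453469184).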
jar
The jar command runs a jar file that contains a MapReduce program:
hadoop jar WordCount.jar
If the jar's manifest does not specify a main class, pass the driver class name and any program arguments (such as input and output paths) after the jar name.
fs
The fs command operates on files in HDFS; it has many subcommands.
1. cat prints a file (or every file matched by a glob) to standard output.
hadoop fs -cat /b.txt
2. copyFromLocal works like put: it uploads a local file to HDFS.
3. copyToLocal works like get: it downloads a file from HDFS to the local file system.
hadoop fs -put ./b.txt / # without a target path, the upload defaults to /user/{current user}/
hadoop fs -copyFromLocal ./b.txt /
hadoop fs -get /b.txt
hadoop fs -copyToLocal /b.txt
4. cp copies a file from one HDFS location to another.
hadoop fs -cp /b.txt /input/
5. du shows the size of files and directories.
hadoop fs -du /
The output has three columns: the file size, the total space the file consumes in HDFS, and the path. Here the first two columns are roughly in a 1:3 ratio, because the consumed space is the file size multiplied by the replication factor, which is set to 3. The replication factor is configured in hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
The command also accepts the -h flag for human-readable sizes: hadoop fs -du -h /
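The 1:3 relationship described above is simple arithmetic: the space a file consumes in HDFS is its size multiplied by the replication factor. A minimal sketch (the file size below is illustrative, not from a real cluster):

```shell
# du's second column = first column x dfs.replication
file_size=1048576      # bytes, as in du's first column (illustrative value)
replication=3          # dfs.replication from hdfs-site.xml
raw_usage=$((file_size * replication))
echo "$raw_usage"      # total bytes consumed across all replicas: 3145728
```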
6. expunge empties the trash.
hadoop fs -expunge
Hadoop's trash feature is disabled by default; enable it in core-site.xml:
<!-- Trash retention interval, in minutes -->
<property>
<name>fs.trash.interval</name>
<value>60</value>
</property>
Once trash is enabled, a deleted file is moved to the trash, kept for the configured interval, and then permanently removed. To recover a file during that window, move it back out with the mv subcommand. The default trash path is /user/{current user}/.Trash.
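As a sketch of the recovery path (the user name and file name below are assumptions for illustration; the layout follows the default .Trash/Current convention):

```shell
# Construct the default trash location for a recently deleted file.
user="hadoop"   # assumed login user
file="b.txt"    # hypothetical deleted file
trash_path="/user/${user}/.Trash/Current/${file}"
echo "$trash_path"
# To restore it on a running cluster (comment only, needs HDFS):
# hadoop fs -mv "$trash_path" "/${file}"
```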
7. getmerge merges all files under an HDFS directory and downloads the result as a single local file.
hadoop fs -getmerge /input ~/score # /input is an HDFS directory; ~/score is the local output file
8. ls lists files: hadoop fs -ls /
9. lsr lists files recursively (shorthand for ls -R; deprecated in newer releases): hadoop fs -lsr /
10. mkdir creates a directory.
11. mv moves a file.
12. rm deletes a file.
13. tail prints the last 1 KB of a file to standard output; it supports the -f option.