A Beginner's Introduction to HDFS

大黑哞

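HDFS overview

HDFS stands for Hadoop Distributed File System. It is one of the core components of Hadoop and serves as its lowest-level distributed storage service.
Distributed file systems address the problem of storing big data. They are storage systems that span multiple machines, and they provide the scalability needed to store and process very large datasets.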

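HDFS features

First, it is a file system: it stores files and locates them through a directory tree under a unified namespace.
Second, it is distributed: many servers work together to provide its functionality, and each server in the cluster has its own role.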

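1. Master/slave architecture
HDFS uses a master/slave architecture. An HDFS cluster typically consists of one NameNode and a number of DataNodes. The NameNode is the master node and the DataNodes are the slave nodes; each plays its own role, and together they provide the distributed file storage service.
2. Block storage
Files in HDFS are physically stored in blocks. The block size is set through a configuration parameter (dfs.blocksize), and the default in Hadoop 2.x is 128 MB.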
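3. Namespace
HDFS supports a traditional hierarchical file organization. Users and applications can create directories and store files in them. The namespace hierarchy is similar to that of most existing file systems: users can create, delete, move, and rename files.
The NameNode maintains the file system namespace and records every change to the namespace or its properties.
HDFS presents clients with a single abstract directory tree, and clients access files by path, for example: hdfs://namenode:port/dir-a/dir-b/dir-c/file.data.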
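4. NameNode metadata management
The directory structure and the block location information of files are called metadata. The NameNode maintains the directory tree of the whole HDFS file system, as well as the block information for each file (the block IDs and the DataNodes that hold them).
5. DataNode data storage
The DataNodes store and manage the actual blocks of each file. Each block can be stored as multiple replicas on multiple DataNodes; the number of replicas is set with the dfs.replication parameter and defaults to 3. Each DataNode periodically reports the blocks it holds to the NameNode.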
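6. Replication
For fault tolerance, every block of a file is replicated. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file; the replication factor can be set when the file is created and changed later.
7. Write once, read many
HDFS is designed for write-once, read-many workloads. Existing file content cannot be modified in place; data can only be appended to the end of a file.
For this reason HDFS is well suited as the underlying storage for big data analytics, but not for applications such as network drives: modification is inconvenient, latency is high, network overhead is large, and the cost is too high.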
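
To make the block and replica numbers above concrete, here is a minimal Python sketch that works out how a file would be split into blocks and how much raw storage its replicas occupy. It only does the arithmetic; the function name block_layout is made up for this illustration, and the defaults (128 MB blocks, 3 replicas) are the ones quoted in the text.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default block size in Hadoop 2.x, per the text
REPLICATION = 3                 # default dfs.replication, per the text


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, size of the last block, raw bytes across all replicas).

    Hypothetical helper for illustration only; it does not talk to HDFS.
    """
    if file_size == 0:
        return 0, 0, 0
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling division
    last_block = file_size - (blocks - 1) * block_size   # the last block may be partial
    raw_bytes = file_size * replication                  # every block is stored `replication` times
    return blocks, last_block, raw_bytes


MB = 1024 * 1024
blocks, last_block, raw_bytes = block_layout(300 * MB)
print(blocks, last_block // MB, raw_bytes // MB)  # prints: 3 44 900
```

A 300 MB file therefore becomes two full 128 MB blocks plus one 44 MB block, and with three replicas it takes up roughly 900 MB of disk across the cluster.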

Using the HDFS command line

ls
Usage: hdfs dfs -ls [-R] <args>
Options:
• The -R option will return stat recursively through the directory structure.
For a file returns stat on the file with the following format:
permissions number_of_replicas userid groupid filesize modification_date modification_time filename
For a directory it returns list of its direct children as in Unix. A directory is listed as:
permissions userid groupid modification_date modification_time dirname
Example:
• hdfs dfs -ls /user/hadoop/file1
Exit Code:
Returns 0 on success and -1 on error.
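
If you need to post-process the listing in a script, the file format described above can be split into fields. The following is a small Python sketch under that assumption; the sample line is invented for illustration, and parse_ls_file_line is a hypothetical helper that handles file entries only (directory lines are laid out differently).

```python
from typing import NamedTuple


class LsEntry(NamedTuple):
    permissions: str
    replicas: int
    user: str
    group: str
    size: int
    date: str
    time: str
    path: str


def parse_ls_file_line(line: str) -> LsEntry:
    """Split one file line of `hdfs dfs -ls` output into its eight fields."""
    # maxsplit=7 keeps a path that contains spaces in one piece.
    parts = line.strip().split(None, 7)
    if len(parts) != 8:
        raise ValueError(f"not a file entry: {line!r}")
    perms, replicas, user, group, size, date, time, path = parts
    return LsEntry(perms, int(replicas), user, group, int(size), date, time, path)


# Invented sample line that follows the documented field order.
sample = "-rw-r--r--   3 hadoop supergroup       1366 2023-10-03 16:17 /user/hadoop/file1"
entry = parse_ls_file_line(sample)
print(entry.replicas, entry.size, entry.path)  # prints: 3 1366 /user/hadoop/file1
```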
lsr
Usage: hdfs dfs -lsr <args>
Recursive version of ls.
Note: This command is deprecated. Instead use hdfs dfs -ls -R
mkdir
Usage: hdfs dfs -mkdir [-p] <paths>
Takes path URIs as arguments and creates directories.
Options:
• The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.
Example:
• hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
• hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.
moveFromLocal
Usage: hdfs dfs -moveFromLocal <localsrc> <dst>
Similar to put command, except that the source localsrc is deleted after it’s copied.
moveToLocal
Usage: hdfs dfs -moveToLocal [-crc] <src> <dst>
Displays a “Not implemented yet” message.
mv
Usage: hdfs dfs -mv URI [URI …]
Moves files from source to destination. This command allows multiple sources as well in which case the destination needs to be a directory. Moving files across file systems is not permitted.
Example:
• hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2
• hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1
Exit Code:
Returns 0 on success and -1 on error.
put
Usage: hdfs dfs -put <localsrc> … <dst>
Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.
Example:
• hdfs dfs -put localfile /user/hadoop/hadoopfile
• hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
• hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
• hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)
Exit Code:
Returns 0 on success and -1 on error.
appendToFile
Usage: hdfs dfs -appendToFile <localsrc> … <dst>
Appends one or more local files to the specified file in HDFS. Can also read input from stdin.
Example:
• hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile
• hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
• hdfs dfs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile
• hdfs dfs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)
Exit Code:
Returns 0 on success and 1 on error.
cat
Usage: hdfs dfs -cat URI [URI …]
Copies the source paths to stdout, i.e. prints the file contents.
Example:
• hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
• hdfs dfs -cat file:///file3 /user/hadoop/file4
Exit Code:
Returns 0 on success and -1 on error.

cp
Usage: hdfs dfs -cp [-f] [-p | -p[topax]] URI [URI …]
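Copies files or directories. It can overwrite the destination and can preserve the original attributes such as permissions.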
Options:
• The -f option will overwrite the destination if it already exists.
• The -p option will preserve file attributes [topax] (timestamps, ownership, permission, ACL, XAttr). If -p is specified with no arg, then preserves timestamps, ownership, permission. If -pa is specified, then preserves permission also because ACL is a super-set of permission. Determination of whether raw namespace extended attributes are preserved is independent of the -p flag.
Example:
• hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2
• hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir
Exit Code:
Returns 0 on success and -1 on error.
rm
Usage: hdfs dfs -rm [-f] [-r|-R] [-skipTrash] URI [URI …]
Delete files specified as args.
Options:
• The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist.
• The -R option deletes the directory and any content under it recursively.
• The -r option is equivalent to -R.
• The -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately. This can be useful when it is necessary to delete files from an over-quota directory.
Example:
• hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir
Exit Code:
Returns 0 on success and -1 on error.
rmr
Usage: hdfs dfs -rmr [-skipTrash] URI [URI …]
Recursive version of delete.
Note: This command is deprecated. Instead use hdfs dfs -rm -r

chmod
Usage: hdfs dfs -chmod [-R] <MODE[,MODE]… | OCTALMODE> URI [URI …]
Changes the permissions of files.
Options:
• The -R option will make the change recursively through the directory structure.
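
For readers less familiar with the OCTALMODE form, the short Python sketch below shows how a three-digit octal mode maps to the rwx string that ls prints. The helper octal_to_symbolic is hypothetical and written for this example; it covers the nine basic permission bits only and ignores special bits such as the sticky bit.

```python
def octal_to_symbolic(octal_mode: str) -> str:
    """Convert an octal mode such as "755" into "rwxr-xr-x"."""
    bits = int(octal_mode, 8)
    out = []
    for shift in (6, 3, 0):  # owner, group, others
        triplet = (bits >> shift) & 0b111
        out.append("r" if triplet & 4 else "-")
        out.append("w" if triplet & 2 else "-")
        out.append("x" if triplet & 1 else "-")
    return "".join(out)


print(octal_to_symbolic("755"))  # prints: rwxr-xr-x
print(octal_to_symbolic("644"))  # prints: rw-r--r--
```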
chown
Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI …]
Changes the owner of files.
Options:
• The -R option will make the change recursively through the directory structure.

expunge
Usage: hdfs dfs -expunge
Empties the trash.

Advanced HDFS commands

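1. HDFS quota configuration
HDFS quotas let us limit, per directory, either the number of files or the total amount of data that can be uploaded, much as a service like Baidu Netdisk caps how much each user may upload.
1.1. Count quota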

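hdfs dfs -mkdir -p /user/root/lisi     # create the HDFS directory

hdfs dfsadmin -setQuota 2 /user/root/lisi      # set a quota of 2 on the directory; in practice only one file can be uploaded, because the directory itself counts toward the quota

hdfs dfsadmin -clrQuota /user/root/lisi    # clear the count quota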
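
Why does a quota of 2 leave room for only one file? The count quota covers every name in the directory tree, and the directory counts against its own quota. The toy Python model below illustrates this rule; NameQuotaDir and QuotaExceeded are invented for the example and are not part of any Hadoop API.

```python
class QuotaExceeded(Exception):
    """Raised by the toy model when adding a name would exceed the quota."""


class NameQuotaDir:
    """Toy model of an HDFS directory with a count (name) quota."""

    def __init__(self, name_quota: int):
        self.name_quota = name_quota
        self.names = 1  # the directory itself already uses one name

    def add(self, name: str) -> None:
        if self.names + 1 > self.name_quota:
            raise QuotaExceeded(f"cannot add {name!r}: quota={self.name_quota}, used={self.names}")
        self.names += 1


lisi = NameQuotaDir(name_quota=2)
lisi.add("first.txt")       # succeeds: 2 of 2 names are now used
try:
    lisi.add("second.txt")  # fails: this would be a third name
except QuotaExceeded as err:
    print("rejected:", err)
```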

1.2. Space quota

hdfs dfsadmin -setSpaceQuota 4k /user/root/lisi   # limit the space to 4 KB
hdfs dfs -put  /export/softwares/zookeeper-3.4.5-cdh5.14.0.tar.gz /user/root/lisi    # uploading a file larger than 4 KB fails with a quota-exceeded error
hdfs dfsadmin -clrSpaceQuota /user/root/lisi   # clear the space quota
hdfs dfs -put  /export/softwares/zookeeper-3.4.5-cdh5.14.0.tar.gz /user/root/lisi   # the upload now succeeds
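
One detail worth knowing: as far as I understand the HDFS quota rules, every replica of a block counts against the space quota, and a block allocation is refused unless the remaining quota could hold a full block. The Python sketch below expresses that check under those assumptions; can_allocate_block is a hypothetical name, and the 128 MB and 3-replica defaults come from earlier in this article.

```python
MB = 1024 * 1024


def can_allocate_block(remaining_space_quota: int,
                       block_size: int = 128 * MB,
                       replication: int = 3) -> bool:
    """Return True if the remaining quota could hold one full, fully replicated block."""
    return remaining_space_quota >= block_size * replication


print(can_allocate_block(4 * 1024))   # prints: False (a 4 KB quota cannot hold any block)
print(can_allocate_block(384 * MB))   # prints: True  (128 MB x 3 replicas just fits)
```

Under this rule a space quota should be sized in multiples of block size times replication factor, not just the logical size of the files you expect to store.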

1.3. Viewing the quotas of an HDFS directory

hdfs dfs -count -q -h /user/root/lisi
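
The command prints eight columns: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. Here is a rough Python sketch for reading one such line in a script. It assumes the output was produced without -h (so sizes are plain byte counts), the sample line is invented, and parse_count_q is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class QuotaUsage(NamedTuple):
    name_quota: Optional[int]
    remaining_name_quota: Optional[int]
    space_quota: Optional[int]
    remaining_space_quota: Optional[int]
    dir_count: int
    file_count: int
    content_size: int
    path: str


def _quota_field(text: str) -> Optional[int]:
    # Directories without a quota show "none" / "inf" instead of a number.
    return None if text in ("none", "inf") else int(text)


def parse_count_q(line: str) -> QuotaUsage:
    """Parse one line of `hdfs dfs -count -q` output (without -h)."""
    parts = line.strip().split(None, 7)
    if len(parts) != 8:
        raise ValueError(f"unexpected line: {line!r}")
    q, rq, sq, rsq, dirs, files, size, path = parts
    return QuotaUsage(_quota_field(q), _quota_field(rq), _quota_field(sq),
                      _quota_field(rsq), int(dirs), int(files), int(size), path)


# Invented sample: count quota 2 (fully used), space quota 4096 bytes.
sample = "     2      0     4096     4096     1     1     0 /user/root/lisi"
usage = parse_count_q(sample)
print(usage.name_quota, usage.remaining_name_quota, usage.file_count)  # prints: 2 0 1
```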

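2. HDFS safe mode

Safe mode is a special state of HDFS in which the file system accepts only read requests and rejects changes such as deletions and modifications. When the NameNode starts, HDFS first enters safe mode. As the DataNodes start, they report their available blocks and other state to the NameNode, and once the system as a whole meets the safety criteria, HDFS leaves safe mode automatically. While HDFS is in safe mode, no block replication takes place, so whether the minimum replication requirement is met is judged from the state the DataNodes report at startup; no copies are made during startup to reach that minimum. By default a freshly started cluster remains in safe mode for 30 seconds, and only after it has left safe mode can write operations be performed on the cluster.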

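hdfs dfsadmin -safemode get      # show whether safe mode is on
hdfs dfsadmin -safemode enter    # enter safe mode
hdfs dfsadmin -safemode leave    # leave safe mode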
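
The automatic exit from safe mode can be pictured as two conditions: enough blocks have reached their minimum replica count, and a waiting period has passed. The Python sketch below is a simplified model of that logic, not the NameNode's actual code. The function and parameter names are illustrative; the defaults of 0.999 and 30 seconds reflect my understanding of dfs.namenode.safemode.threshold-pct and dfs.namenode.safemode.extension, so check them against your own cluster configuration.

```python
def can_leave_safe_mode(total_blocks: int,
                        safe_blocks: int,
                        seconds_since_threshold: float,
                        threshold_pct: float = 0.999,
                        extension_s: float = 30.0) -> bool:
    """Simplified model of the automatic safe mode exit.

    safe_blocks is the number of blocks that DataNodes have reported with at
    least the minimum number of replicas.
    """
    threshold_reached = safe_blocks >= threshold_pct * total_blocks
    return threshold_reached and seconds_since_threshold >= extension_s


print(can_leave_safe_mode(1000, 1000, 31))  # prints: True
print(can_leave_safe_mode(1000, 900, 31))   # prints: False (too few blocks reported)
print(can_leave_safe_mode(1000, 1000, 10))  # prints: False (still inside the 30 s wait)
```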
