HDFS Concepts

慢熟的孩子

Category: Big Data. Tags: HDFS

Article link: https://blog.csdn.net/qq_45400755/article/details/97886715

HDFS assumptions:
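Hardware failure
Each machine stores only part of a file's data. A file is split into blocks (blocksize=128M) that are placed on different machines, and for fault tolerance HDFS keeps 3 replicas of each block by default.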
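The arithmetic behind the block size and replica count can be sketched in a few lines. This is only an illustration, not HDFS code: the function names `split_into_blocks` and `raw_storage` are made up for this note, and the 150 MB file size is an assumed example.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes (in bytes) of the blocks a file of file_size bytes needs."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the bytes that are left over.
        sizes.append(remainder)
    return sizes


def raw_storage(file_size, replication=3):
    """Total bytes stored across the cluster when every block has `replication` copies."""
    return file_size * replication


blocks = split_into_blocks(150 * MB)
print([size // MB for size in blocks])   # [128, 22]
print(raw_storage(150 * MB) // MB)       # 450
```

With 3 replicas, a 150 MB file therefore occupies about 450 MB of raw disk across the cluster, spread over 2 blocks times 3 copies.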

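Streaming data access
HDFS is designed for batch processing and high throughput rather than for low-latency data access.

Large data sets
HDFS provides high aggregate data bandwidth across the cluster.

Moving computation is cheaper than moving data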
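As a rough illustration of this principle, a scheduler can prefer a worker that already holds a copy of the block it needs to process. HDFS itself only stores data; the scheduling is done by the processing framework on top of it. The function `pick_worker` and the addresses below are hypothetical.

```python
def pick_worker(block_replicas, idle_workers):
    """Pick a worker for a task that reads one block.

    Returns (worker, is_local). A worker that already stores a replica is
    preferred, so the block does not have to cross the network.
    """
    if not idle_workers:
        raise ValueError("no idle workers available")
    local = sorted(set(block_replicas) & set(idle_workers))
    if local:
        return local[0], True
    # No idle worker holds the block: fall back to a remote read.
    return sorted(idle_workers)[0], False


replicas = ["192.168.1.1", "192.168.1.2"]
print(pick_worker(replicas, ["192.168.1.2", "192.168.1.3"]))  # ('192.168.1.2', True)
print(pick_worker(replicas, ["192.168.1.3"]))                 # ('192.168.1.3', False)
```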

HDFS Architecture

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.

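1) NameNode (master) and DataNodes (slaves)
2) Master/slave architecture
3) NN: the file system namespace
/home/hadoop/software/app
Data is transferred through the client
NameNode: manages the file system namespace and regulates client access to files
4) DN: responsible for storing the data
5) HDFS allows data to be split into multiple blocks for storage
6) A file is split into one or more blocks
7) The blocks are spread across multiple DataNodes
8) CRUD
9) The NameNode determines the block map, that is, which node each block is stored on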

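a.txt
block1: 128M, 192.168.1.1
block2: 22M, 192.168.1.2
get a.txt
The block layout is transparent to the user
HDFS is built in Java, so the system needs a Java environment
10) Typically, one component is deployed per node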
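The a.txt example in the notes above can be mimicked with a toy model: a name node that only keeps the block map, data nodes that only keep bytes, and a client that stitches the blocks back together. This is a sketch under simplifying assumptions, not the HDFS implementation: the classes `ToyNameNode` and `ToyDataNode`, the round-robin placement, and the block id format are all invented here, replication is left out, and 150 bytes with a 128-byte block size stand in for 150M and 128M.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    address: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes


@dataclass
class ToyNameNode:
    # file path -> ordered list of (block id, DataNode address)
    block_map: dict = field(default_factory=dict)

    def locate(self, path):
        return self.block_map[path]


def put(namenode, datanodes, path, data, block_size):
    """Split data into blocks, place them round-robin, record the mapping."""
    addresses = sorted(datanodes)
    locations = []
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#block{index + 1}"
        address = addresses[index % len(addresses)]
        datanodes[address].blocks[block_id] = data[offset:offset + block_size]
        locations.append((block_id, address))
    namenode.block_map[path] = locations


def get(namenode, datanodes, path):
    """Ask the name node where the blocks are, then read them in order."""
    return b"".join(
        datanodes[address].blocks[block_id]
        for block_id, address in namenode.locate(path)
    )


namenode = ToyNameNode()
datanodes = {addr: ToyDataNode(addr) for addr in ("192.168.1.1", "192.168.1.2")}
data = bytes(range(150))  # 150 bytes stand in for the 150M file
put(namenode, datanodes, "a.txt", data, block_size=128)
print(namenode.locate("a.txt"))
assert get(namenode, datanodes, "a.txt") == data
```

The caller of `get` never sees block ids or addresses, which is the sense in which the layout is transparent to the user.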

NameNode (file system)
HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and remove files, move a file from one directory to another, or rename a file. HDFS supports user quotas and access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features.

The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.
File renames, file moves, and replication factors are all recorded by the NameNode.
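A minimal sketch of what "recorded by the NameNode" could look like: every namespace change updates the in-memory view and appends an entry to an ordered log. The class `ToyNamespace` and its method names are hypothetical, and the plain Python list is only a stand-in for whatever persistent record the real NameNode keeps.

```python
class ToyNamespace:
    def __init__(self):
        self.files = {}     # path -> replication factor
        self.edit_log = []  # ordered record of every namespace change

    def create(self, path, replication=3):
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = replication
        self.edit_log.append(("create", path, replication))

    def rename(self, src, dst):
        # Moving a file to another directory is also a rename of its path.
        if src not in self.files:
            raise FileNotFoundError(src)
        if dst in self.files:
            raise FileExistsError(dst)
        self.files[dst] = self.files.pop(src)
        self.edit_log.append(("rename", src, dst))

    def set_replication(self, path, replication):
        if path not in self.files:
            raise FileNotFoundError(path)
        self.files[path] = replication
        self.edit_log.append(("set_replication", path, replication))


ns = ToyNamespace()
ns.create("/data/a.txt")
ns.rename("/data/a.txt", "/archive/a.txt")
ns.set_replication("/archive/a.txt", 2)
for entry in ns.edit_log:
    print(entry)
```

Replaying the log from the start rebuilds the same `files` mapping, which is why keeping an ordered record is enough to recover the namespace.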

HDFS Replication

HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file.

All blocks in a file except the last block are the same size, while users can start a new block without filling out the last block to the configured block size after the support for variable length block was added to append and hsync.

An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once (except for appends and truncates) and have strictly one writer at any time.

The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.
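The Heartbeat and Blockreport bookkeeping can be sketched as follows. This is a toy model, not the NameNode's logic: the class `ToyReplicationMonitor`, the node names, and the 30-second timeout are assumptions chosen for the example (the timeout is not an HDFS default), and time is passed in explicitly so the behaviour is easy to follow.

```python
class ToyReplicationMonitor:
    def __init__(self, heartbeat_timeout=30.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}   # DataNode address -> time of last heartbeat
        self.reported_blocks = {}  # DataNode address -> set of block ids

    def heartbeat(self, address, now):
        self.last_heartbeat[address] = now

    def block_report(self, address, block_ids):
        self.reported_blocks[address] = set(block_ids)

    def live_nodes(self, now):
        """DataNodes whose last heartbeat is recent enough to count as alive."""
        return {
            address
            for address, seen in self.last_heartbeat.items()
            if now - seen <= self.heartbeat_timeout
        }

    def under_replicated(self, target, now):
        """Return {block id: live replica count} for blocks below their target."""
        counts = {}
        for address in self.live_nodes(now):
            for block_id in self.reported_blocks.get(address, ()):
                counts[block_id] = counts.get(block_id, 0) + 1
        return {
            block_id: counts.get(block_id, 0)
            for block_id, wanted in target.items()
            if counts.get(block_id, 0) < wanted
        }


monitor = ToyReplicationMonitor(heartbeat_timeout=30.0)
for node in ("dn1", "dn2", "dn3"):
    monitor.heartbeat(node, now=0.0)
    monitor.block_report(node, ["blk_1"])

# dn3 stops sending heartbeats; dn1 and dn2 keep reporting in.
monitor.heartbeat("dn1", now=40.0)
monitor.heartbeat("dn2", now=40.0)
print(monitor.under_replicated({"blk_1": 3}, now=50.0))  # {'blk_1': 2}
```

Once a block shows up as under-replicated, the NameNode's job is to instruct a DataNode to create another copy; that step is left out of the sketch.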