hdfs data flow-part writing

The client creates the file by calling create() on DistributedFileSystem (step 1 in
Figure 3-3). DistributedFileSystem makes an RPC call to the namenode to create a new
file in the filesystem’s namespace, with no blocks associated with it (step 2). The name-
node performs various checks to make sure the file doesn’t already exist, and that the
client has the right permissions to create the file. If these checks pass, the namenode
makes a record of the new file; otherwise, file creation fails and the client is thrown an
IOException. The DistributedFileSystem returns a FSDataOutputStream for the client to
start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutput
Stream, which handles communication with the datanodes and namenode.


As the client writes data (step 3), DFSOutputStream splits it into packets, which it writes
to an internal queue, called the data queue. The data queue is consumed by the Data
Streamer, whose responsibility it is to ask the namenode to allocate new blocks by
picking a list of suitable datanodes to store the replicas. The list of datanodes forms a
pipeline—we’ll assume the replication level is 3, so there are three nodes in the pipeline.
The DataStreamer streams the packets to the first datanode in the pipeline, which stores
the packet and forwards it to the second datanode in the pipeline. Similarly, the second
datanode stores the packet and forwards it to the third (and last) datanode in the pipe-
line (step 4).

 

DFSOutputStream also maintains an internal queue of packets that are waiting to be
acknowledged by datanodes, called the ack queue. A packet is removed from the ack
queue only when it has been acknowledged by all the datanodes in the pipeline (step 5).


If a datanode fails while data is being written to it, then the following actions are taken,
which are transparent to the client writing the data. First the pipeline is closed, and any
packets in the ack queue are added to the front of the data queue so that datanodes
that are downstream from the failed node will not miss any packets. The current block
on the good datanodes is given a new identity, which is communicated to the name-
node, so that the partial block on the failed datanode will be deleted if the failed data-
node recovers later on. The failed datanode is removed from the pipeline and the
remainder of the block’s data is written to the two good datanodes in the pipeline. The
namenode notices that the block is under-replicated, and it arranges for a further replica
to be created on another node. Subsequent blocks are then treated as normal.

 

It’s possible, but unlikely, that multiple datanodes fail while a block is being written.
As long as dfs.replication.min replicas (default one) are written the write will succeed,
and the block will be asynchronously replicated across the cluster until its target rep-
lication factor is reached (dfs.replication, which defaults to three).

 

 

When the client has finished writing data it calls close() on the stream (step 6). This
action flushes all the remaining packets to the datanode pipeline and waits for ac-
knowledgments before contacting the namenode to signal that the file is complete (step
7). The namenode already knows which blocks the file is made up of (via Data
Streamer asking for block allocations), so it only has to wait for blocks to be minimally
replicated before returning successfully.

 

 

 

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
资源包主要包含以下内容: ASP项目源码:每个资源包中都包含完整的ASP项目源码,这些源码采用了经典的ASP技术开发,结构清晰、注释详细,帮助用户轻松理解整个项目的逻辑和实现方式。通过这些源码,用户可以学习到ASP的基本语法、服务器端脚本编写方法、数据库操作、用户权限管理等关键技术。 数据库设计文件:为了方便用户更好地理解系统的后台逻辑,每个项目中都附带了完整的数据库设计文件。这些文件通常包括数据库结构图、数据表设计文档,以及示例数据SQL脚本。用户可以通过这些文件快速搭建项目所需的数据库环境,并了解各个数据表之间的关系和作用。 详细的开发文档:每个资源包都附有详细的开发文档,文档内容包括项目背景介绍、功能模块说明、系统流程图、用户界面设计以及关键代码解析等。这些文档为用户提供了深入的学习材料,使得即便是从零开始的开发者也能逐步掌握项目开发的全过程。 项目演示与使用指南:为帮助用户更好地理解和使用这些ASP项目,每个资源包中都包含项目的演示文件和使用指南。演示文件通常以视频或图文形式展示项目的主要功能和操作流程,使用指南则详细说明了如何配置开发环境、部署项目以及常见问题的解决方法。 毕业设计参考:对于正在准备毕业设计的学生来说,这些资源包是绝佳的参考材料。每个项目不仅功能完善、结构清晰,还符合常见的毕业设计要求和标准。通过这些项目,学生可以学习到如何从零开始构建一个完整的Web系统,并积累丰富的项目经验。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值