HDFS

- suitable for

  •  very large files, terabytes to petabytes in size
  •  write-once, read-many access
  •  handling node failure without noticeable interruption

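- not suitable for applications with

  •  low-latency data access (HBase is a better fit)
  •  lots of small files, because the namenode keeps the whole namespace in memory
  •  multiple writers or arbitrary modification. A file has a single writer, and writes only append to the end of the file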
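The small-files limit is a namenode memory limit, and a rough calculation shows why. The sketch below assumes about 150 bytes of namenode memory per file, directory, or block object; that figure is a commonly quoted rule of thumb, not a measured value, and the class and method names are made up for illustration.

```java
// Rough estimate of namenode memory used by namespace objects.
// ASSUMPTION: ~150 bytes per file or block object (rule of thumb only).
final class NamenodeMemoryEstimate {
    static final long BYTES_PER_OBJECT = 150L;

    /** Each file contributes one file object plus one object per block. */
    static long estimateBytes(long files, long blocksPerFile) {
        long objects = files + files * blocksPerFile;
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(1_000_000L, 1L);
        System.out.println(bytes / 1_000_000 + " MB for one million single-block files");
    }
}
```

Under that assumption, one million single-block files cost about 300 MB of namenode memory regardless of how little data they hold.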


- Block

  •  files are split into blocks, and replication is done per block
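As a small worked example of block arithmetic, the sketch below computes how many blocks a file occupies and how much raw storage its replicas take. It assumes a 128 MB block size and a replication factor of 3; both are configurable, and the class and method names are illustrative only.

```java
// Block arithmetic for a single file.
// ASSUMPTION: 128 MB block size; replication factor is passed in by the caller.
final class BlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    /** Number of blocks needed to hold fileSize bytes (ceiling division). */
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and block size positive");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Size of the final block; a short last block holds only the remaining bytes. */
    static long lastBlockSize(long fileSize, long blockSize) {
        if (fileSize == 0) {
            return 0;
        }
        long remainder = fileSize % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    /** Raw bytes stored across the cluster when every block is replicated. */
    static long rawStorage(long fileSize, int replication) {
        return fileSize * replication;
    }

    public static void main(String[] args) {
        long size = 300L * 1024 * 1024; // a 300 MB file
        System.out.println("blocks: " + blockCount(size, BLOCK_SIZE));
        System.out.println("last block bytes: " + lastBlockSize(size, BLOCK_SIZE));
        System.out.println("raw bytes at replication 3: " + rawStorage(size, 3));
    }
}
```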

- namenode and datanode

  • namenode - holds the filesystem namespace image and the edit log. Both are written to multiple filesystems (for example local disk and NFS); they are critical and must not be lost
  • namenode - block pool, built from the blocks that datanodes report at system startup (block locations are not stored persistently)
  • secondary namenode - periodically merges the edit log into the namespace image. When the primary namenode is down, copy the namespace image and edit log from NFS and run the secondary as the new primary. The new namenode must then:
  1. load the image into memory
  2. replay the edit log
  3. receive enough block reports from datanodes to leave safe mode
  • blocks can be cached in a datanode's memory on a per-file basis
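The third startup step above, leaving safe mode, amounts to a threshold check over the block reports received so far. The sketch below is a simplified model, not the namenode's code: it assumes a minimum replication of 1 and a threshold of 99.9% of blocks (the values I believe are the defaults), and it ignores the extra waiting period the namenode applies after the threshold is reached.

```java
// Simplified model of the safe-mode exit condition.
// ASSUMPTION: the caller supplies minReplication (e.g. 1) and thresholdPct (e.g. 0.999).
final class SafeModeCheck {
    /**
     * reportedReplicas[i] is how many replicas of block i the datanodes have reported.
     * Returns true when the fraction of blocks with at least minReplication
     * reported replicas reaches thresholdPct.
     */
    static boolean canLeaveSafeMode(int[] reportedReplicas, int minReplication, double thresholdPct) {
        if (reportedReplicas.length == 0) {
            return true; // no blocks to wait for
        }
        int safeBlocks = 0;
        for (int replicas : reportedReplicas) {
            if (replicas >= minReplication) {
                safeBlocks++;
            }
        }
        return (double) safeBlocks / reportedReplicas.length >= thresholdPct;
    }

    public static void main(String[] args) {
        int[] reported = {3, 2, 0, 1}; // one block has no reported replica yet
        System.out.println(canLeaveSafeMode(reported, 1, 0.999));
    }
}
```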

- HDFS Federation

  • parts of the filesystem namespace are mapped to different namenodes, like /usr and /share
  • each namenode manages its own block pool, but the datanodes are not partitioned: every datanode registers with each namenode, reports blocks to all of them, and stores blocks from multiple block pools
  • the client uses a mount table to map a path to its namenode
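The client-side mount table can be pictured as a prefix lookup from path to namenode. The sketch below is a toy version, not the ViewFileSystem implementation: the longest-prefix rule is an assumption made for the example, and the namenode URIs are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy mount table: maps a path prefix to the namenode that owns that subtree.
// ASSUMPTION: longest matching prefix wins; all URIs below are hypothetical.
final class MountTable {
    private final Map<String, String> mounts = new LinkedHashMap<>();

    void addMount(String pathPrefix, String namenodeUri) {
        mounts.put(pathPrefix, namenodeUri);
    }

    /** Returns the namenode for the longest mount prefix covering path, or null if none does. */
    String resolve(String path) {
        String bestPrefix = null;
        String bestNamenode = null;
        for (Map.Entry<String, String> entry : mounts.entrySet()) {
            String prefix = entry.getKey();
            // Match whole path components only, so "/usr" does not cover "/usrx".
            boolean covers = path.equals(prefix) || path.startsWith(prefix + "/");
            if (covers && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
                bestNamenode = entry.getValue();
            }
        }
        return bestNamenode;
    }

    public static void main(String[] args) {
        MountTable table = new MountTable();
        table.addMount("/usr", "hdfs://nn1.example.com:8020");
        table.addMount("/share", "hdfs://nn2.example.com:8020");
        System.out.println(table.resolve("/usr/alice/data.txt"));
        System.out.println(table.resolve("/share/lib"));
    }
}
```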

- HA (High Availability)

  • the active namenode and the standby namenode use shared storage for the edit log
  • datanodes send block reports to both namenodes
  • the standby namenode merges the edit log into the namespace image, taking over the secondary namenode's role
  • ZooKeeper is used to elect the active namenode; it drives failover and fencing of the previous active namenode

- Java API

  • FileSystem, Path, FSDataInputStream, FSDataOutputStream, FileStatus
  • fis = fs.open(path), fos = fs.create(path), fos = fs.append(path)

- Anatomy of file read


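  •  the client reads the file block by block, each from the closest datanode that holds the block
  •  the connection to a datanode is closed once its block has been read
  •  the network topology needs to be configured for Hadoop. Closeness is ordered: same node -> same rack -> same data center -> different data centers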
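The closeness ordering above can be expressed as a distance in a tree of data centers, racks, and nodes: the distance between two nodes is the number of hops from each up to their closest common ancestor. The sketch below assumes locations are written as /datacenter/rack/node; the names d1, r1, n1 and the class itself are illustrative.

```java
// Tree distance between two locations written as "/datacenter/rack/node".
// ASSUMPTION: every location has a leading slash and the same depth.
final class NetworkDistance {
    /** Sum of the hops from each node up to the closest common ancestor. */
    static int distance(String a, String b) {
        if (!a.startsWith("/") || !b.startsWith("/")) {
            throw new IllegalArgumentException("locations must start with '/'");
        }
        String[] pathA = a.substring(1).split("/");
        String[] pathB = b.substring(1).split("/");
        int common = 0;
        while (common < pathA.length && common < pathB.length
                && pathA[common].equals(pathB[common])) {
            common++;
        }
        return (pathA.length - common) + (pathB.length - common);
    }

    /** Picks the replica location closest to the reader (first one wins on ties). */
    static String closest(String reader, String[] replicas) {
        String best = null;
        for (String replica : replicas) {
            if (best == null || distance(reader, replica) < distance(reader, best)) {
                best = replica;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // same node
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // same rack
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // same data center
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // different data centers
    }
}
```

With three-level locations the four cases give distances 0, 2, 4, and 6.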

- anatomy of file write


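  •  the DataStreamer asks the namenode to allocate blocks. With the default placement, the first replica goes on the local node (where the client runs), the second on a node in a different rack, and the third on a different node in the same rack as the second
  •  the write succeeds once the minimum number of replicas has been written (dfs.namenode.replication.min, usually 1); the remaining replicas are created asynchronously. Packets wait in the data queue until they are sent, then in the ack queue until every datanode in the pipeline has acknowledged them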
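The interplay of the data queue and the ack queue can be modeled with two deques. The sketch below is a single-threaded toy model, not the DFSOutputStream code: packets are plain integers, and the failure handling (unacknowledged packets return to the front of the data queue so none are lost) follows the usual description of the write path.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the client-side write queues. Packets are represented by integer ids.
final class WritePipelineSketch {
    private final Deque<Integer> dataQueue = new ArrayDeque<>();
    private final Deque<Integer> ackQueue = new ArrayDeque<>();

    /** The client writes data; it is split into packets and queued for sending. */
    void enqueue(int packetId) {
        dataQueue.addLast(packetId);
    }

    /** The streamer sends the oldest queued packet; it then waits for acknowledgement. */
    boolean sendNext() {
        Integer packet = dataQueue.pollFirst();
        if (packet == null) {
            return false;
        }
        ackQueue.addLast(packet);
        return true;
    }

    /** Every datanode in the pipeline acknowledged the oldest in-flight packet. */
    boolean ackOldest() {
        return ackQueue.pollFirst() != null;
    }

    /** On a datanode failure, in-flight packets return to the front of the data queue in order. */
    void onPipelineFailure() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
    }

    int pendingData() {
        return dataQueue.size();
    }

    int pendingAcks() {
        return ackQueue.size();
    }

    Integer nextToSend() {
        return dataQueue.peekFirst();
    }

    public static void main(String[] args) {
        WritePipelineSketch pipeline = new WritePipelineSketch();
        for (int id = 1; id <= 3; id++) {
            pipeline.enqueue(id);
        }
        pipeline.sendNext();         // packet 1 in flight
        pipeline.sendNext();         // packet 2 in flight
        pipeline.ackOldest();        // packet 1 acknowledged
        pipeline.onPipelineFailure(); // packet 2 goes back before packet 3
        System.out.println("next to send: " + pipeline.nextToSend());
    }
}
```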

- coherency

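  •  use hflush or hsync to make written content visible to readers (hsync also forces the data to disk). close calls hflush implicitly

- parallel copy, distcp

  •  runs as a MapReduce job made of map tasks only. Using multiple map tasks spreads the copied blocks across nodes and keeps the cluster balanced
  •  each file is copied by a single map task, and each map is given roughly the same amount of data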
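To illustrate the idea of giving every map task a similar amount of data while never splitting a file across maps, the sketch below uses a simple greedy assignment: largest file first, always to the least loaded map. This is a stand-in chosen for the example, not distcp's actual algorithm, and the class name is hypothetical.

```java
import java.util.Arrays;

// Greedy sketch: assign whole files to map tasks so that byte totals stay close.
// ASSUMPTION: this is an illustrative heuristic, not how distcp itself plans copies.
final class CopyPlanSketch {
    /** Returns the total bytes assigned to each map; every file goes to exactly one map. */
    static long[] assign(long[] fileSizes, int maps) {
        if (maps <= 0) {
            throw new IllegalArgumentException("need at least one map");
        }
        long[] sorted = fileSizes.clone();
        Arrays.sort(sorted); // ascending; iterate from the end for largest first
        long[] load = new long[maps];
        for (int i = sorted.length - 1; i >= 0; i--) {
            int lightest = 0;
            for (int m = 1; m < maps; m++) {
                if (load[m] < load[lightest]) {
                    lightest = m;
                }
            }
            load[lightest] += sorted[i];
        }
        return load;
    }

    public static void main(String[] args) {
        long[] load = assign(new long[]{8, 7, 6, 5, 4}, 2);
        System.out.println(Arrays.toString(load));
    }
}
```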