Understanding the HDFS NameSpace in One Article

尹忠政

Article link: https://blog.csdn.net/qq_22271479/article/details/120935411

NameNode

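- Holds the file tree of the entire HDFS: files and directories
- Manages DataNodes (heartbeat mechanism)
  - Bringing nodes online / taking them offline
  - Replica migration
  - Data balancing, since data may be unevenly distributed across the cluster
- Serves client read/write requests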

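Quick question: how do you tell whether a node is the root directory?
A path of length 0 is the root directory, i.e. the root is the one node whose own name is empty.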
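
A minimal sketch of that check, assuming a simplified inode class. The `SimpleINode` name and its single field are hypothetical and only illustrate the "empty name means root" rule; they are not Hadoop's actual inode classes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified inode: it only stores its own (local) name as bytes.
class SimpleINode {
    private final byte[] localName;

    SimpleINode(String localName) {
        this.localName = localName.getBytes(StandardCharsets.UTF_8);
    }

    // The root directory is the only node whose local name has length 0.
    boolean isRoot() {
        return localName.length == 0;
    }
}
```

With this sketch, `new SimpleINode("").isRoot()` is true, while a node named `user` is not the root.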

NameSpace

File structure

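File system metadata (first line)

imgVersion

Version number of the current fsimage file

namespaceID

ID of the current namespace; it stays unchanged for the lifetime of the NameNode.
When a DataNode registers, this ID is returned to it as its registrationID.
It is checked on every exchange with the NameNode, and a connection carrying an unrecognized namespaceID is rejected.

numFiles

Number of files in the file system

genStamp

Generation stamp at the time this fsimage file was produced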
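
The header fields and the namespaceID check described above can be sketched as follows. This is only an illustration under assumptions: the `ImageHeader` class, its field types, and the `acceptsDataNode` method are hypothetical and do not reproduce Hadoop's on-disk format or registration code.

```java
// Hypothetical holder for the header fields of an fsimage file.
final class ImageHeader {
    final int imgVersion;    // version of the fsimage layout
    final int namespaceId;   // fixed for the lifetime of the NameNode
    final long numFiles;     // number of files in the file system
    final long genStamp;     // generation stamp stored with the image

    ImageHeader(int imgVersion, int namespaceId, long numFiles, long genStamp) {
        this.imgVersion = imgVersion;
        this.namespaceId = namespaceId;
        this.numFiles = numFiles;
        this.genStamp = genStamp;
    }

    // A DataNode is accepted only if it presents the namespace ID it was given at registration.
    boolean acceptsDataNode(int presentedNamespaceId) {
        return presentedNamespaceId == namespaceId;
    }
}
```

A DataNode that still carries the ID of another (for example, reformatted) namespace would be rejected by `acceptsDataNode`.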

Directory metadata

path

replicas

mtime

Modification time

atime

Access time

blocksize

nsQuota

dsQuota

username

group

Name of the group the user belongs to

perm

Short for permission: the access permissions

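File metadata (includes the directory metadata fields)

blockid

ID of a block of the file

numBytes

Number of bytes in the block, i.e. the size of the block

genStamp

Generation stamp of the block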

FSImage

Holds the most recent metadata checkpoint

Edits

Holds the namespace changes recorded after the most recent checkpoint
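
How a checkpoint and an edit log combine into the current namespace can be sketched as a replay loop. This is a simplified illustration, not Hadoop's loader: the `NamespaceRecovery` and `Edit` names are hypothetical, and only two made-up operation types (create and delete of a path) are modeled.

```java
import java.util.List;
import java.util.TreeSet;

class NamespaceRecovery {
    // One logged namespace change; hypothetical and limited to create/delete.
    static final class Edit {
        final boolean create;   // true = create the path, false = delete it
        final String path;

        Edit(boolean create, String path) {
            this.create = create;
            this.path = path;
        }
    }

    // Start from the paths in the checkpoint, then apply the edits in log order.
    static TreeSet<String> recover(List<String> checkpointPaths, List<Edit> edits) {
        TreeSet<String> namespace = new TreeSet<>(checkpointPaths);
        for (Edit e : edits) {
            if (e.create) {
                namespace.add(e.path);
            } else {
                namespace.remove(e.path);
            }
        }
        return namespace;
    }
}
```

The order of the edits matters: replaying them out of order could, for instance, delete a path before the edit that creates it.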

BlocksMap

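The FsImage solves the file namespace problem: it records directories, file paths, block IDs, and so on. It does not, however, associate those blocks with DataNodes, so a separate mapping between blocks and DataNodes is needed.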
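
One way to picture such a mapping is a table that is filled in at runtime as DataNodes report the blocks they hold. The sketch below makes that assumption explicit; `BlockLocations`, `blockReport`, and `datanodesOf` are hypothetical names, and a plain `HashMap` stands in for the real structure.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BlockLocations {
    // block ID -> names of the DataNodes that reported holding a replica
    private final Map<Long, Set<String>> locations = new HashMap<>();

    // Record that a DataNode reported holding these block IDs.
    void blockReport(String datanode, List<Long> blockIds) {
        for (long id : blockIds) {
            locations.computeIfAbsent(id, k -> new HashSet<>()).add(datanode);
        }
    }

    // DataNodes known to hold the block; empty if nothing has been reported yet.
    Set<String> datanodesOf(long blockId) {
        return locations.getOrDefault(blockId, new HashSet<>());
    }
}
```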

Data structure

private final int capacity
  the initial size, needed when the entries array is initialized
private volatile GSet<Block, BlockInfo> blocks
  implemented by LightWeightGSet
  - protected LightWeightGSet.LinkedElement[] entries;

Note: BlockInfo extends Block

How it works

LightWeightGSet

Constructor

    public LightWeightGSet(int recommended_length) {
        // Turn the recommended length into the actual array length
        int actual = actualArrayLength(recommended_length);
        if (LOG.isDebugEnabled()) {
            LOG.debug("recommended=" + recommended_length + ", actual=" + actual);
        }

        this.entries = new LightWeightGSet.LinkedElement[actual];
        // Mask used by getIndex to map a hash code to an array slot
        this.hash_mask = this.entries.length - 1;
    }
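
Using `length - 1` as a mask only works when the array length is a power of two, so the length has to be rounded accordingly. The following is a minimal sketch of such a rounding step, assuming bounds of 1 and 2^30; the `ArrayLengths` class and its constants are hypothetical and are not copied from Hadoop.

```java
class ArrayLengths {
    static final int MIN = 1;        // assumed lower bound
    static final int MAX = 1 << 30;  // assumed upper bound (largest power of two in an int)

    // Round the recommended length up to the nearest power of two within [MIN, MAX].
    static int roundUpToPowerOfTwo(int recommended) {
        if (recommended > MAX) {
            return MAX;
        }
        if (recommended < MIN) {
            return MIN;
        }
        int high = Integer.highestOneBit(recommended);
        return high == recommended ? high : high << 1;
    }
}
```

For example, a recommended length of 1000 becomes 1024, whose mask `1023` keeps exactly the low ten bits of a hash code.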

put method

    public E put(E element) {
        if (element == null) {
            throw new NullPointerException("Null element is not supported.");
        } else {
            LightWeightGSet.LinkedElement e = null;

            try {
                e = (LightWeightGSet.LinkedElement)element;
            } catch (ClassCastException var5) {
                throw new HadoopIllegalArgumentException("!(element instanceof LinkedElement), element.getClass()=" + element.getClass());
            }
            // Compute the array index
            int index = this.getIndex(element);
            // If an equal element already exists, remove it first
            E existing = this.remove(index, element);
            ++this.modification;
            ++this.size;
            // Insert the new node at the head of the linked list
            e.setNext(this.entries[index]);
            // Update the array slot
            this.entries[index] = e;
            return existing;
        }
    }

getIndex method

    protected int getIndex(K key) {
        // hashCode of Block:
        // (int)(blockId^(blockId>>>32));
        return key.hashCode() & this.hash_mask;
    }

  * The main goal is to reduce hash collisions, keep the linked lists short, and speed up lookups
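
The index computation can be tried in isolation. This small sketch reuses the hash expression quoted in the comment above and assumes a power-of-two table length; the `HashMasking` class and its method names are hypothetical.

```java
class HashMasking {
    // Fold the 64-bit block ID into 32 bits, as in the hash quoted above.
    static int blockHash(long blockId) {
        return (int) (blockId ^ (blockId >>> 32));
    }

    // For a power-of-two table length, masking keeps the low bits of the hash.
    static int index(long blockId, int tableLength) {
        return blockHash(blockId) & (tableLength - 1);
    }
}
```

For a power-of-two length the mask gives the same slot as a non-negative modulo (`Math.floorMod(hash, length)`), but with a single bitwise operation and no special handling of negative hash codes.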

get method

    // Compute the index from the Block, fetch the linked list from the array,
    // then walk the list in a for loop until the matching element is found
    public E get(K key) {
        if (key == null) {
            throw new NullPointerException("key == null");
        } else {
            int index = this.getIndex(key);

            for (LightWeightGSet.LinkedElement e = this.entries[index]; e != null; e = e.getNext()) {
                if (e.equals(key)) {
                    return this.convert(e);
                }
            }
            return null;
        }
    }
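
The put and get paths above can be condensed into a small standalone set that follows the same idea: elements carry their own `next` link, new elements go to the head of the bucket, and an element with the same key is replaced. This is a sketch under assumptions, not the Hadoop class; `MiniGSet`, `Element`, and the `long` key are hypothetical simplifications.

```java
class MiniGSet {
    // Elements hold their own "next" link, so the set allocates no wrapper nodes.
    static class Element {
        final long id;
        final String payload;
        Element next;

        Element(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final Element[] entries;
    private final int hashMask;
    private int size;

    // The length must be a power of two so that length - 1 works as a mask.
    MiniGSet(int powerOfTwoLength) {
        entries = new Element[powerOfTwoLength];
        hashMask = powerOfTwoLength - 1;
    }

    private int indexOf(long id) {
        return (int) (id ^ (id >>> 32)) & hashMask;
    }

    // Insert at the head of the bucket; return the replaced element with the same id, or null.
    Element put(Element e) {
        int index = indexOf(e.id);
        Element existing = remove(index, e.id);
        e.next = entries[index];
        entries[index] = e;
        size++;
        return existing;
    }

    // Walk the bucket's linked list until the id matches.
    Element get(long id) {
        for (Element e = entries[indexOf(id)]; e != null; e = e.next) {
            if (e.id == id) {
                return e;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    // Unlink the element with the given id from its bucket, if present.
    private Element remove(int index, long id) {
        Element prev = null;
        for (Element e = entries[index]; e != null; prev = e, e = e.next) {
            if (e.id == id) {
                if (prev == null) {
                    entries[index] = e.next;
                } else {
                    prev.next = e.next;
                }
                e.next = null;
                size--;
                return e;
            }
        }
        return null;
    }
}
```

Because `put` removes before inserting, the size stays unchanged when an existing key is replaced, which mirrors the remove-then-increment order in the method shown above.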

BlockInfo

private Object[] triplets;

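Stores the concrete mapping from a block to its DataNodes, together with references to a previous and a next BlockInfo.

Capacity
- 3 slots per replica, so 3 * (number of replicas) in total

triplets[3*i], where i is the replica index
DatanodeDescriptor: description of the DataNode (IP, ID, and so on)
triplets[3*i+1]
previous BlockInfo: the previous block in the list of blocks held by that DataNode. A file is split into multiple blocks; the NameNode resolves path -> blockid, looks each block up in the BlocksMap, and reads the replica locations from its triplets.
triplets[3*i+2]
next BlockInfo: the next block in the same per-DataNode list

Constructor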

    public BlockInfo(Block blk, int replication) {
        super(blk);
        // Three slots per replica: DataNode, previous BlockInfo, next BlockInfo
        this.triplets = new Object[3*replication];
        this.bc = null;
    }
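
The index arithmetic behind the triplets array is easier to see with accessors. The sketch below assumes the three-slots-per-replica layout described above; `TripletBlock` and its methods are hypothetical, and a `String` stands in for the DataNode descriptor.

```java
class TripletBlock {
    final long blockId;
    // Per replica i: [3*i] = DataNode, [3*i+1] = previous block, [3*i+2] = next block.
    private final Object[] triplets;

    TripletBlock(long blockId, int replication) {
        this.blockId = blockId;
        this.triplets = new Object[3 * replication];
    }

    // Number of replicas the array has room for.
    int capacity() {
        return triplets.length / 3;
    }

    void setDatanode(int i, String datanode) {
        triplets[3 * i] = datanode;
    }

    String getDatanode(int i) {
        return (String) triplets[3 * i];
    }

    void setPrevious(int i, TripletBlock block) {
        triplets[3 * i + 1] = block;
    }

    TripletBlock getPrevious(int i) {
        return (TripletBlock) triplets[3 * i + 1];
    }

    void setNext(int i, TripletBlock block) {
        triplets[3 * i + 2] = block;
    }

    TripletBlock getNext(int i) {
        return (TripletBlock) triplets[3 * i + 2];
    }
}
```

Packing everything into one `Object[]` avoids a separate list node per replica, at the cost of casts on every read.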

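Summary

The NameNode holds the namespace, that is, the list of files and directories, and every block of a file has a corresponding blockid. When a client requests a file by path, the NameNode finds the corresponding block IDs, looks them up in the BlocksMap to find the DataNodes that actually store them, and the client then reads the data from those DataNodes.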
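
The lookup chain in the summary can be sketched end to end with two maps: one from path to block IDs (the namespace side) and one from block ID to DataNodes (the BlocksMap side). All names here (`ReadLookup`, `addFile`, `addReplicas`, `locate`) are hypothetical, and plain `HashMap`s replace the real structures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReadLookup {
    private final Map<String, List<Long>> namespace = new HashMap<>();  // path -> block IDs
    private final Map<Long, List<String>> blocksMap = new HashMap<>();  // block ID -> DataNodes

    void addFile(String path, List<Long> blockIds) {
        namespace.put(path, blockIds);
    }

    void addReplicas(long blockId, List<String> datanodes) {
        blocksMap.put(blockId, datanodes);
    }

    // For each block of the file, in order, return the DataNodes that hold it.
    List<List<String>> locate(String path) {
        List<Long> blockIds = namespace.get(path);
        if (blockIds == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        List<List<String>> result = new ArrayList<>();
        for (Long id : blockIds) {
            result.add(blocksMap.getOrDefault(id, new ArrayList<>()));
        }
        return result;
    }
}
```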