Introduction to the Name Node
The name node has only one instance in the Hadoop Distributed File System, yet it is the most complex one. The name node maintains the two most important relationships in HDFS:
1. The HDFS directory tree, together with each file's data block index, i.e. the list of blocks belonging to each file.
2. The mapping between data blocks and data nodes, i.e. the information about which data nodes hold a given block.
The HDFS directory tree, file/directory metadata and block index are persisted to disk, stored in the namespace image and the edit log. The mapping between blocks and data nodes, by contrast, is built dynamically after the name node starts, from reports sent by the data nodes. On top of these relationships, the name node manages the data nodes: it receives registrations, heartbeats, block commits and other reports from them, and issues name-node commands such as block replication, deletion and recovery. At the same time, the name node supports client operations on the directory tree, reads and writes of file data, and administration of the HDFS system.
In the discussion that follows, the relationship among the HDFS directory tree, file/directory metadata and the block index is called the name node's first relationship, and the mapping between data blocks and data nodes is called the name node's second relationship.
The File System Directory Tree
1) From i-node to INode
In a Linux file system, an i-node is an index node. An index node stores a file's metadata, such as the file type and permissions, the owner identifier, and the file length in bytes; the latter part of the i-node holds the data block index, that is, the location of the file's or directory's data.
Depending on the amount of data stored, the block index may use in-inode entries, single indirect blocks, double indirect blocks, and so on. For a file, the index points to the locations where the file's data is stored in blocks; through this index the file data can be read or written. For a directory, the directory entries are kept in the blocks allocated to the directory.
This i-node design takes full account of the fact that i-nodes must be stored on a block device, which is why it introduces many fixed-length records and structures.
Linux's i-node has been so influential that the name node names its file and directory abstractions after it. The INode-related classes in the name node include INode, INodeDirectory, INodeFile and INodeFileUnderConstruction. Their relationships are as follows:
INode is an abstract class and the parent of INodeDirectory and INodeFile. INodeDirectory naturally represents a directory in HDFS, while INodeFile abstracts an HDFS file. INodeDirectory has a subclass, INodeDirectoryWithQuota, which, as its name suggests, is a directory with quotas. INodeFile's subclass INodeFileUnderConstruction is the "odd one out" in this inheritance tree: it represents a file that has been opened for writing.
1. INode
As the root of this inheritance tree, INode holds the attributes shared by files and directories: the file/directory name (name), the parent directory (parent), the last modification time (modificationTime), the last access time (accessTime), and permission, which packs together the access permissions, the owner identifier and the group identifier. Compared with the Linux i-node, HDFS's INode does not need to support hard links or features such as the i-node change time, so it carries fewer attributes than an i-node.
Among INode's attributes, permission deserves attention. Its type is long, i.e. 64 bits, and it holds three file/directory attributes. How are three attributes kept in one variable? INode divides permission's 64 bits into three segments, used for the access permissions, the owner identifier and the group identifier, and makes clever use of a Java enum to define segmented operations on the long, implementing access to all three attributes. The code is as follows:
private static enum PermissionStatusFormat {
  MODE(0, 16),
  GROUP(MODE.OFFSET + MODE.LENGTH, 25),
  USER(GROUP.OFFSET + GROUP.LENGTH, 23);

  final int OFFSET;
  final int LENGTH; //bit length
  final long MASK;

  PermissionStatusFormat(int offset, int length) {
    OFFSET = offset;
    LENGTH = length;
    MASK = ((-1L) >>> (64 - LENGTH)) << OFFSET;
  }

  long retrieve(long record) {
    return (record & MASK) >>> OFFSET;
  }

  long combine(long bits, long record) {
    return (record & ~MASK) | (bits << OFFSET);
  }
}
The enum PermissionStatusFormat has three values, MODE, GROUP and USER, which handle the access permissions, the group identifier and the owner identifier respectively. When these three enum values are created, each invokes the PermissionStatusFormat constructor, which takes two parameters: the offset and the length of the corresponding attribute within the long permission.
INode.getUserName() returns the INode's owner. Since the owner identifier is kept in bits 41–63 of permission, it uses USER.retrieve(), which ANDs the member variable permission with the mask USER.MASK and then shifts right to obtain the identifier's value; the identifier is then looked up in the SerialNumberManager instance, which keeps the mapping between identifiers and user names, to obtain the user name as a string.
INode.setUser() sets the node's owner. Its implementation also relies on PermissionStatusFormat, using USER.combine() to set the bits corresponding to the owner identifier.
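The segmented-long technique can be tried out in isolation. The following standalone sketch reproduces the retrieve()/combine() arithmetic; PermField and its lower-case member names are illustrative, not the Hadoop originals:

```java
// Illustrative re-implementation of the PermissionStatusFormat idea:
// three fields packed into one 64-bit long at fixed offsets.
enum PermField {
    MODE(0, 16),
    GROUP(MODE.offset + MODE.length, 25),
    USER(GROUP.offset + GROUP.length, 23);

    final int offset;
    final int length;
    final long mask;

    PermField(int offset, int length) {
        this.offset = offset;
        this.length = length;
        this.mask = ((-1L) >>> (64 - length)) << offset;
    }

    // extract this field from the packed record
    long retrieve(long record) {
        return (record & mask) >>> offset;
    }

    // overwrite this field in the packed record
    long combine(long bits, long record) {
        return (record & ~mask) | (bits << offset);
    }
}
```

Packing a mode, a group id and a user id and retrieving them back returns the original values, which is exactly the round trip that getUserName() and setUser() rely on.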
In HDFS, the mappings between user names and user identifiers and between group names and group identifiers are kept in a SerialNumberManager object. Thanks to SerialNumberManager, the name node does not have to keep string user and group names in each INode object, saving memory.
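The idea behind SerialNumberManager is simple string interning. A minimal sketch, with hypothetical class and method names, might look like this:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the SerialNumberManager idea: map user/group
// names to small integers so each INode stores an int, not a String.
class SerialNumbers {
    private final ConcurrentHashMap<String, Integer> toId = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Integer, String> toName = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger(0);

    // return the serial number for a name, allocating one if needed
    int getId(String name) {
        return toId.computeIfAbsent(name, n -> {
            int id = next.getAndIncrement();
            toName.put(id, n);
            return id;
        });
    }

    // resolve a serial number back to the name
    String getName(int id) {
        return toName.get(id);
    }
}
```

With this in place, an INode only needs the int returned by getId(); getUserName() becomes a getName() lookup.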
Most of INode's methods are fairly simple, providing access to or mutation of the member variables. One worth mentioning is isRoot(), which checks whether the current node is the root of the directory tree. The root is the most important directory in HDFS; every other directory derives from it. By convention, if the INode's member attribute name has length 0, the node is the HDFS root and INode.isRoot() returns true; otherwise the node is not the root.
2. INodeDirectory and INodeDirectoryWithQuota
Linux is written in C, which lacks an inheritance mechanism, so the first field of the i-node, i_mode, stores both the file type and the access permissions, and subsequent processing must branch on the file type recorded in the i-node. The HDFS implementation, by contrast, exploits Java's inheritance: directories and files are separate subclasses of INode, and polymorphism provides the operations specific to each.
INodeDirectory abstracts an HDFS directory. A directory is an important file-system concept: a "directory" is a virtual container holding a set of files and other directories. Except for the root, every file/directory in HDFS belongs to some directory container; the INode member variable parent, of type INodeDirectory, records a file's/directory's parent directory.
In the INodeDirectory implementation, the directory's role as a container shows up in the member variable children, a list of INodes. Most of INodeDirectory's methods operate on this list, e.g. creating, querying, traversing and replacing child entries, and their implementations are fairly simple, as shown below:
/**
 * Directory INode class.
 */
class INodeDirectory extends INode {
  ......
  INode removeChild(INode node) {
    assert children != null;
    int low = Collections.binarySearch(children, node.name);
    if (low >= 0) {
      return children.remove(low);
    } else {
      return null;
    }
  }

  /** Replace a child that has the same name as newChild by newChild.
   *
   * @param newChild Child node to be added
   */
  void replaceChild(INode newChild) {
    if (children == null) {
      throw new IllegalArgumentException("The directory is empty");
    }
    int low = Collections.binarySearch(children, newChild.name);
    if (low >= 0) { // an old child exists so replace by the newChild
      children.set(low, newChild);
    } else {
      throw new IllegalArgumentException("No child exists to be replaced");
    }
  }

  INode getChild(String name) {
    return getChildINode(DFSUtil.string2Bytes(name));
  }

  private INode getChildINode(byte[] name) {
    if (children == null) {
      return null;
    }
    int low = Collections.binarySearch(children, name);
    if (low >= 0) {
      return children.get(low);
    }
    return null;
  }

  private INode getNode(byte[][] components) {
    INode[] inode = new INode[1];
    getExistingPathINodes(components, inode);
    return inode[0];
  }
  ......
}
When a file is deleted, how are the blocks it owns removed? collectSubtreeBlocksAndClear() is an abstract method of INode that collects the blocks owned by all files in the subtree rooted at the INode. Before performing the delete, the name node's processing logic uses this method to gather all the blocks owned by the subtree. If the member variable children is null, i.e. the directory is empty, the method returns right away; otherwise it calls the same method on each entry the directory manages, collecting the blocks of files and of files under subdirectories. The code is as follows:
int collectSubtreeBlocksAndClear(List<Block> v) {
  int total = 1;
  if (children == null) {
    return total;
  }
  for (INode child : children) {
    total += child.collectSubtreeBlocksAndClear(v);
  }
  parent = null;
  children = null;
  return total;
}
INodeDirectory has a subclass, INodeDirectoryWithQuota, which implements HDFS's quota mechanism; HDFS lets an administrator set quotas on each directory. There are two kinds:
1. Name quota: limits the number of names under the directory. Creating a file or directory beyond this quota fails. This quota controls a user's consumption of name node resources and is kept in the member variable nsQuota.
2. Space quota: limits the total size of all files stored under the directory tree, ensuring a user cannot consume too much data node storage. This quota is kept in the variable dsQuota.
The HDFS dfsadmin tool provides commands to change directory quotas, which update the corresponding members of the INodeDirectoryWithQuota object. The method INodeDirectoryWithQuota.verifyQuota() checks whether an update to the directory tree satisfies the configured quotas; if not, the method throws an exception.
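The shape of that check can be sketched as follows. This is a simplified model under assumed field names, not the actual INodeDirectoryWithQuota code:

```java
// Simplified model of a quota check in the spirit of
// INodeDirectoryWithQuota.verifyQuota(): reject an update that would
// exceed the name quota or the space quota.
class QuotaExceededException extends Exception {
    QuotaExceededException(String msg) { super(msg); }
}

class QuotaDirectory {
    private final long nsQuota;  // name quota; negative means unlimited
    private final long dsQuota;  // space quota; negative means unlimited
    private long nsCount;        // current number of names
    private long diskspace;      // current space consumed

    QuotaDirectory(long nsQuota, long dsQuota) {
        this.nsQuota = nsQuota;
        this.dsQuota = dsQuota;
    }

    // deltas may be negative, e.g. for a delete
    void verifyQuota(long nsDelta, long dsDelta) throws QuotaExceededException {
        if (nsQuota >= 0 && nsCount + nsDelta > nsQuota)
            throw new QuotaExceededException("name quota exceeded");
        if (dsQuota >= 0 && diskspace + dsDelta > dsQuota)
            throw new QuotaExceededException("space quota exceeded");
        nsCount += nsDelta;      // update usage once the check passes
        diskspace += dsDelta;
    }
}
```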
3. INodeFile and INodeFileUnderConstruction
In the name node, files are abstracted by INodeFile, also a subclass of INode.
INodeFile adds two file-specific attributes: header and blocks. The variable header uses the same technique as INode.permission, keeping two values in one long: its high 16 bits hold the file's replication factor and its low 48 bits hold the data block size. The array blocks holds the file's data blocks; its element type is BlockInfo.
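Assuming that layout (16-bit replication at the top of the long, 48-bit block size below it), the packing can be sketched like this; HeaderDemo and its method names are illustrative:

```java
// Sketch of the INodeFile.header layout: replication factor in the
// high 16 bits, preferred block size in the low 48 bits.
class HeaderDemo {
    static final long HEADERMASK = 0xffffL << 48;

    static long pack(short replication, long blockSize) {
        return ((long) replication << 48) | (blockSize & ~HEADERMASK);
    }

    static short getReplication(long header) {
        return (short) ((header & HEADERMASK) >>> 48);
    }

    static long getPreferredBlockSize(long header) {
        return header & ~HEADERMASK;
    }
}
```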
INodeFileUnderConstruction is a subclass of INodeFile. It is the index node of a file under construction: when a client opens an HDFS file for writing or appending, the file is in the under-construction state, and the corresponding node in the HDFS directory tree is an INodeFileUnderConstruction object.
In HDFS, appending data to a file is a fairly involved affair, which shows not only in Chapter 7's discussion of the streaming write interface but also in the name-node-side implementation around INodeFileUnderConstruction. The code is as follows:
class INodeFileUnderConstruction extends INodeFile {
  String clientName;                           // lease holder
  private final String clientMachine;
  private final DatanodeDescriptor clientNode; // if client is a cluster node too.
  private int primaryNodeIndex = -1;           // the node working on lease recovery
  private DatanodeDescriptor[] targets = null; // locations for last block
  private long lastRecoveryTime = 0;
  ......
}
The fields of INodeFileUnderConstruction are:
clientName: the name of the client that opened the file for writing. This field is also used in lease management; in HDFS, a lease is a contract maintained by the name node that grants a client the right to write to a file within a certain period.
clientMachine: the host the client runs on.
clientNode: if the client runs on one of the cluster's data nodes, the corresponding data node information. DatanodeDescriptor is a class used internally by the name node to record data node information; it inherits from DatanodeInfo.
targets: the members of the data pipeline for the last block, i.e. the list of data nodes currently participating in the write.
primaryNodeIndex and lastRecoveryTime: both are used for block recovery initiated by the name node, also called lease recovery; they record the index of the primary data node and the start time of the recovery, respectively.
The Namespace Image and the Edit Log
The in-memory HDFS directory tree and file/directory metadata are held by INode and its subclasses; if the node loses power or the process crashes, that data is gone, so the information must be persisted to disk. The namespace image records the directory tree at a particular moment; in HDFS it is implemented by the class FSImage and bridges in-memory and on-disk metadata. Changes to the in-memory tree must also reach the on-disk metadata, but exporting the whole in-memory state to disk on every change is clearly impractical. The name node therefore introduces the edit log, recording each change in the log, forming the namespace-image-plus-edit-log persistence scheme: the namespace image is a faithful snapshot of the in-memory metadata at some moment, and the edit log records every metadata operation after that moment.
1. The name node's on-disk directory structure
A directory managed by the name node can hold only the namespace image (set by configuration item ${dfs.name.dir}), only the edit log (configuration item ${dfs.name.edits.dir}), or both. As on a data node, each configuration item may list several directories. If ${dfs.name.edits.dir} is not set, the edit log is also stored under ${dfs.name.dir}; most HDFS deployments do not set ${dfs.name.edits.dir} separately.
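For reference, the two items can be set in the configuration file like this (the paths are examples only):

```xml
<!-- hdfs-site.xml: example values, adjust the paths for your cluster -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/dfs/name1,/data/dfs/name2</value>
</property>
<!-- usually left unset, in which case edits go to dfs.name.dir -->
<property>
  <name>dfs.name.edits.dir</name>
  <value>/data/dfs/edits</value>
</property>
```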
A ${dfs.name.dir} directory usually contains three directories and one file. The file is "in_use.lock"; it serves the same purpose as the file of the same name on a data node, ensuring that the name node has exclusive use of the directory. After the name node is formatted, only the "current" and "image" directories exist; the "previous.checkpoint" directory is created only after the node has started and run for a while.
The name node keeps the namespace image and the edit log under the "current" directory:
This directory usually holds four files:
fsimage: the metadata image file.
edits: the log file; together with the metadata image it provides a complete HDFS directory tree with metadata.
fstime: the time of the last checkpoint. A checkpoint is normally produced by the secondary name node and is the result of merging fsimage and edits.
VERSION: as on a data node, this file records some attributes of the name node's storage.
${dfs.name.dir}/previous.checkpoint holds the name node's previous checkpoint; its layout matches that of "current". ${dfs.name.dir}/image is where releases 0.13 and earlier kept the "fsimage" file; the "fsimage" in that directory plays the same role as the data node's ${dfs.data.dir}/storage file, preventing an accidental start by a name node that cannot handle the current directory layout.
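Putting the pieces together, a ${dfs.name.dir} directory after the first checkpoint might look like this (an illustrative listing, not taken from a live cluster):

```
${dfs.name.dir}/
  current/
    fsimage              # namespace image
    edits                # edit log
    fstime               # time of the last checkpoint
    VERSION              # storage attributes
  image/
    fsimage              # guard file for pre-0.13 layouts
  previous.checkpoint/   # previous checkpoint, same layout as current/
  in_use.lock            # exclusive-use lock file
```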
2. FSImage and FSEditLog
On a data node, storage management is split between the data node storage, DataStorage, and the file system dataset, FSDataset: DataStorage implements storage-space state management, while FSDataset provides the block storage services the data node logic needs.
On the name node, by contrast, storage management is handled jointly by the FSImage and FSEditLog classes. The namespace image, FSImage, plays the leading role: it manages the storage space's lifetime, saves and loads the namespace image, and cooperates with the secondary name node to perform checkpoints. Like the data node's DataStorage, it inherits from Storage, and both use the methods Storage provides to manage the name node's directory layout.
The edit log, FSEditLog, records modifications to the metadata. Unlike FSImage, the edit log is produced continuously while the name node runs, so FSEditLog relies on output streams, using them to record directory tree changes; the matching input streams are used to read edit logs persisted on disk. FSEditLog offers a large family of log*() methods to record metadata changes: a file rename, for example, is recorded in the edit log via logRename().
3. Saving the namespace image
The namespace image stores the name node's in-memory metadata at a particular moment, including the INode information analysed above, information about files being written (leases and INodeFileUnderConstruction objects) and some other state. FSImage.saveFSImage() saves the namespace image at the current moment to the file given by the parameter newFile, as follows:
void saveFSImage(File newFile) throws IOException {
  FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
  FSDirectory fsDir = fsNamesys.dir;
  long startTime = FSNamesystem.now();
  //
  // Write out data
  //
  DataOutputStream out = new DataOutputStream(
                           new BufferedOutputStream(
                             new FileOutputStream(newFile)));
  try {
    out.writeInt(FSConstants.LAYOUT_VERSION);
    out.writeInt(namespaceID);
    out.writeLong(fsDir.rootDir.numItemsInTree());
    out.writeLong(fsNamesys.getGenerationStamp());
    byte[] byteStore = new byte[4*FSConstants.MAX_PATH_LENGTH];
    ByteBuffer strbuf = ByteBuffer.wrap(byteStore);
    // save the root
    saveINode2Image(strbuf, fsDir.rootDir, out);
    // save the rest of the nodes
    saveImage(strbuf, 0, fsDir.rootDir, out);
    fsNamesys.saveFilesUnderConstruction(out);
    fsNamesys.saveSecretManagerState(out);
    strbuf = null;
  } finally {
    out.close();
  }
  LOG.info("Image file of size " + newFile.length() + " saved in "
           + (FSNamesystem.now() - startTime)/1000 + " seconds.");
}
The member function saveFSImage() is straightforward. It first writes the image file header: the namespace image format version, the storage system identifier, the number of nodes in the directory tree, and the current generation stamp for data blocks.
It then writes the root node with the static method saveINode2Image() and the remaining nodes of the tree with saveImage(). The root is a special INode managed by the name node: its member attribute INode.name has length 0, so it must be handled separately.
FSImage.saveImage() writes the information of every INode under the directory tree current (excluding current itself) to the output stream out. The method's first two parameters carry the parent node's absolute path: the path is kept in parentPrefix, and its length is prefixLength. Since an HDFS file or directory absolute path may not exceed 8000 characters, the buffer backing parentPrefix is 4*8000 bytes, i.e. 31.25KB. If there is a directory "/foo/bar", then while the INode of "bar" is being written, its parent path "/foo" is held in parentPrefix; while the INodes under "/foo/bar" are being written, the contents of parentPrefix become "/foo/bar".
Suppose the current parameter of the ongoing call is the directory "/foo". saveImage() first loops over all of its children and, for each child such as "bar":
1. Sets the buffer position; after this the buffer holds "/foo".
2. Appends the current entry with ByteBuffer.put(); the buffer now holds "/foo/bar".
3. Calls saveINode2Image() to write the node's information.
The code of saveImage() is as follows:
/**
 * Save file tree image starting from the given root.
 * This is a recursive procedure, which first saves all children of
 * a current directory and then moves inside the sub-directories.
 */
private static void saveImage(ByteBuffer parentPrefix,
                              int prefixLength,
                              INodeDirectory current,
                              DataOutputStream out) throws IOException {
  int newPrefixLength = prefixLength;
  if (current.getChildrenRaw() == null)
    return;
  for(INode child : current.getChildren()) {
    // print all children first
    parentPrefix.position(prefixLength);
    parentPrefix.put(PATH_SEPARATOR).put(child.getLocalNameBytes());
    saveINode2Image(parentPrefix, child, out);
  }
  for(INode child : current.getChildren()) {
    if(!child.isDirectory())
      continue;
    parentPrefix.position(prefixLength);
    parentPrefix.put(PATH_SEPARATOR).put(child.getLocalNameBytes());
    newPrefixLength = parentPrefix.position();
    saveImage(parentPrefix, newPrefixLength, (INodeDirectory)child, out);
  }
  parentPrefix.position(prefixLength);
}
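The rewind-and-append trick on parentPrefix can be observed in a standalone snippet; PrefixDemo is illustrative, and only java.nio is used:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Demonstrates how saveImage() reuses one ByteBuffer for absolute
// paths: position() rewinds to the parent prefix and put() appends
// the child name, so recursion can extend the same buffer.
class PrefixDemo {
    static String contents(ByteBuffer buf) {
        return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
    }

    static ByteBuffer demo() {
        ByteBuffer buf = ByteBuffer.wrap(new byte[64]);
        buf.put("/foo".getBytes(StandardCharsets.UTF_8));
        int prefixLength = buf.position();          // parent prefix is "/foo"

        buf.position(prefixLength);                 // step 1: rewind to the parent
        buf.put((byte) '/')
           .put("bar".getBytes(StandardCharsets.UTF_8)); // step 2: append the child
        // step 3 would write the node; the buffer now holds "/foo/bar"
        return buf;
    }
}
```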
The saveINode2Image() method is as follows:
/*
* Save one inode's attributes to the image.
*/
private static void saveINode2Image(ByteBuffer name,
INode node,
DataOutputStream out) throws IOException {
int nameLen = name.position();
out.writeShort(nameLen);
out.write(name.array(), name.arrayOffset(), nameLen);
if (!node.isDirectory()) { // write file inode
INodeFile fileINode = (INodeFile)node;
out.writeShort(fileINode.getReplication());
out.writeLong(fileINode.getModificationTime());
out.writeLong(fileINode.getAccessTime());
out.writeLong(fileINode.getPreferredBlockSize());
Block[] blocks = fileINode.getBlocks();
out.writeInt(blocks.length);
for (Block blk : blocks)
blk.write(out);
FILE_PERM.fromShort(fileINode.getFsPermissionShort());
PermissionStatus.write(out, fileINode.getUserName(),
fileINode.getGroupName(),
FILE_PERM);
} else { // write directory inode
out.writeShort(0); // replication
out.writeLong(node.getModificationTime());
out.writeLong(0); // access time
out.writeLong(0); // preferred block size
out.writeInt(-1); // # of blocks
out.writeLong(node.getNsQuota());
out.writeLong(node.getDsQuota());
FILE_PERM.fromShort(node.getFsPermissionShort());
PermissionStatus.write(out, node.getUserName(),
node.getGroupName(),
FILE_PERM);
}
}
After writing the directory tree with the two methods above, FSImage.saveFSImage() also writes the files currently open for writing, i.e. the files corresponding to INodeFileUnderConstruction objects, into the namespace image.
FSNamesystem.saveFilesUnderConstruction() iterates over the leases kept by the lease manager and uses FSImage.writeINodeUnderConstruction() to write each under-construction file index node. The code is as follows:
/**
* Serializes leases.
*/
void saveFilesUnderConstruction(DataOutputStream out) throws IOException {
synchronized (leaseManager) {
out.writeInt(leaseManager.countPath()); // write the size
for (Lease lease : leaseManager.getSortedLeases()) {
for(String path : lease.getPaths()) {
// verify that path exists in namespace
INode node = dir.getFileINode(path);
if (node == null) {
throw new IOException("saveLeases found path " + path +
" but no matching entry in namespace.");
}
if (!node.isUnderConstruction()) {
throw new IOException("saveLeases found path " + path +
" but is not under construction.");
}
INodeFileUnderConstruction cons = (INodeFileUnderConstruction) node;
FSImage.writeINodeUnderConstruction(out, cons, path);
}
}
}
}
4. Saving edit log data
As a file on disk, the namespace image FSImage can hardly be kept consistent with the name node's in-memory metadata at every moment. To improve metadata reliability, HDFS records metadata modifications in the edit log; the edit log and the namespace image together determine the file system metadata at the current moment.
While HDFS runs, every event that modifies the name node's first relationship must be recorded in the log, so the log can be abstracted as an append-only data output stream. EditLogOutputStream abstracts the log output stream; one of its subclasses, EditLogFileOutputStream, writes the log to a disk file. The EditLogOutputStream code is as follows:
/**
* A generic abstract class to support journaling of edits logs into
* a persistent storage.
*/
abstract class EditLogOutputStream extends OutputStream {
// these are statistics counters
private long numSync; // number of sync(s) to disk
private long totalTimeSync; // total time to sync
EditLogOutputStream() throws IOException {
numSync = totalTimeSync = 0;
}
/**
* Get this stream name.
*
* @return name of the stream
*/
abstract String getName();
/** {@inheritDoc} */
abstract public void write(int b) throws IOException;
/**
* Write edits log record into the stream.
* The record is represented by operation name and
* an array of Writable arguments.
*
* @param op operation
* @param writables array of Writable arguments
* @throws IOException
*/
abstract void write(byte op, Writable ... writables) throws IOException;
/**
* Create and initialize new edits log storage.
*
* @throws IOException
*/
abstract void create() throws IOException;
/** {@inheritDoc} */
abstract public void close() throws IOException;
/**
* All data that has been written to the stream so far will be flushed.
* New data can be still written to the stream while flushing is performed.
*/
abstract void setReadyToFlush() throws IOException;
/**
* Flush and sync all data that is ready to be flush
* {@link #setReadyToFlush()} into underlying persistent store.
* @throws IOException
*/
abstract protected void flushAndSync() throws IOException;
/**
* Flush data to persistent store.
* Collect sync metrics.
*/
public void flush() throws IOException {
numSync++;
long start = FSNamesystem.now();
flushAndSync();
long end = FSNamesystem.now();
totalTimeSync += (end - start);
}
/**
* Return the size of the current edits log.
* Length is used to check when it is large enough to start a checkpoint.
*/
abstract long length() throws IOException;
/**
* Return total time spent in {@link #flushAndSync()}
*/
long getTotalSyncTime() {
return totalTimeSync;
}
/**
* Return number of calls to {@link #flushAndSync()}
*/
long getNumSync() {
return numSync;
}
}
The edit log file output stream, EditLogFileOutputStream, implements EditLogOutputStream.
EditLogFileOutputStream owns two working buffers:
bufCurrent: the log write buffer.
bufReady: the file write buffer.
Log records written through write() go into the buffer bufCurrent. When the contents of bufCurrent need to be written to the file, EditLogFileOutputStream swaps the two buffers: the former log write buffer becomes the file write buffer, and the former file write buffer becomes the log write buffer. The code is as follows:
/**
* An implementation of the abstract class {@link EditLogOutputStream},
* which stores edits in a local file.
*/
static private class EditLogFileOutputStream extends EditLogOutputStream {
private File file;
private FileOutputStream fp; // file stream for storing edit logs
private FileChannel fc; // channel of the file stream for sync
private DataOutputBuffer bufCurrent; // current buffer for writing
private DataOutputBuffer bufReady; // buffer ready for flushing
static ByteBuffer fill = ByteBuffer.allocateDirect(512); // preallocation
EditLogFileOutputStream(File name) throws IOException {
super();
file = name;
bufCurrent = new DataOutputBuffer(sizeFlushBuffer);
bufReady = new DataOutputBuffer(sizeFlushBuffer);
RandomAccessFile rp = new RandomAccessFile(name, "rw");
fp = new FileOutputStream(rp.getFD()); // open for append
fc = rp.getChannel();
fc.position(fc.size());
}
@Override
String getName() {
return file.getPath();
}
/** {@inheritDoc} */
@Override
public void write(int b) throws IOException {
bufCurrent.write(b);
}
/** {@inheritDoc} */
@Override
void write(byte op, Writable ... writables) throws IOException {
write(op);
for(Writable w : writables) {
w.write(bufCurrent);
}
}
/**
* Create empty edits logs file.
*/
@Override
void create() throws IOException {
fc.truncate(0);
fc.position(0);
bufCurrent.writeInt(FSConstants.LAYOUT_VERSION);
setReadyToFlush();
flush();
}
@Override
public void close() throws IOException {
// close should have been called after all pending transactions
// have been flushed & synced.
int bufSize = bufCurrent.size();
if (bufSize != 0) {
throw new IOException("FSEditStream has " + bufSize +
" bytes still to be flushed and cannot " +
"be closed.");
}
bufCurrent.close();
bufReady.close();
// remove the last INVALID marker from transaction log.
fc.truncate(fc.position());
fp.close();
bufCurrent = bufReady = null;
}
/**
* All data that has been written to the stream so far will be flushed.
* New data can be still written to the stream while flushing is performed.
*/
@Override
void setReadyToFlush() throws IOException {
assert bufReady.size() == 0 : "previous data is not flushed yet";
write(OP_INVALID); // insert end-of-file marker
DataOutputBuffer tmp = bufReady;
bufReady = bufCurrent;
bufCurrent = tmp;
}
/**
* Flush ready buffer to persistent store.
* currentBuffer is not flushed as it accumulates new log records
* while readyBuffer will be flushed and synced.
*/
@Override
protected void flushAndSync() throws IOException {
preallocate(); // preallocate file if necessary
bufReady.writeTo(fp); // write data to file
bufReady.reset(); // erase all data in the buffer
fc.force(false); // metadata updates not needed because of preallocation
fc.position(fc.position()-1); // skip back the end-of-file marker
}
/**
* Return the size of the current edit log including buffered data.
*/
@Override
long length() throws IOException {
// file size + size of both buffers
return fc.size() + bufReady.size() + bufCurrent.size();
}
// allocate a big chunk of data
private void preallocate() throws IOException {
long position = fc.position();
if (position + 4096 >= fc.size()) {
FSNamesystem.LOG.debug("Preallocating Edit log, current size " +
fc.size());
long newsize = position + 1024*1024; // 1MB
fill.position(0);
int written = fc.write(fill, newsize);
FSNamesystem.LOG.debug("Edit log size is now " + fc.size() +
" written " + written + " bytes " +
" at offset " + newsize);
}
}
/**
* Returns the file associated with this stream
*/
File getFile() {
return file;
}
}
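Stripped of the file I/O and sync bookkeeping, the two-buffer scheme reduces to the following model (a simplified sketch; a StringBuilder stands in for the edits file):

```java
import java.io.ByteArrayOutputStream;

// Minimal model of the bufCurrent/bufReady swap in
// EditLogFileOutputStream: writers append to 'current'; a flush first
// swaps the buffers, then drains 'ready' while new records can keep
// arriving in the (now empty) 'current'.
class DoubleBuffer {
    private ByteArrayOutputStream current = new ByteArrayOutputStream();
    private ByteArrayOutputStream ready = new ByteArrayOutputStream();
    private final StringBuilder persisted = new StringBuilder(); // stands in for the file

    void write(String record) {
        byte[] b = record.getBytes();
        current.write(b, 0, b.length);   // log records accumulate here
    }

    void setReadyToFlush() {             // swap the two buffers
        ByteArrayOutputStream tmp = ready;
        ready = current;
        current = tmp;
    }

    void flushAndSync() {                // drain 'ready' to the "file"
        persisted.append(new String(ready.toByteArray()));
        ready.reset();
    }

    String persisted() {
        return persisted.toString();
    }
}
```

The key property is that writers are never blocked by the disk: a record arriving mid-flush simply lands in the buffer that will be flushed next time.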
5. Reading the namespace image and the edit log
FSImage.loadFSImage() reads the namespace image and adds/updates the metadata it contains into the in-memory metadata; FSEditLog.loadFSEdits() loads the edit log, replaying and applying the recorded operations to obtain the metadata state at a given moment. The loadFSImage() method is as follows:
/**
* Choose latest image from one of the directories,
* load it and merge with the edits from that directory.
*
* @return whether the image should be saved
* @throws IOException
*/
boolean loadFSImage() throws IOException {
// Now check all curFiles and see which is the newest
long latestNameCheckpointTime = Long.MIN_VALUE;
long latestEditsCheckpointTime = Long.MIN_VALUE;
StorageDirectory latestNameSD = null;
StorageDirectory latestEditsSD = null;
boolean needToSave = false;
isUpgradeFinalized = true;
Collection<String> imageDirs = new ArrayList<String>();
Collection<String> editsDirs = new ArrayList<String>();
for (Iterator<StorageDirectory> it = dirIterator(); it.hasNext();) {
StorageDirectory sd = it.next();
if (!sd.getVersionFile().exists()) {
needToSave |= true;
continue; // some of them might have just been formatted
}
boolean imageExists = false, editsExists = false;
if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE)) {
imageExists = getImageFile(sd, NameNodeFile.IMAGE).exists();
imageDirs.add(sd.getRoot().getCanonicalPath());
}
if (sd.getStorageDirType().isOfType(NameNodeDirType.EDITS)) {
editsExists = getImageFile(sd, NameNodeFile.EDITS).exists();
editsDirs.add(sd.getRoot().getCanonicalPath());
}
checkpointTime = readCheckpointTime(sd);
if ((checkpointTime != Long.MIN_VALUE) &&
((checkpointTime != latestNameCheckpointTime) ||
(checkpointTime != latestEditsCheckpointTime))) {
// Force saving of new image if checkpoint time
// is not same in all of the storage directories.
needToSave |= true;
}
if (sd.getStorageDirType().isOfType(NameNodeDirType.IMAGE) &&
(latestNameCheckpointTime < checkpointTime) && imageExists) {
latestNameCheckpointTime = checkpointTime;
latestNameSD = sd;
}
if (sd.getStorageDirType().isOfType(NameNodeDirType.EDITS) &&
(latestEditsCheckpointTime < checkpointTime) && editsExists) {
latestEditsCheckpointTime = checkpointTime;
latestEditsSD = sd;
}
if (checkpointTime <= 0L)
needToSave |= true;
// set finalized flag
isUpgradeFinalized = isUpgradeFinalized && !sd.getPreviousDir().exists();
}
// We should have at least one image and one edits dirs
if (latestNameSD == null)
throw new IOException("Image file is not found in " + imageDirs);
if (latestEditsSD == null)
throw new IOException("Edits file is not found in " + editsDirs);
// Make sure we are loading image and edits from same checkpoint
if (latestNameCheckpointTime > latestEditsCheckpointTime
&& latestNameSD != latestEditsSD
&& latestNameSD.getStorageDirType() == NameNodeDirType.IMAGE
&& latestEditsSD.getStorageDirType() == NameNodeDirType.EDITS) {
// This is a rare failure when NN has image-only and edits-only
// storage directories, and fails right after saving images,
// in some of the storage directories, but before purging edits.
// See -NOTE- in saveNamespace().
LOG.error("This is a rare failure scenario!!!");
LOG.error("Image checkpoint time " + latestNameCheckpointTime +
" > edits checkpoint time " + latestEditsCheckpointTime);
LOG.error("Name-node will treat the image as the latest state of " +
"the namespace. Old edits will be discarded.");
} else if (latestNameCheckpointTime != latestEditsCheckpointTime)
throw new IOException("Inconsistent storage detected, " +
"image and edits checkpoint times do not match. " +
"image checkpoint time = " + latestNameCheckpointTime +
"edits checkpoint time = " + latestEditsCheckpointTime);
// Recover from previous interrrupted checkpoint if any
needToSave |= recoverInterruptedCheckpoint(latestNameSD, latestEditsSD);
long startTime = FSNamesystem.now();
long imageSize = getImageFile(latestNameSD, NameNodeFile.IMAGE).length();
//
// Load in bits
//
latestNameSD.read();
needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
LOG.info("Image file of size " + imageSize + " loaded in "
+ (FSNamesystem.now() - startTime)/1000 + " seconds.");
// Load latest edits
if (latestNameCheckpointTime > latestEditsCheckpointTime)
// the image is already current, discard edits
needToSave |= true;
else // latestNameCheckpointTime == latestEditsCheckpointTime
needToSave |= (loadFSEdits(latestEditsSD) > 0);
return needToSave;
}
In the loadFSImage() method above, the code inside the for statement reads the file/directory INode information and adds each node to the directory tree through FSNamesystem.addToParent(). Based on the incoming parameters, that method constructs the appropriate INodeDirectoryWithQuota or INodeFile object and inserts it at the right place in the tree.
The lease information and security-related information kept at the tail of the namespace image are read by loadFilesUnderConstruction() and loadSecretManagerState() respectively. When the lease information is read in, the directory tree must be updated, replacing the original INodeFile objects with INodeFileUnderConstruction objects; the records in the lease manager must be updated as well, as shown below:
private void loadFilesUnderConstruction(int version, DataInputStream in,
FSNamesystem fs) throws IOException {
FSDirectory fsDir = fs.dir;
if (version > -13) // pre lease image version
return;
int size = in.readInt();
LOG.info("Number of files under construction = " + size);
for (int i = 0; i < size; i++) {
INodeFileUnderConstruction cons = readINodeUnderConstruction(in);
// verify that file exists in namespace
String path = cons.getLocalName();
INode old = fsDir.getFileINode(path);
if (old == null) {
throw new IOException("Found lease for non-existent file " + path);
}
if (old.isDirectory()) {
throw new IOException("Found lease for directory " + path);
}
INodeFile oldnode = (INodeFile) old;
fsDir.replaceNode(path, oldnode, cons);
fs.leaseManager.addLease(cons.clientName, path);
}
}
After the namespace image has been read by loadFSImage(), the in-memory first-relationship information reflects only the moment the image was saved; the subsequent metadata modifications, i.e. the contents of the edit log, must also be loaded before the metadata is fully restored. FSEditLog.loadFSEdits() loads and applies the log; it takes an EditLogInputStream instance, here an EditLogFileInputStream object. The loadFSEdits() method is lengthy; most of its implementation dispatches on the opcode and operands recorded in the log, calling the corresponding FSDirectory methods to modify the in-memory metadata. The code is as follows:
/**
* Load an edit log, and apply the changes to the in-memory structure
* This is where we apply edits that we've been writing to disk all
* along.
*/
static int loadFSEdits(EditLogInputStream edits) throws IOException {
FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
FSDirectory fsDir = fsNamesys.dir;
int numEdits = 0;
int logVersion = 0;
String clientName = null;
String clientMachine = null;
String path = null;
int numOpAdd = 0, numOpClose = 0, numOpDelete = 0,
numOpRename = 0, numOpSetRepl = 0, numOpMkDir = 0,
numOpSetPerm = 0, numOpSetOwner = 0, numOpSetGenStamp = 0,
numOpTimes = 0, numOpGetDelegationToken = 0,
numOpRenewDelegationToken = 0, numOpCancelDelegationToken = 0,
numOpUpdateMasterKey = 0, numOpOther = 0;
long startTime = FSNamesystem.now();
DataInputStream in = new DataInputStream(new BufferedInputStream(edits));
try {
// Read log file version. Could be missing.
in.mark(4);
// If edits log is greater than 2G, available method will return negative
// numbers, so we avoid having to call available
boolean available = true;
try {
logVersion = in.readByte();
} catch (EOFException e) {
available = false;
}
if (available) {
in.reset();
logVersion = in.readInt();
if (logVersion < FSConstants.LAYOUT_VERSION) // future version
throw new IOException(
"Unexpected version of the file system log file: "
+ logVersion + ". Current version = "
+ FSConstants.LAYOUT_VERSION + ".");
}
assert logVersion <= Storage.LAST_UPGRADABLE_LAYOUT_VERSION :
"Unsupported version " + logVersion;
while (true) {
long timestamp = 0;
long mtime = 0;
long atime = 0;
long blockSize = 0;
byte opcode = -1;
try {
opcode = in.readByte();
if (opcode == OP_INVALID) {
FSNamesystem.LOG.info("Invalid opcode, reached end of edit log " +
"Number of transactions found " + numEdits);
break; // no more transactions
}
} catch (EOFException e) {
break; // no more transactions
}
numEdits++;
switch (opcode) {
case OP_ADD:
case OP_CLOSE: {
// versions > 0 support per file replication
// get name and replication
int length = in.readInt();
if (-7 == logVersion && length != 3||
-17 < logVersion && logVersion < -7 && length != 4 ||
logVersion <= -17 && length != 5) {
throw new IOException("Incorrect data format." +
" logVersion is " + logVersion +
" but writables.length is " +
length + ". ");
}
path = FSImage.readString(in);
short replication = adjustReplication(readShort(in));
mtime = readLong(in);
if (logVersion <= -17) {
atime = readLong(in);
}
if (logVersion < -7) {
blockSize = readLong(in);
}
// get blocks
Block blocks[] = null;
if (logVersion <= -14) {
blocks = readBlocks(in);
} else {
BlockTwo oldblk = new BlockTwo();
int num = in.readInt();
blocks = new Block[num];
for (int i = 0; i < num; i++) {
oldblk.readFields(in);
blocks[i] = new Block(oldblk.blkid, oldblk.len,
Block.GRANDFATHER_GENERATION_STAMP);
}
}
// Older versions of HDFS does not store the block size in inode.
// If the file has more than one block, use the size of the
// first block as the blocksize. Otherwise use the default
// block size.
if (-8 <= logVersion && blockSize == 0) {
if (blocks.length > 1) {
blockSize = blocks[0].getNumBytes();
} else {
long first = ((blocks.length == 1)? blocks[0].getNumBytes(): 0);
blockSize = Math.max(fsNamesys.getDefaultBlockSize(), first);
}
}
PermissionStatus permissions = fsNamesys.getUpgradePermission();
if (logVersion <= -11) {
permissions = PermissionStatus.read(in);
}
// clientname, clientMachine and block locations of last block.
if (opcode == OP_ADD && logVersion <= -12) {
clientName = FSImage.readString(in);
clientMachine = FSImage.readString(in);
if (-13 <= logVersion) {
readDatanodeDescriptorArray(in);
}
} else {
clientName = "";
clientMachine = "";
}
// The open lease transaction re-creates a file if necessary.
// Delete the file if it already exists.
if (FSNamesystem.LOG.isDebugEnabled()) {
FSNamesystem.LOG.debug(opcode + ": " + path +
" numblocks : " + blocks.length +
" clientHolder " + clientName +
" clientMachine " + clientMachine);
}
fsDir.unprotectedDelete(path, mtime);
// add to the file tree
INodeFile node = (INodeFile)fsDir.unprotectedAddFile(
path, permissions,
blocks, replication,
mtime, atime, blockSize);
if (opcode == OP_ADD) {
numOpAdd++;
//
// Replace current node with a INodeUnderConstruction.
// Recreate in-memory lease record.
//
INodeFileUnderConstruction cons = new INodeFileUnderConstruction(
node.getLocalNameBytes(),
node.getReplication(),
node.getModificationTime(),
node.getPreferredBlockSize(),
node.getBlocks(),
node.getPermissionStatus(),
clientName,
clientMachine,
null);
fsDir.replaceNode(path, node, cons);
fsNamesys.leaseManager.addLease(cons.clientName, path);
}
break;
}
case OP_SET_REPLICATION: {
numOpSetRepl++;
path = FSImage.readString(in);
short replication = adjustReplication(readShort(in));
fsDir.unprotectedSetReplication(path, replication, null);
break;
}
case OP_RENAME: {
numOpRename++;
int length = in.readInt();
if (length != 3) {
throw new IOException("Incorrect data format. "
+ "Mkdir operation.");
}
String s = FSImage.readString(in);
String d = FSImage.readString(in);
timestamp = readLong(in);
HdfsFileStatus dinfo = fsDir.getFileInfo(d);
fsDir.unprotectedRenameTo(s, d, timestamp);
fsNamesys.changeLease(s, d, dinfo);
break;
}
case OP_DELETE: {
numOpDelete++;
int length = in.readInt();
if (length != 2) {
throw new IOException("Incorrect data format. "
+ "delete operation.");
}
path = FSImage.readString(in);
timestamp = readLong(in);
fsDir.unprotectedDelete(path, timestamp);
break;
}
case OP_MKDIR: {
numOpMkDir++;
PermissionStatus permissions = fsNamesys.getUpgradePermission();
int length = in.readInt();
if (-17 < logVersion && length != 2 ||
logVersion <= -17 && length != 3) {
throw new IOException("Incorrect data format. "
+ "Mkdir operation.");
}
path = FSImage.readString(in);
timestamp = readLong(in);
// The disk format stores atimes for directories as well.
// However, currently this is not being updated/used because of
// performance reasons.
if (logVersion <= -17) {
atime = readLong(in);
}
if (logVersion <= -11) {
permissions = PermissionStatus.read(in);
}
fsDir.unprotectedMkdir(path, permissions, timestamp);
break;
}
case OP_SET_GENSTAMP: {
numOpSetGenStamp++;
long lw = in.readLong();
fsDir.namesystem.setGenerationStamp(lw);
break;
}
case OP_DATANODE_ADD: {
numOpOther++;
FSImage.DatanodeImage nodeimage = new FSImage.DatanodeImage();
nodeimage.readFields(in);
//Datnodes are not persistent any more.
break;
}
case OP_DATANODE_REMOVE: {
numOpOther++;
DatanodeID nodeID = new DatanodeID();
nodeID.readFields(in);
//Datanodes are not persistent any more.
break;
}
      case OP_SET_PERMISSIONS: {
        numOpSetPerm++;
        if (logVersion > -11)
          throw new IOException("Unexpected opcode " + opcode
                                + " for version " + logVersion);
        fsDir.unprotectedSetPermission(
            FSImage.readString(in), FsPermission.read(in));
        break;
      }
      case OP_SET_OWNER: {
        numOpSetOwner++;
        if (logVersion > -11)
          throw new IOException("Unexpected opcode " + opcode
                                + " for version " + logVersion);
        fsDir.unprotectedSetOwner(FSImage.readString(in),
            FSImage.readString_EmptyAsNull(in),
            FSImage.readString_EmptyAsNull(in));
        break;
      }
      case OP_SET_NS_QUOTA: {
        if (logVersion > -16) {
          throw new IOException("Unexpected opcode " + opcode
                                + " for version " + logVersion);
        }
        fsDir.unprotectedSetQuota(FSImage.readString(in),
                                  readLongWritable(in),
                                  FSConstants.QUOTA_DONT_SET);
        break;
      }
      case OP_CLEAR_NS_QUOTA: {
        if (logVersion > -16) {
          throw new IOException("Unexpected opcode " + opcode
                                + " for version " + logVersion);
        }
        fsDir.unprotectedSetQuota(FSImage.readString(in),
                                  FSConstants.QUOTA_RESET,
                                  FSConstants.QUOTA_DONT_SET);
        break;
      }
      case OP_SET_QUOTA:
        fsDir.unprotectedSetQuota(FSImage.readString(in),
                                  readLongWritable(in),
                                  readLongWritable(in));
        break;
      case OP_TIMES: {
        numOpTimes++;
        int length = in.readInt();
        if (length != 3) {
          throw new IOException("Incorrect data format. "
                                + "times operation.");
        }
        path = FSImage.readString(in);
        mtime = readLong(in);
        atime = readLong(in);
        fsDir.unprotectedSetTimes(path, mtime, atime, true);
        break;
      }
      case OP_GET_DELEGATION_TOKEN: {
        if (logVersion > -19) {
          throw new IOException("Unexpected opcode " + opcode
                                + " for version " + logVersion);
        }
        numOpGetDelegationToken++;
        DelegationTokenIdentifier delegationTokenId =
            new DelegationTokenIdentifier();
        delegationTokenId.readFields(in);
        long expiryTime = readLong(in);
        fsNamesys.getDelegationTokenSecretManager()
            .addPersistedDelegationToken(delegationTokenId, expiryTime);
        break;
      }
      case OP_RENEW_DELEGATION_TOKEN: {
        if (logVersion > -19) {
          throw new IOException("Unexpected opcode " + opcode
                                + " for version " + logVersion);
        }
        numOpRenewDelegationToken++;
        DelegationTokenIdentifier delegationTokenId =
            new DelegationTokenIdentifier();
        delegationTokenId.readFields(in);
        long expiryTime = readLong(in);
        fsNamesys.getDelegationTokenSecretManager()
            .updatePersistedTokenRenewal(delegationTokenId, expiryTime);
        break;
      }
      case OP_CANCEL_DELEGATION_TOKEN: {
        if (logVersion > -19) {
          throw new IOException("Unexpected opcode " + opcode
                                + " for version " + logVersion);
        }
        numOpCancelDelegationToken++;
        DelegationTokenIdentifier delegationTokenId =
            new DelegationTokenIdentifier();
        delegationTokenId.readFields(in);
        fsNamesys.getDelegationTokenSecretManager()
            .updatePersistedTokenCancellation(delegationTokenId);
        break;
      }
      case OP_UPDATE_MASTER_KEY: {
        if (logVersion > -19) {
          throw new IOException("Unexpected opcode " + opcode
                                + " for version " + logVersion);
        }
        numOpUpdateMasterKey++;
        DelegationKey delegationKey = new DelegationKey();
        delegationKey.readFields(in);
        fsNamesys.getDelegationTokenSecretManager().updatePersistedMasterKey(
            delegationKey);
        break;
      }
      default: {
        throw new IOException("Never seen opcode " + opcode);
      }
      }
    }
  } catch (IOException ex) {
    // Failed to load 0.20.203 version edits during upgrade. This version has
    // conflicting opcodes with the later releases. The editlog must be
    // emptied by restarting the namenode, before proceeding with the upgrade.
    if (Storage.is203LayoutVersion(logVersion) &&
        logVersion != FSConstants.LAYOUT_VERSION) {
      String msg = "During upgrade, failed to load the editlog version " +
          logVersion + " from release 0.20.203. Please go back to the old " +
          " release and restart the namenode. This empties the editlog " +
          " and saves the namespace. Resume the upgrade after this step.";
      throw new IOException(msg, ex);
    } else {
      throw ex;
    }
  } finally {
    in.close();
  }
  FSImage.LOG.info("Edits file " + edits.getName()
      + " of size " + edits.length() + " edits # " + numEdits
      + " loaded in " + (FSNamesystem.now()-startTime)/1000 + " seconds.");
  if (FSImage.LOG.isDebugEnabled()) {
    FSImage.LOG.debug("numOpAdd = " + numOpAdd + " numOpClose = " + numOpClose
        + " numOpDelete = " + numOpDelete + " numOpRename = " + numOpRename
        + " numOpSetRepl = " + numOpSetRepl + " numOpMkDir = " + numOpMkDir
        + " numOpSetPerm = " + numOpSetPerm
        + " numOpSetOwner = " + numOpSetOwner
        + " numOpSetGenStamp = " + numOpSetGenStamp
        + " numOpTimes = " + numOpTimes
        + " numOpGetDelegationToken = " + numOpGetDelegationToken
        + " numOpRenewDelegationToken = " + numOpRenewDelegationToken
        + " numOpCancelDelegationToken = " + numOpCancelDelegationToken
        + " numOpUpdateMasterKey = " + numOpUpdateMasterKey
        + " numOpOther = " + numOpOther);
  }
  if (logVersion != FSConstants.LAYOUT_VERSION) // other version
    numEdits++; // save this image asap
  return numEdits;
}
The Secondary NameNode
Writing out a namespace image via FSImage.saveFSImage() is a very resource-intensive operation. If the NameNode saved its metadata periodically with this method, it would have to stay in read-only mode during each save, seriously disrupting the applications running on HDFS. If, on the other hand, only the edit log were used to record metadata changes, the log file would grow without bound as the system runs. This does no harm while the NameNode is up, but a restart would then spend a very long time in FSEditLog.loadFSEdits(), hurting the NameNode's availability. The solution is to run a Secondary NameNode, which periodically fetches the NameNode's namespace image and edit log, merges them into a new namespace image (also called a metadata checkpoint), uploads it to replace the NameNode's old image, and empties the edit log.
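The idea behind the merge can be modeled in a few lines: the current namespace state is the last image plus a replay of the edit log, and a checkpoint simply folds the edits back into the image so the log can be emptied. A minimal sketch with hypothetical names (a path-to-mtime map standing in for the namespace; this is not the real FSImage/FSEditLog API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a checkpoint: the "image" is a path -> mtime map,
// the "edit log" is a list of (opcode, path, value) records.
class CheckpointModel {
    final Map<String, Long> fsimage = new HashMap<>(); // last saved image
    final List<String[]> edits = new ArrayList<>();    // pending log records

    void logMkdir(String path, long mtime) {
        edits.add(new String[]{"MKDIR", path, Long.toString(mtime)});
    }

    void logDelete(String path) {
        edits.add(new String[]{"DELETE", path, ""});
    }

    // Replay the edit log over the image -- the toy analogue of what
    // FSEditLog.loadFSEdits() does at NameNode startup.
    Map<String, Long> currentState() {
        Map<String, Long> state = new HashMap<>(fsimage);
        for (String[] op : edits) {
            if (op[0].equals("MKDIR"))  state.put(op[1], Long.parseLong(op[2]));
            if (op[0].equals("DELETE")) state.remove(op[1]);
        }
        return state;
    }

    // Checkpoint: fold the edits into a new image and empty the log,
    // so a restart no longer needs to replay them.
    void checkpoint() {
        Map<String, Long> merged = currentState(); // merge before clearing
        fsimage.clear();
        fsimage.putAll(merged);
        edits.clear();
    }
}
```

After checkpoint(), currentState() is unchanged but the log is empty: the shorter replay at the next restart is exactly what the Secondary NameNode buys for the NameNode.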
The checkpoint process is as follows (implemented in SecondaryNameNode.doCheckpoint()):
/**
 * Create a new checkpoint
 */
void doCheckpoint() throws IOException {
  // Do the required initialization of the merge work area.
  startCheckpoint();

  // Tell the namenode to start logging transactions in a new edit file
  // Returns a token that would be used to upload the merged image.
  CheckpointSignature sig = (CheckpointSignature)namenode.rollEditLog();

  // error simulation code for junit test
  if (ErrorSimulator.getErrorSimulation(0)) {
    throw new IOException("Simulating error0 " +
                          "after creating edits.new");
  }

  downloadCheckpointFiles(sig);   // Fetch fsimage and edits
  doMerge(sig);                   // Do the merge

  //
  // Upload the new image into the NameNode. Then tell the Namenode
  // to make this new uploaded image as the most current image.
  //
  putFSImage(sig);

  // error simulation code for junit test
  if (ErrorSimulator.getErrorSimulation(1)) {
    throw new IOException("Simulating error1 " +
                          "after uploading new image to NameNode");
  }

  namenode.rollFsImage();
  checkpointImage.endCheckpoint();

  LOG.warn("Checkpoint done. New Image Size: "
           + checkpointImage.getFsImageName().length());
}
Expanding on this process: the Secondary NameNode keeps the namespace data from both before and after the merge on disk, and, like the other nodes, it maintains a fixed directory structure there, so doCheckpoint() must first make sure the work area is in a proper state. Next, the Secondary NameNode calls the remote method NamenodeProtocol.rollEditLog() to get the NameNode ready: the NameNode closes its output stream to the edit log file "edits", and all subsequent log records are written to the "edits.new" file.
The Secondary NameNode then downloads the NameNode's namespace image "fsimage" and edit log "edits" over HTTP into its local work area and merges them; the merged in-memory metadata is saved to the Secondary NameNode's disk as a new image file. It then calls putFSImage() to upload the new namespace image to the NameNode over HTTP. Once the upload succeeds, the Secondary NameNode again uses the NamenodeProtocol remote interface, calling rollFsImage() to notify the NameNode. The NameNode then adopts the new namespace image as its current image and renames "edits.new" to "edits", so that the new namespace image and the edit log "edits" together once again define the true state of the in-memory metadata.
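The file-level effect of the two remote calls can be sketched with plain java.io.File operations. This is a simplified model in a scratch directory, not the NameNode's actual storage code; the file names fsimage, fsimage.ckpt, edits and edits.new are the assumptions here. rollEditLog() redirects logging to "edits.new", and rollFsImage() installs the uploaded image and renames "edits.new" back to "edits":

```java
import java.io.File;
import java.io.IOException;

// Toy model of the NameNode side of a checkpoint: the
// fsimage / fsimage.ckpt / edits / edits.new rollover.
class RollModel {
    final File dir;

    RollModel(File dir) { this.dir = dir; }

    // rollEditLog(): freeze "edits", send new records to "edits.new".
    void rollEditLog() throws IOException {
        new File(dir, "edits.new").createNewFile();
    }

    // rollFsImage(): adopt the uploaded image as current and shrink the log.
    void rollFsImage() throws IOException {
        File ckpt  = new File(dir, "fsimage.ckpt"); // image uploaded via putFSImage()
        File image = new File(dir, "fsimage");
        image.delete();
        if (!ckpt.renameTo(image))
            throw new IOException("rename fsimage.ckpt failed");
        File edits = new File(dir, "edits");
        edits.delete();
        if (!new File(dir, "edits.new").renameTo(edits))
            throw new IOException("rename edits.new failed");
    }
}
```

After rollFsImage() the directory again contains only "fsimage" and a short "edits", i.e. the pair that defines the in-memory metadata.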
Copyright notice: parts of this article are excerpted from the book 《Hadoop技术内幕:深入解析Hadoop Common和HDFS架构设计与实现原理》 by 蔡斌 and 陈湘萍. These are study notes intended for technical exchange only; the commercial copyright remains with the original authors, and readers are encouraged to buy the book for further study. Please keep the attribution to the original authors when reposting. Thank you!