DataNode Internals (Part 1): Data Storage and Its Companions

Blog: 天然呆的技术博客 (天然呆's tech blog)

Column: hadoop技术内幕 (Hadoop Internals)

Article link: https://blog.csdn.net/u013494310/article/details/19236989

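Hello everyone. Today we begin studying the DataNode which, compared with the distributed client, is a considerably more complex "VIP member" of a Hadoop cluster.
DataNode: the data node, the component of a distributed cluster that actually deals with the data. Its job is to store data, serve reads, and maintain the data it manages.
Node: simply put, a node in the network topology. Within a cluster, we call one physical computer or one virtual machine a node. A single machine (physical or virtual) can belong to several clusters. For cross-cluster deployment, where several clusters are deployed on the same machine, I will explain shortly how to tell which cluster a node actually belongs to, and how Hadoop handles the "resource contention" that such deployments can cause.
First, let's get an overview of the members involved in the DataNode's work.
By function, they fall roughly into three categories:
A: related to DataNode storage
B: related to how the file system dataset works
C: related to DataNode worker threads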

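Design background:
Following good software design practice, the data node's business logic never operates directly on the node's on-disk file structure. When it needs to work with data on disk, it simply calls the services of the objects that manage those file structures. File-structure management on the data node has two parts: data storage (DataStorage) and the file system dataset (FSDataset).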
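The layering described above can be illustrated with a small sketch. It is not Hadoop code: `BlockStore`, `DirBlockStore` and `BlockCopier` are hypothetical names, and the `blk_<id>` file name simply borrows the DataStorage prefix covered below. The only point is that the "business logic" (`BlockCopier`) depends on an interface and never builds file paths itself.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical storage service: callers never see the on-disk layout. */
interface BlockStore {
    void writeBlock(long blockId, byte[] data);

    byte[] readBlock(long blockId);
}

/** Hypothetical implementation: the only class that knows where block files live. */
final class DirBlockStore implements BlockStore {
    private final File root;

    DirBlockStore(File root) {
        this.root = root;
        root.mkdirs(); // create the storage root if it does not exist yet
    }

    // Layout detail hidden from callers: one file per block, named blk_<id>
    private File blockFile(long blockId) {
        return new File(root, "blk_" + blockId);
    }

    @Override
    public void writeBlock(long blockId, byte[] data) {
        try {
            Files.write(blockFile(blockId).toPath(), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public byte[] readBlock(long blockId) {
        try {
            return Files.readAllBytes(blockFile(blockId).toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical "business logic": talks to the interface, never to files. */
final class BlockCopier {
    static void copy(BlockStore from, BlockStore to, long blockId) {
        to.writeBlock(blockId, from.readBlock(blockId));
    }
}
```

Changing the layout (for example, spreading blocks over `subdir` directories) would only touch `DirBlockStore`; callers stay unchanged.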

A: Member overview
a: DataStorage: its parent is the abstract class Storage, and its grandparent is StorageInfo

-----------------------------------------------------------------------------------------------------------
StorageInfo
Located in the org.apache.hadoop.hdfs.server.common package

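public class StorageInfo {
  /** Version number of the storage layout **/
  public int layoutVersion;

  /** Unique identifier of the storage system (the namespace ID) **/
  public int namespaceID;

  /** Creation time of this storage information **/
  public long cTime;

  // Constructors
  public StorageInfo () {
    this(0, 0, 0L);
  }

  public StorageInfo(int layoutV, int nsID, long cT) {
    layoutVersion = layoutV;
    namespaceID = nsID;
    cTime = cT;
  }
}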
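StorageInfo's three fields are what ends up in a storage directory's VERSION file. The sketch below uses a hypothetical `StorageInfoSketch` class, not Hadoop's own code, to round-trip just those three fields through a `key=value` file with `java.util.Properties`. The key names are assumed to match the field names, and the real VERSION file contains additional keys.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical stand-in for StorageInfo that persists its fields like a VERSION file. */
final class StorageInfoSketch {
    final int layoutVersion; // layout version of the storage
    final int namespaceID;   // identifier of the namespace (storage system)
    final long cTime;        // creation time of this storage information

    StorageInfoSketch(int layoutVersion, int namespaceID, long cTime) {
        this.layoutVersion = layoutVersion;
        this.namespaceID = namespaceID;
        this.cTime = cTime;
    }

    /** Writes the three fields as key=value lines. */
    void write(Path versionFile) {
        Properties props = new Properties();
        props.setProperty("layoutVersion", Integer.toString(layoutVersion));
        props.setProperty("namespaceID", Integer.toString(namespaceID));
        props.setProperty("cTime", Long.toString(cTime));
        try (Writer out = Files.newBufferedWriter(versionFile, StandardCharsets.UTF_8)) {
            props.store(out, null); // null: no comment line, only the timestamp line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the three fields back from a file produced by write(). */
    static StorageInfoSketch read(Path versionFile) {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(versionFile, StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StorageInfoSketch(
                Integer.parseInt(props.getProperty("layoutVersion")),
                Integer.parseInt(props.getProperty("namespaceID")),
                Long.parseLong(props.getProperty("cTime")));
    }
}
```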
---------------------------------------------------------------------------------------------------------
Storage
Located in the org.apache.hadoop.hdfs.server.common package
Key members:

/** Name of the lock file that marks the storage directory as exclusively in use **/
private static final String STORAGE_FILE_LOCK = "in_use.lock";
protected static final String STORAGE_FILE_VERSION = "VERSION";
public static final String STORAGE_DIR_CURRENT = "current";
private static final String STORAGE_DIR_PREVIOUS = "previous";
private static final String STORAGE_TMP_REMOVED = "removed.tmp";
private static final String STORAGE_TMP_PREVIOUS = "previous.tmp";
private static final String STORAGE_TMP_FINALIZED = "finalized.tmp";
private static final String STORAGE_TMP_LAST_CKPT = "lastcheckpoint.tmp";
private static final String STORAGE_PREVIOUS_CKPT = "previous.checkpoint";

/** Whether this storage belongs to a NameNode or a DataNode **/
private NodeType storageType; // Type of the node using this storage

protected List<StorageDirectory> storageDirs = new ArrayList<StorageDirectory>();
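The constants above fix a layout under every storage root. As a sketch, the hypothetical `StorageLayoutSketch` below reuses only the constant values shown and resolves the corresponding paths. Two points are assumptions: that VERSION lives inside `current/`, and that "an upgrade was interrupted" can be approximated by "previous.tmp still exists" (based on the doUpgrade() comment later in this article that previous.tmp only exists mid-upgrade). Hadoop's real recovery logic distinguishes more cases.

```java
import java.io.File;

/** Hypothetical helper that resolves the fixed paths under one storage root. */
final class StorageLayoutSketch {
    // Same values as the Storage constants above
    static final String STORAGE_FILE_LOCK = "in_use.lock";
    static final String STORAGE_FILE_VERSION = "VERSION";
    static final String STORAGE_DIR_CURRENT = "current";
    static final String STORAGE_DIR_PREVIOUS = "previous";
    static final String STORAGE_TMP_PREVIOUS = "previous.tmp";

    private final File root;

    StorageLayoutSketch(File root) {
        this.root = root;
    }

    File lockFile()    { return new File(root, STORAGE_FILE_LOCK); }            // <root>/in_use.lock
    File currentDir()  { return new File(root, STORAGE_DIR_CURRENT); }          // <root>/current
    File versionFile() { return new File(currentDir(), STORAGE_FILE_VERSION); } // <root>/current/VERSION (assumed)
    File previousDir() { return new File(root, STORAGE_DIR_PREVIOUS); }         // <root>/previous
    File previousTmp() { return new File(root, STORAGE_TMP_PREVIOUS); }         // <root>/previous.tmp

    /** Simplified check: previous.tmp should only exist while an upgrade is running. */
    boolean upgradeInterrupted() {
        return previousTmp().exists();
    }
}
```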
-------------------------------------------------
Inner class StorageDirectory

/**
 * Provides common operations on a storage directory
 */
public class StorageDirectory {
  /** Root of the storage directory **/
  File root;
  /** File lock; an exclusive lock is used here **/
  FileLock lock;
  /** Type of this storage directory **/
  StorageDirType dirType;

----------------
Two constructors:

public StorageDirectory(File dir) {
  // default dirType is null
  this(dir, null);
}

public StorageDirectory(File dir, StorageDirType dirType) {
  this.root = dir;
  this.lock = null;
  this.dirType = dirType;
}
-------------------
public void lock() throws IOException {
  this.lock = tryLock();
  if (lock == null) {
    String msg = "Cannot lock storage " + this.root
      + ". The directory is already locked.";
    LOG.info(msg);
    throw new IOException(msg);
  }
}

----------------------

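/**
 * Locks the storage directory, preventing resource contention when several clusters are deployed on one machine
 * @return the lock, or null if the directory is already locked
 * @throws IOException
 */
FileLock tryLock() throws IOException {
  // Create a lock file in the root of the storage directory. Note: "root" here is not the Linux
  // filesystem root but the root of the storage directory, as specified in the configuration file
  File lockF = new File(root, STORAGE_FILE_LOCK);
  /** Delete "in_use.lock" when the data node holding it shuts down **/
  lockF.deleteOnExit();
  /** RandomAccessFile supports reading and writing at any position in the file **/
  RandomAccessFile file = new RandomAccessFile(lockF, "rws");
  FileLock res = null;
  try {
    // Lock the file for exclusive access. Exclusivity comes from an OS-level lock on the file's
    // channel rather than from whether "in_use.lock" exists, so a leftover lock file is not
    // mistaken for a live lock. tryLock() returns null if another process holds the lock.
    res = file.getChannel().tryLock();
  } catch(OverlappingFileLockException oe) {
    // This JVM already holds the lock
    file.close();
    return null;
  } catch(IOException e) {
    LOG.info(StringUtils.stringifyException(e));
    file.close();
    throw e;
  }
  if (res == null) {
    // Another process holds the lock: release our handle on the file
    file.close();
  }
  return res;
}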

-----------------

/**
 * Unlock storage.
 * @throws IOException
 */
public void unlock() throws IOException {
  if (this.lock == null)
    return;
  this.lock.release();
  lock.channel().close();
  lock = null;
}
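The lock(), tryLock() and unlock() methods above can be reproduced outside Hadoop with plain `java.nio`. In the sketch below, `DirLockSketch` is a hypothetical class that follows the same steps on any directory. Two behaviours of `FileChannel.tryLock()` matter: it returns `null` when another process holds an overlapping lock, and it throws `OverlappingFileLockException` when the lock is already held inside the same JVM. The sketch closes the file on every failure path so no handle is leaked.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical re-creation of the tryLock()/unlock() pattern with plain java.nio. */
final class DirLockSketch {
    /** Locks <root>/in_use.lock; returns null if someone already holds the lock. */
    static FileLock tryLock(File root) {
        File lockF = new File(root, "in_use.lock");
        lockF.deleteOnExit(); // same clean-up as the original
        RandomAccessFile file;
        try {
            file = new RandomAccessFile(lockF, "rws");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            FileLock res = file.getChannel().tryLock();
            if (res == null) {
                file.close(); // held by another process: don't leak the handle
            }
            return res;
        } catch (OverlappingFileLockException e) {
            closeQuietly(file); // already held by this JVM
            return null;
        } catch (IOException e) {
            closeQuietly(file);
            throw new UncheckedIOException(e);
        }
    }

    /** Releases the lock and closes its channel, like unlock(). */
    static void unlock(FileLock lock) {
        if (lock == null) {
            return;
        }
        try {
            lock.release();
            lock.channel().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void closeQuietly(RandomAccessFile file) {
        try {
            file.close();
        } catch (IOException ignored) {
            // nothing useful to do here
        }
    }
}
```

In the check below, the second `tryLock()` comes from the same JVM, so it takes the `OverlappingFileLockException` path; a second data node process pointed at the same root would get `null` instead. Either way the first holder wins, and the operating system drops the lock when the holder releases it or its process exits.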

------------------------------------------------------------------------------------------------------

DataStorage
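Located in the org.apache.hadoop.hdfs.server.datanode package

Key members:

final static String BLOCK_SUBDIR_PREFIX = "subdir";
final static String BLOCK_FILE_PREFIX = "blk_";
final static String COPY_FILE_PREFIX = "dncp_";
/** Storage ID of this data node, set by the constructors below **/
private String storageID;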
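The prefixes above determine how block files and block subdirectories are named. The helpers below are hypothetical, not DataStorage API. They assume a block data file is named exactly `blk_<blockId>` (block IDs are `long` values and may be negative, so `blk_-42` is valid) and that names with extra suffixes, such as a checksum file assumed to look like `blk_<id>_<genstamp>.meta` in Hadoop 0.20-era layouts, are not data files.

```java
/** Hypothetical helpers built on the DataStorage name prefixes; not part of Hadoop's API. */
final class BlockNamesSketch {
    static final String BLOCK_SUBDIR_PREFIX = "subdir";
    static final String BLOCK_FILE_PREFIX = "blk_";

    /** Data file name for a block, e.g. 42 -> "blk_42". */
    static String blockFileName(long blockId) {
        return BLOCK_FILE_PREFIX + blockId;
    }

    /** Name of the i-th block subdirectory, e.g. 3 -> "subdir3". */
    static String subdirName(int i) {
        return BLOCK_SUBDIR_PREFIX + i;
    }

    /** True only for "blk_<long>", so a name like "blk_42_1001.meta" is rejected. */
    static boolean isBlockDataFile(String name) {
        if (!name.startsWith(BLOCK_FILE_PREFIX)) {
            return false;
        }
        try {
            Long.parseLong(name.substring(BLOCK_FILE_PREFIX.length()));
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```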
--------------
Three constructors:

DataStorage() {
  super(NodeType.DATA_NODE);
  storageID = "";
}

DataStorage(int nsID, long cT, String strgID) {
  super(NodeType.DATA_NODE, nsID, cT);
  this.storageID = strgID;
}

public DataStorage(StorageInfo storageInfo, String strgID) {
  super(NodeType.DATA_NODE, storageInfo);
  this.storageID = strgID;
}
----------------------------------
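/**
 * When a data node starts for the first time, it calls DataStorage.format() to create the storage directory structure.
 * If the data node manages several directories, this method is called once per directory, creating the node's file structure in each one.
 * @param sd the storage directory to format
 * @param nsInfo NamespaceInfo returned by the name node; it carries the storage system identifier namespaceID, among other information, and this identifier ends up in the VERSION file
 * @throws IOException
 */
void format(StorageDirectory sd, NamespaceInfo nsInfo) throws IOException {
  // First empty the directory
  sd.clearDirectory();
  // Then assign the VERSION file properties and persist them to disk
  this.layoutVersion = FSConstants.LAYOUT_VERSION;
  this.namespaceID = nsInfo.getNamespaceID();
  this.cTime = 0;
  // store storageID as it currently is
  sd.write();
}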
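format() boils down to three steps: empty the directory, fill in the VERSION fields, and write them to disk. Here is a sketch of those steps with `java.nio.file`. `FormatSketch` is hypothetical, the VERSION key names are assumed to match the field names, and "wipe and recreate `current/`" is an assumption about what `clearDirectory()` amounts to.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Properties;
import java.util.stream.Stream;

/** Hypothetical walk-through of format(): clear, fill in VERSION fields, persist. */
final class FormatSketch {
    static void format(Path root, int layoutVersion, int namespaceID, String storageID) {
        Path current = root.resolve("current");
        try {
            // Step 1 (assumed clearDirectory() behaviour): wipe current/ and recreate it empty
            if (Files.exists(current)) {
                try (Stream<Path> paths = Files.walk(current)) {
                    paths.sorted(Comparator.reverseOrder()) // children before parents
                         .forEach(FormatSketch::delete);
                }
            }
            Files.createDirectories(current);

            // Step 2: VERSION fields; a freshly formatted node gets cTime = 0
            Properties props = new Properties();
            props.setProperty("layoutVersion", Integer.toString(layoutVersion));
            props.setProperty("namespaceID", Integer.toString(namespaceID)); // from the name node
            props.setProperty("cTime", "0");
            props.setProperty("storageID", storageID);

            // Step 3: persist to current/VERSION
            try (Writer out = Files.newBufferedWriter(current.resolve("VERSION"), StandardCharsets.UTF_8)) {
                props.store(out, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads current/VERSION back, e.g. to check the result. */
    static Properties readVersion(Path root) {
        Properties props = new Properties();
        Path version = root.resolve("current").resolve("VERSION");
        try (Reader in = Files.newBufferedReader(version, StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    private static void delete(Path p) {
        try {
            Files.delete(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```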
--------------------------------------------------

/**
 * Upgrade.
 * Move current storage into a backup directory,
 * and hardlink all its blocks into the new current directory.
 * @param sd storage directory
 * @throws IOException
 */
void doUpgrade(StorageDirectory sd,
               NamespaceInfo nsInfo
               ) throws IOException {
  LOG.info("Upgrading storage directory " + sd.getRoot()
           + ".\n old LV = " + this.getLayoutVersion()
           + "; old CTime = " + this.getCTime()
           + ".\n new LV = " + nsInfo.getLayoutVersion()
           + "; new CTime = " + nsInfo.getCTime());
  // Directory of the current version
  File curDir = sd.getCurrentDir();
  // Directory of the previous version
  File prevDir = sd.getPreviousDir();
  // The current directory must exist; otherwise there is nothing to upgrade
  assert curDir.exists() : "Current directory must exist.";
  /*
   * If prevDir exists, delete it. Deleting it effectively commits the previous upgrade,
   * and it enforces HDFS's rule of keeping at most one previous version of the data.
   */
  if (prevDir.exists())
    deleteDir(prevDir);
  // Temporary directory for the previous version
  File tmpDir = sd.getPreviousTmp();
  // Ensure tmpDir does not exist: it may only exist while an upgrade is in progress
  assert !tmpDir.exists() : "previous.tmp directory must not exist.";
  // Rename the directory: "current" -> "previous.tmp"
  rename(curDir, tmpDir);
  /*
   * To support rollback, the pre-upgrade data must be kept. On a data node that means the blocks
   * and their checksum files. doUpgrade() keeps them by creating hard links.
   *
   * Hard link: a file system mechanism that lets one file have several names. Deleting one of the
   * names does not delete the file's data; the file system frees the data only after all names are deleted.
   */
  linkBlocks(tmpDir, curDir, this.getLayoutVersion());
  // Write the VERSION file for the new version
  this.layoutVersion = FSConstants.LAYOUT_VERSION;
  assert this.namespaceID == nsInfo.getNamespaceID() :
    "Data-node and name-node namespace IDs must be the same.";
  this.cTime = nsInfo.getCTime();
  sd.write();
  // Finally, rename "previous.tmp" to "previous" to complete the upgrade
  rename(tmpDir, prevDir);
  /*
   * In the end the data node's storage contains two directories, "previous" and "current".
   * They hold the same blocks and block checksum files, but each has its own VERSION file.
   * The "previous.tmp" directory needed during the upgrade is gone.
   */
  LOG.info("Upgrade of " + sd.getRoot()+ " is complete.");
}
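The directory sequence of doUpgrade() (delete previous, rename current to previous.tmp, hard-link the blocks into a fresh current, write a new VERSION, rename previous.tmp to previous) can be tried on a scratch directory. In the hypothetical `UpgradeSketch` below, `Files.createLink` stands in for Hadoop's own hard-link helper, and only names starting with `blk_` (files) or `subdir` (directories) are linked, which is an assumption about what linkBlocks() selects. VERSION is deliberately not linked, so `previous` keeps the old one and `current` gets a new one, as the closing comment of doUpgrade() describes. The new VERSION is just a string here for simplicity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical re-creation of doUpgrade()'s directory steps on a scratch directory. */
final class UpgradeSketch {
    static void upgrade(Path root, String newVersionText) {
        Path cur = root.resolve("current");
        Path prev = root.resolve("previous");
        Path tmp = root.resolve("previous.tmp");
        try {
            if (!Files.isDirectory(cur)) {
                throw new IllegalStateException("current directory must exist");
            }
            // Deleting the old previous/ commits the last upgrade (at most one old version kept)
            if (Files.exists(prev)) {
                deleteRecursively(prev);
            }
            if (Files.exists(tmp)) {
                throw new IllegalStateException("previous.tmp must not exist");
            }
            Files.move(cur, tmp);  // current -> previous.tmp
            linkBlocks(tmp, cur);  // new current/ built from hard links
            // New VERSION goes only into current/; previous.tmp keeps the old one
            Files.write(cur.resolve("VERSION"), newVersionText.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, prev); // previous.tmp -> previous: upgrade complete
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Mirrors from/ under to/: hard-links blk_* files and recurses into subdir* directories. */
    private static void linkBlocks(Path from, Path to) throws IOException {
        Files.createDirectories(to);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(from)) {
            for (Path src : entries) {
                String name = src.getFileName().toString();
                Path dst = to.resolve(name);
                if (Files.isDirectory(src) && name.startsWith("subdir")) {
                    linkBlocks(src, dst);
                } else if (Files.isRegularFile(src) && name.startsWith("blk_")) {
                    Files.createLink(dst, src); // a second name for the same data, no copy
                }
            }
        }
    }

    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> all;
        try (Stream<Path> paths = Files.walk(dir)) {
            all = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) {
            Files.delete(p); // children come before their parents
        }
    }
}
```

Afterwards `Files.isSameFile(current/blk_1, previous/blk_1)` is true, and deleting `previous/blk_1` leaves `current/blk_1` readable, which is exactly the hard-link property that rollback support relies on.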
