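Preface
HDFS files follow a write-once-read-many model, and HDFS does not allow several clients to write the same file in parallel. So what mechanism does HDFS use to make operations on a file mutually exclusive? The answer is the lease: a credential that the NameNode grants to a lease holder (LeaseHolder), giving it permission to write a file for a limited period of time. HDFS relies on the lease mechanism to keep written data consistent. The rest of this article explains how HDFS leases are implemented.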
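Introduction to HDFS leases
* Each client holds one lease.
* Each lease records its holder and the list of files that the holder is currently writing (described here as file IDs; in the code shown below they are stored as path strings).
* Each lease records a last-update time, which determines whether the lease has expired. Once a lease has expired, its holder can no longer write data to the file unless the lease is renewed.
The internal structure of Lease looks like this: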
private final String holder; // the lease holder
private long lastUpdate; // time of the last renewal
private final Collection<String> paths = new TreeSet<String>(); // files opened by the lease holder
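Three important methods of Lease:
/** Only LeaseManager object can renew a lease */
// Sets lastUpdate, the client's last renewal time, to the current time.
private void renew() {
  this.lastUpdate = monotonicNow();
}

/** @return true if the Hard Limit Timer has expired */
// Checks whether the lease has exceeded the hard limit. The hard limit is the
// time after which the lease is forcibly reclaimed, to cover files that were
// not closed properly; the default is 60 min. An inner class of LeaseManager
// periodically checks lease renewals, and exceeding the hard limit triggers
// lease recovery.
public boolean expiredHardLimit() {
  return monotonicNow() - lastUpdate > hardLimit;
}

/** @return true if the Soft Limit Timer has expired */
// Checks whether the lease has exceeded the soft limit. The soft limit is the
// timeout prescribed for writing a file; the default is 60 s.
public boolean expiredSoftLimit() {
  return monotonicNow() - lastUpdate > softLimit;
}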
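To make the two timers concrete, here is a minimal sketch, not the Hadoop class itself. The class name `LeaseLimitsSketch` is hypothetical, the current time is passed in explicitly instead of calling `monotonicNow()`, and the limits are hard-coded to the 60 s and 60 min defaults mentioned above.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for Lease; the caller supplies "now" so the timing is easy to follow. */
class LeaseLimitsSketch {
    // Defaults quoted in the text: 60 s soft limit, 60 min hard limit.
    static final long SOFT_LIMIT_MS = TimeUnit.SECONDS.toMillis(60);
    static final long HARD_LIMIT_MS = TimeUnit.MINUTES.toMillis(60);

    private final String holder;
    private long lastUpdate;

    LeaseLimitsSketch(String holder, long nowMs) {
        this.holder = holder;
        this.lastUpdate = nowMs;
    }

    String getHolder() { return holder; }

    /** Renewing simply moves lastUpdate forward to the supplied time. */
    void renew(long nowMs) { lastUpdate = nowMs; }

    boolean expiredSoftLimit(long nowMs) { return nowMs - lastUpdate > SOFT_LIMIT_MS; }

    boolean expiredHardLimit(long nowMs) { return nowMs - lastUpdate > HARD_LIMIT_MS; }

    public static void main(String[] args) {
        LeaseLimitsSketch lease = new LeaseLimitsSketch("client-1", 0L);
        long twoMinutes = TimeUnit.MINUTES.toMillis(2);
        System.out.println(lease.expiredSoftLimit(twoMinutes)); // true: 2 min > 60 s
        System.out.println(lease.expiredHardLimit(twoMinutes)); // false: 2 min < 60 min
        lease.renew(twoMinutes);
        System.out.println(lease.expiredSoftLimit(twoMinutes + 1000)); // false: renewed 1 s ago
    }
}
```

A lease that has not been renewed for two minutes is past the soft limit but still far from the hard limit, and a single renewal resets both timers.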
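How HDFS lease management is implemented
LeaseManager (the lease management class):
LeaseManager is the NameNode class that maintains all leases. It stores every lease in HDFS, provides methods to add, remove, update, and look up leases, and runs a Monitor thread that periodically checks whether leases have timed out. For files whose lease has not been renewed for a long time (longer than the hard limit), LeaseManager triggers lease recovery and then closes the file.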
First, the fields that LeaseManager maintains:
// Maps each lease holder to its lease
private final SortedMap<String, Lease> leases = new TreeMap<String, Lease>();
// Set of: Lease
// Holds every lease in LeaseManager ordered by last-update time; leases with
// the same update time are ordered lexicographically by holder
private final NavigableSet<Lease> sortedLeases = new TreeSet<Lease>();
//
// Map path names to leases. It is protected by the sortedLeases lock.
// The map stores pathnames in lexicographical order.
// Maps each file path to its lease, ordered lexicographically by path
private final SortedMap<String, Lease> sortedLeasesByPath = new TreeMap<String, Lease>();
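LeaseManager keeps several mappings so that leases can be looked up along more than one dimension. As things stand, the Lease object can be found directly either by lease holder or by the path of a file being written.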
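The cost of keeping several indexes is that every add and remove has to update all of them together. The following is a rough sketch of that bookkeeping under simplifying assumptions: `LeaseIndexSketch`, its nested `Lease`, and all method names are hypothetical, and an explicit comparator stands in for the ordering that the real Lease class defines itself.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical three-index lease table: by holder, by age, and by path. */
class LeaseIndexSketch {
    static final class Lease {
        final String holder;
        final long lastUpdate; // never changed in this sketch, so the age ordering stays valid
        final Collection<String> paths = new TreeSet<>();
        Lease(String holder, long lastUpdate) {
            this.holder = holder;
            this.lastUpdate = lastUpdate;
        }
    }

    private final SortedMap<String, Lease> byHolder = new TreeMap<>();
    // Oldest lease first; ties broken by holder name.
    private final NavigableSet<Lease> byAge = new TreeSet<>(
        Comparator.comparingLong((Lease l) -> l.lastUpdate).thenComparing(l -> l.holder));
    private final SortedMap<String, Lease> byPath = new TreeMap<>();

    /** Registers path under the holder's lease, creating the lease on first use. */
    synchronized Lease addLease(String holder, String path, long nowMs) {
        Lease lease = byHolder.get(holder);
        if (lease == null) {
            lease = new Lease(holder, nowMs);
            byHolder.put(holder, lease);
            byAge.add(lease);
        }
        lease.paths.add(path);
        byPath.put(path, lease);
        return lease;
    }

    /** Drops one path; the lease itself disappears once it covers no files. */
    synchronized void removeLease(String holder, String path) {
        Lease lease = byHolder.get(holder);
        if (lease == null || !lease.paths.remove(path)) {
            return;
        }
        byPath.remove(path);
        if (lease.paths.isEmpty()) {
            byHolder.remove(holder);
            byAge.remove(lease);
        }
    }

    synchronized Lease getLease(String holder) { return byHolder.get(holder); }

    synchronized Lease getLeaseByPath(String path) { return byPath.get(path); }

    synchronized int leaseCount() { return byAge.size(); }
}
```

Both lookups return the same Lease object, so a renewal or removal made through one index is visible through the other.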
A few key methods:
// Add a lease
synchronized Lease addLease(String holder, String src) ....
// Remove a lease
synchronized void removeLease(Lease lease, String src) ....
// Remove the lease for the given holder and file
synchronized void removeLease(String holder, String src) ....
// Renew a lease
synchronized void renewLease(String holder) ....
Here is how lease renewal is implemented:
/**
 * Renew the lease(s) held by the given client
 */
synchronized void renewLease(String holder) {
  renewLease(getLease(holder));
}

synchronized void renewLease(Lease lease) {
  if (lease != null) {
    sortedLeases.remove(lease);
    lease.renew();
    sortedLeases.add(lease);
  }
}
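When a client opens a file for a write or append operation, LeaseManager records that client's lease on the file. The client starts a LeaseRenewer that renews the lease periodically so that it does not expire. On the NameNode side, renewal requests are handled by FSNamesystem.renewLease(), which calls LeaseManager.renewLease: it first removes the lease from sortedLeases, then updates the lease's time, and finally adds the lease back to sortedLeases.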
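The remove, update, re-add order matters because a TreeSet positions each element by its sort key at insertion time, so changing the key while the element is still in the set would leave it in the wrong place. The sketch below illustrates the pattern with hypothetical names (`RenewOrderSketch`, `OLDEST_FIRST`) and an explicit comparator that is assumed to mirror the ordering described for sortedLeases.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Hypothetical demo of renewing a lease that lives in a set sorted by lastUpdate. */
class RenewOrderSketch {
    static final class Lease {
        final String holder;
        long lastUpdate;
        Lease(String holder, long lastUpdate) {
            this.holder = holder;
            this.lastUpdate = lastUpdate;
        }
    }

    // Oldest lease first; ties broken by holder name.
    static final Comparator<Lease> OLDEST_FIRST =
        Comparator.comparingLong((Lease l) -> l.lastUpdate).thenComparing(l -> l.holder);

    static void renew(TreeSet<Lease> sorted, Lease lease, long nowMs) {
        if (lease == null) {
            return;
        }
        sorted.remove(lease);    // found via the OLD key, so this must happen first
        lease.lastUpdate = nowMs;
        sorted.add(lease);       // re-inserted at the position of the NEW key
    }

    public static void main(String[] args) {
        TreeSet<Lease> sorted = new TreeSet<>(OLDEST_FIRST);
        Lease a = new Lease("client-a", 10L);
        Lease b = new Lease("client-b", 20L);
        sorted.add(a);
        sorted.add(b);
        System.out.println(sorted.first().holder); // client-a
        renew(sorted, a, 30L);
        System.out.println(sorted.first().holder); // client-b
    }
}
```

After the renewal, client-a is the most recently updated lease, so client-b becomes the first (oldest) entry that an expiry check would look at.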
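Next, the implementation of LeaseManager's Monitor:
Lease checking: the Monitor thread
It periodically checks whether any lease has expired. When a lease has expired (passed the hard limit), it performs lease recovery and closes the file.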
/******************************************************
 * Monitor checks for leases that have expired,
 * and disposes of them.
 ******************************************************/
class Monitor implements Runnable {
  final String name = getClass().getSimpleName();

  /** Check leases periodically. */
  @Override
  public void run() {
    for(; shouldRunMonitor && fsnamesystem.isRunning(); ) {
      boolean needSync = false;
      try {
        fsnamesystem.writeLockInterruptibly();
        try {
          if (!fsnamesystem.isInSafeMode()) {
            needSync = checkLeases();
          }
        } finally {
          fsnamesystem.writeUnlock();
          // lease reassignments should be sync'ed.
          if (needSync) {
            fsnamesystem.getEditLog().logSync();
          }
        }

        Thread.sleep(HdfsServerConstants.NAMENODE_LEASE_RECHECK_INTERVAL);
      } catch(InterruptedException ie) {
        if (LOG.isDebugEnabled()) {
          LOG.debug(name + " is interrupted", ie);
        }
      }
    }
  }
}
/** Check the leases beginning from the oldest.
 *  @return true if sync is needed.
 */
@VisibleForTesting
synchronized boolean checkLeases() {
  boolean needSync = false;
  assert fsnamesystem.hasWriteLock();
  Lease leaseToCheck = null;
  try {
    leaseToCheck = sortedLeases.first();
  } catch(NoSuchElementException e) {}

  while(leaseToCheck != null) {
    if (!leaseToCheck.expiredHardLimit()) {
      break;
    }

    LOG.info(leaseToCheck + " has expired hard limit");

    final List<String> removing = new ArrayList<String>();
    // need to create a copy of the oldest lease paths, because
    // internalReleaseLease() removes paths corresponding to empty files,
    // i.e. it needs to modify the collection being iterated over
    // causing ConcurrentModificationException
    String[] leasePaths = new String[leaseToCheck.getPaths().size()];
    leaseToCheck.getPaths().toArray(leasePaths);
    for(String p : leasePaths) {
      try {
        INodesInPath iip = fsnamesystem.getFSDirectory().getINodesInPath(p,
            true);
        boolean completed = fsnamesystem.internalReleaseLease(leaseToCheck, p,
            iip, HdfsServerConstants.NAMENODE_LEASE_HOLDER);
        if (LOG.isDebugEnabled()) {
          if (completed) {
            LOG.debug("Lease recovery for " + p + " is complete. File closed.");
          } else {
            LOG.debug("Started block recovery " + p + " lease " + leaseToCheck);
          }
        }
        // If a lease recovery happened, we need to sync later.
        if (!needSync && !completed) {
          needSync = true;
        }
      } catch (IOException e) {
        LOG.error("Cannot release the path " + p + " in the lease "
            + leaseToCheck, e);
        removing.add(p);
      }
    }

    for(String p : removing) {
      removeLease(leaseToCheck, p);
    }
    leaseToCheck = sortedLeases.higher(leaseToCheck);
  }

  try {
    if(leaseToCheck != sortedLeases.first()) {
      LOG.warn("Unable to release hard-limit expired lease: "
          + sortedLeases.first());
    }
  } catch(NoSuchElementException e) {}
  return needSync;
}
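As the code above shows, the Monitor thread periodically calls checkLeases, which walks the leases in sortedLeases starting from the first, that is the oldest, lease. For each lease that has passed the hard limit it calls fsnamesystem.internalReleaseLease to perform recovery, and it stops as soon as it reaches a lease that has not passed the hard limit.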
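Because the set is ordered by last-update time, the scan can stop at the first lease that is still within the hard limit: every later lease is at least as recent. The sketch below isolates just that early-exit scan. It is an illustration with hypothetical names (`ExpiryScanSketch`, `findHardExpired`), it only collects the expired holders, and it leaves out the recovery work that internalReleaseLease performs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical oldest-first expiry scan over a set sorted by lastUpdate. */
class ExpiryScanSketch {
    static final class Lease {
        final String holder;
        final long lastUpdate;
        Lease(String holder, long lastUpdate) {
            this.holder = holder;
            this.lastUpdate = lastUpdate;
        }
    }

    // Oldest lease first; ties broken by holder name.
    static final Comparator<Lease> OLDEST_FIRST =
        Comparator.comparingLong((Lease l) -> l.lastUpdate).thenComparing(l -> l.holder);

    /** Returns the holders whose leases are past hardLimitMs, oldest first. */
    static List<String> findHardExpired(TreeSet<Lease> sorted, long nowMs, long hardLimitMs) {
        List<String> expired = new ArrayList<>();
        for (Lease lease : sorted) { // iterates in ascending lastUpdate order
            if (nowMs - lease.lastUpdate <= hardLimitMs) {
                break; // all remaining leases are newer, so none of them can be expired
            }
            expired.add(lease.holder);
        }
        return expired;
    }

    public static void main(String[] args) {
        TreeSet<Lease> sorted = new TreeSet<>(OLDEST_FIRST);
        sorted.add(new Lease("client-c", 5000L));
        sorted.add(new Lease("client-a", 0L));
        sorted.add(new Lease("client-b", 100L));
        // With now = 6000 and a 1000 ms limit, only the two oldest leases are expired.
        System.out.println(findHardExpired(sorted, 6000L, 1000L)); // [client-a, client-b]
    }
}
```

The scan touches only the expired leases plus one more, which keeps the periodic check cheap even when the NameNode holds many live leases.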
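LeaseRenewer, the lease renewer
The job of a LeaseRenewer object is to renew, on a timer, the leases held by DFSClient users. Each user has one LeaseRenewer, and each LeaseRenewer keeps an internal list of DFSClient instances. The LeaseRenewer's main loop periodically performs the renew operation for the leases of those DFSClient instances. Once all the files a DFSClient was working on have been closed, that DFSClient is removed from the LeaseRenewer's client list. From then on its lease is no longer renewed, and LeaseManager eventually removes the lease as expired.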
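The renewer's behavior can be sketched roughly as follows. This is a simplified illustration, not the real LeaseRenewer: `RenewerSketch`, `Client`, and `CountingClient` are hypothetical names, the periodic thread is reduced to a single `renewOnce()` pass that a timer would call repeatedly, and clients with no open files are pruned during the pass rather than removing themselves when their last file closes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical renewer that keeps a client list and renews on each pass. */
class RenewerSketch {
    /** Stand-in for DFSClient, reduced to what the renewer needs. */
    interface Client {
        boolean hasOpenFiles();
        void renewLease();
    }

    /** Trivial client used for demonstration: counts how often it was renewed. */
    static final class CountingClient implements Client {
        int openFiles;
        int renewals;
        CountingClient(int openFiles) { this.openFiles = openFiles; }
        @Override public boolean hasOpenFiles() { return openFiles > 0; }
        @Override public void renewLease() { renewals++; }
    }

    private final List<Client> clients = new ArrayList<>();

    synchronized void addClient(Client client) {
        if (!clients.contains(client)) {
            clients.add(client);
        }
    }

    /**
     * One renewal pass: forget clients that no longer write anything,
     * then renew for the rest. Returns the number of clients renewed.
     */
    synchronized int renewOnce() {
        clients.removeIf(c -> !c.hasOpenFiles());
        for (Client c : clients) {
            c.renewLease();
        }
        return clients.size();
    }

    public static void main(String[] args) {
        RenewerSketch renewer = new RenewerSketch();
        CountingClient client = new CountingClient(1);
        renewer.addClient(client);
        System.out.println(renewer.renewOnce()); // 1: the client still has a file open
        client.openFiles = 0;                    // the client closes its last file
        System.out.println(renewer.renewOnce()); // 0: the client was dropped, no renewal sent
    }
}
```

Once a client drops out of the list, nothing refreshes its lease on the NameNode any more, which is what lets the lease age out.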