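Leases

Understanding
  • Lease-duration trade-off: with short leases the server keeps less client state, but the frequent renewals are costly.
  • Essence: a lease is an agreement that grants its holder specific rights for a limited period. Its defining property is the time limit.
    • If the agreement is that the server confirms the client is still alive, the lease acts as a heartbeat;
    • If the agreement is that the server guarantees the content will not be modified, the lease acts as a read lock;
    • If the agreement is that the server guarantees only this client may modify the content, the lease acts as a write lock.
  • Put simply, a lease is a lock with a time limit. A client must obtain a lease before writing a file, and only the client holding that lease may add blocks to the file.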
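
The "lock with a time limit" idea can be sketched in a few lines. This is only an illustration and not HDFS code: the class TimedLease, its methods, and the explicit nowMs parameter (used instead of a real clock so the behavior is easy to follow) are hypothetical names chosen for this example.

```java
import java.util.Objects;

/** Hypothetical illustration only: a write lock that lapses unless renewed. */
class TimedLease {
    private final long durationMs;   // how long a grant or renewal stays valid
    private String holder;           // current holder, or null if never granted
    private long expiresAtMs;        // time at which the current grant lapses

    TimedLease(long durationMs) {
        this.durationMs = durationMs;
    }

    /** Grants the lease if it is free, expired, or already held by this client. */
    synchronized boolean tryAcquire(String client, long nowMs) {
        Objects.requireNonNull(client);
        boolean free = holder == null || nowMs >= expiresAtMs;
        if (free || holder.equals(client)) {
            holder = client;
            expiresAtMs = nowMs + durationMs;
            return true;
        }
        return false;
    }

    /** Extends the lease; fails if the caller is not the holder or the lease has lapsed. */
    synchronized boolean renew(String client, long nowMs) {
        if (client.equals(holder) && nowMs < expiresAtMs) {
            expiresAtMs = nowMs + durationMs;
            return true;
        }
        return false;
    }
}
```

A second client is refused while the holder keeps renewing, and can take over only after the holder stops renewing and the period runs out.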
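
Lease-related classes
Server side
  • LeaseManager: manages the leases for files being written
  • LeaseManager.Monitor: monitors whether leases have expired (mainly checks hardLimit)
  • LeaseManager.Lease: the lease entity; manages all write locks held by one client
Client side
  • LeaseRenewer: the client-side lease renewal class

The internal structure of each class is introduced briefly below, based on Hadoop 3.2.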

Lease

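It has only three fields: the holder (the client), the last update time, and the file list.

One client corresponds to one lease. A client can write many files at once, and they are all kept in files. The lease holds the write permission for these files and renews them together rather than file by file. When a file no longer needs to be written, it is simply removed from files, and once files is empty the lease is reclaimed.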

class Lease {
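    // Lease holder (name of the client that holds the lease)
    private final String holder;
    // Time the lease was last renewed
    private long lastUpdate;
    // Files covered by this lease (inode IDs of all files the holder has open for write)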
    private final HashSet<Long> files = new HashSet<>();
  
    /** Only LeaseManager object can create a lease */
    private Lease(String holder) {
      this.holder = holder;
      renew();
    }
    /** Only LeaseManager object can renew a lease */
    private void renew() {
      this.lastUpdate = monotonicNow();
    }

    /** @return true if the Hard Limit Timer has expired */
    public boolean expiredHardLimit() {
      return monotonicNow() - lastUpdate > hardLimit;
    }

    /** @return true if the Soft Limit Timer has expired */
    public boolean expiredSoftLimit() {
      return monotonicNow() - lastUpdate > softLimit;
    }

    /** Does this lease contain any path? */
    boolean hasFiles() {return !files.isEmpty();}

    boolean removeFile(long inodeId) {
      return files.remove(inodeId);
    }

    @Override
    public String toString() {
      return "[Lease.  Holder: " + holder
          + ", pending creates: " + files.size() + "]";
    }

    @Override
    public int hashCode() {
      return holder.hashCode();
    }
    
    private Collection<Long> getFiles() {
      return Collections.unmodifiableCollection(files);
    }

    String getHolder() {
      return holder;
    }

    @VisibleForTesting
    long getLastUpdate() {
      return lastUpdate;
    }
  }

LeaseManager

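LeaseManager is the lease management class. Internally it maintains three collections (leases, sortedLeases, leasesById) and two variables (softLimit and hardLimit).

Within the softLimit period, the client has exclusive access to the file; no other client can take away its exclusive right to write it.
After softLimit expires, any other client may reclaim the lease, obtain the lease on the file itself, and gain exclusive access.
After hardLimit expires, the NameNode forcibly closes the file and revokes the lease.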
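
The three stages can be written as a small classification function. This is a sketch under stated assumptions: LeaseLimits, State, and classify are hypothetical names, and the two constants simply restate the 60 s and 1 h defaults mentioned in this article rather than reading any Hadoop configuration.

```java
/** Hypothetical illustration of the soft and hard lease limits. */
class LeaseLimits {
    static final long SOFT_LIMIT_MS = 60_000L;        // 60 s, as stated in the text
    static final long HARD_LIMIT_MS = 60 * 60_000L;   // 1 h, as stated in the text

    enum State { EXCLUSIVE, RECLAIMABLE_BY_OTHER_CLIENT, FORCE_CLOSED_BY_NAMENODE }

    /** Classifies a lease by the time elapsed since its last renewal. */
    static State classify(long elapsedSinceRenewalMs) {
        if (elapsedSinceRenewalMs > HARD_LIMIT_MS) {
            return State.FORCE_CLOSED_BY_NAMENODE;
        }
        if (elapsedSinceRenewalMs > SOFT_LIMIT_MS) {
            return State.RECLAIMABLE_BY_OTHER_CLIENT;
        }
        return State.EXCLUSIVE;
    }
}
```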

sortedLeases holds every lease issued by the NameNode, ordered by time. When Monitor checks hardLimit, it only needs to take leases from sortedLeases in order.

// The soft limit is the lease timeout that applies while a file is being written; the default is 60 s
private long softLimit = HdfsConstants.LEASE_SOFTLIMIT_PERIOD;
// The hard limit forces lease reclamation when a file was not closed and its lease was never released; the default is 1 h
private long hardLimit = HdfsConstants.LEASE_HARDLIMIT_PERIOD;

// Mapping: leaseHolder -> Lease
private final SortedMap<String, Lease> leases;

// Set of: Lease
// Holds every lease issued by the NameNode
private final NavigableSet<Lease> sortedLeases;

// Mapping: INodeID -> Lease
private final TreeMap<Long, Lease> leasesById;
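
Why a time-ordered set helps the hard-limit check can be shown with a simplified model. This is a sketch, not the Hadoop implementation: SortedLeaseScan, its nested Lease class, and the comparator (oldest lastUpdate first, holder name as tie-breaker) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class SortedLeaseScan {
    /** Simplified, hypothetical stand-in for LeaseManager.Lease. */
    static final class Lease {
        final String holder;
        final long lastUpdate;
        Lease(String holder, long lastUpdate) {
            this.holder = holder;
            this.lastUpdate = lastUpdate;
        }
    }

    /** Oldest lastUpdate first; the holder name breaks ties so distinct leases never collide. */
    static NavigableSet<Lease> newSortedLeases() {
        return new TreeSet<>(Comparator
            .comparingLong((Lease l) -> l.lastUpdate)
            .thenComparing(l -> l.holder));
    }

    /** Collects holders past the hard limit, stopping at the first lease that is still live. */
    static List<String> expiredHolders(NavigableSet<Lease> sorted, long now, long hardLimit) {
        List<String> expired = new ArrayList<>();
        for (Lease l : sorted) {
            if (now - l.lastUpdate <= hardLimit) {
                break; // every later lease was renewed more recently
            }
            expired.add(l.holder);
        }
        return expired;
    }
}
```

Because the set is ordered, the scan can stop at the first lease that has not expired instead of visiting every lease.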

Monitor

Monitor is a Runnable whose main job is to detect leases that have exceeded the hardLimit period. Its run method calls LeaseManager.checkLeases to perform the check, once every 2 s.

class Monitor implements Runnable {
    final String name = getClass().getSimpleName();

    /** Check leases periodically. */
    @Override
    public void run() {
      for(; shouldRunMonitor && fsnamesystem.isRunning(); ) {
        boolean needSync = false;
        try {
          fsnamesystem.writeLockInterruptibly();
          try {
            if (!fsnamesystem.isInSafeMode()) {
              needSync = checkLeases();
            }
          } finally {
            fsnamesystem.writeUnlock("leaseManager");
            // lease reassignments should to be sync'ed.
            if (needSync) {
              fsnamesystem.getEditLog().logSync();
            }
          }
          // sleep between checks (2 s)
          Thread.sleep(fsnamesystem.getLeaseRecheckIntervalMs());
        } catch(InterruptedException ie) {
          LOG.debug("{} is interrupted", name, ie);
        } catch(Throwable e) {
          LOG.warn("Unexpected throwable: ", e);
        }
      }
    }
  }
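
LeaseRenewer

See the analysis in the lease renewal section below; details are omitted here.

LeaseRenewer is how the client renews its own leases. It runs a thread that watches the softLimit period: LeaseRenewer.run() wakes up periodically (every 1 s) and renews leases once half of the lease period has elapsed.

  • Server side: Monitor performs the hard-limit check
  • Client side: LeaseRenewer performs the soft-limit check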

Write-lock flow

Both FSNamesystem.startFileInternal() and FSNamesystem.appendFileInternal() call LeaseManager.addLease() to add a lease for the client.

  /**
   * Adds (or re-adds) the lease for the specified file.
   */
  synchronized Lease addLease(String holder, long inodeId) {
    Lease lease = getLease(holder);
    if (lease == null) {
      // Construct the Lease object
      lease = new Lease(holder);
      // Add the lease to LeaseManager.leases
      leases.put(holder, lease);
      // Add the lease to LeaseManager.sortedLeases
      sortedLeases.add(lease);
    } else {
      renewLease(lease);
    }
    // Add the lease to LeaseManager.leasesById
    leasesById.put(inodeId, lease);
    lease.files.add(inodeId);
    return lease;
  }

On the NameNode side, one Lease corresponds to one DFSClient. A Lease is identified by its holder, whose value is DFSClient.clientName. clientName is initialized in the DFSClient constructor, as follows:

taskId = conf.get("mapreduce.task.attempt.id", "NONMAPREDUCE");
this.clientName = "DFSClient_" + dfsClientConf.taskId + "_" + 
        DFSUtil.getRandom().nextInt()  + "_" + Thread.currentThread().getId();

clientName is the concatenation of taskId, a random number, and the current thread ID, so each DFSClient instance gets a different clientName and therefore a different Lease.
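
The naming scheme can be reproduced outside Hadoop as a small sketch. ClientNames and newClientName are hypothetical names, and ThreadLocalRandom is used here in place of DFSUtil.getRandom(); only the shape of the resulting string follows the snippet above.

```java
import java.util.concurrent.ThreadLocalRandom;

class ClientNames {
    /** Mirrors the concatenation shown above; a null taskId falls back to "NONMAPREDUCE". */
    static String newClientName(String taskId) {
        String id = (taskId == null) ? "NONMAPREDUCE" : taskId;
        return "DFSClient_" + id + "_"
            + ThreadLocalRandom.current().nextInt() + "_"
            + Thread.currentThread().getId();
    }
}
```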

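addLease first looks in LeaseManager.leases (the holder-to-lease mapping) for a lease belonging to holder. If none exists, LeaseManager creates one; if one exists, it is renewed.

After the lease is created, it is put into the three collections in LeaseManager, and the inode ID of the file is added to the lease's files. That completes the addition.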
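
The bookkeeping can be tried out with a stripped-down model. This is a sketch, not the real LeaseManager: MiniLeaseManager is a hypothetical class, sortedLeases is left out for brevity, the clock is passed in as a parameter, and removeFile is an assumed helper that illustrates the "reclaim the lease when it covers no files" rule described earlier.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, stripped-down model of the bookkeeping done by addLease. */
class MiniLeaseManager {
    static final class Lease {
        final String holder;
        long lastUpdate;
        final Set<Long> files = new HashSet<>();
        Lease(String holder, long now) {
            this.holder = holder;
            this.lastUpdate = now;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();    // holder -> lease
    private final Map<Long, Lease> leasesById = new HashMap<>();  // inode id -> lease

    /** Creates the holder's lease on first use, renews it otherwise, then records the file. */
    synchronized Lease addLease(String holder, long inodeId, long now) {
        Lease lease = leases.get(holder);
        if (lease == null) {
            lease = new Lease(holder, now);
            leases.put(holder, lease);
        } else {
            lease.lastUpdate = now;
        }
        leasesById.put(inodeId, lease);
        lease.files.add(inodeId);
        return lease;
    }

    /** Drops one file; the lease itself is discarded once it covers no files. */
    synchronized void removeFile(String holder, long inodeId) {
        Lease lease = leases.get(holder);
        if (lease == null) {
            return;
        }
        lease.files.remove(inodeId);
        leasesById.remove(inodeId);
        if (lease.files.isEmpty()) {
            leases.remove(holder);
        }
    }

    synchronized int leaseCount() {
        return leases.size();
    }
}
```

Two files opened by the same holder end up under a single Lease object, which is why renewal is per client rather than per file.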


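Lease renewal

When a client opens a file for writing or appending, LeaseManager maintains that client's lease on the file. The client starts a LeaseRenewer that renews the lease periodically so that it does not expire.

Note: lease renewal is initiated by the client.

In dfs.create(), the client calls beginFileLease() to start renewing the lease.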

  /** Get a lease and start automatic renewal */
  private void beginFileLease(final long inodeId, final DFSOutputStream out)
      throws IOException {
    getLeaseRenewer().put(inodeId, out, this);
  }

Client-side renewal is implemented by LeaseRenewer, which is instantiated from authority (the NameNode information) and ugi (the user information).

// DFSClient.class
public LeaseRenewer getLeaseRenewer() throws IOException {
    return LeaseRenewer.getInstance(authority, ugi, this);
}
// LeaseRenewer.class
static LeaseRenewer getInstance(final String authority,
    final UserGroupInformation ugi, final DFSClient dfsc) throws IOException {
  final LeaseRenewer r = Factory.INSTANCE.get(authority, ugi);
  r.addClient(dfsc);
  return r;
}
// LeaseRenewer.Factory.class
private synchronized LeaseRenewer get(final String authority,
    final UserGroupInformation ugi) {
  final Key k = new Key(authority, ugi);
  LeaseRenewer r = renewers.get(k);
  if (r == null) {
    r = new LeaseRenewer(k);
    renewers.put(k, r);
  }
  return r;
}

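LeaseRenewer instances are created through Factory. Factory first looks in renewers for a LeaseRenewer belonging to the current authority and user; if there is none it creates one, otherwise it returns the existing one. getInstance then adds the DFSClient instance dfsc to the LeaseRenewer's dfsclients list. At that point the LeaseRenewer for this user is fully initialized.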
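
The "one renewer per (authority, user)" caching can be sketched independently of Hadoop. Everything here is hypothetical: RenewerCache, Key, and Renewer are illustrative names, a plain String stands in for the ugi, and client names stand in for DFSClient instances.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class RenewerCache {
    /** Cache key: NameNode authority plus user name (a String stands in for the ugi). */
    static final class Key {
        final String authority;
        final String user;
        Key(String authority, String user) {
            this.authority = Objects.requireNonNull(authority);
            this.user = Objects.requireNonNull(user);
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return authority.equals(k.authority) && user.equals(k.user);
        }
        @Override public int hashCode() {
            return Objects.hash(authority, user);
        }
    }

    static final class Renewer {
        final Key key;
        final List<String> clients = new ArrayList<>();
        Renewer(Key key) {
            this.key = key;
        }
    }

    private final Map<Key, Renewer> renewers = new HashMap<>();

    /** Returns the cached renewer for the key, creating it on first use, and registers the client. */
    synchronized Renewer getInstance(String authority, String user, String clientName) {
        Renewer r = renewers.computeIfAbsent(new Key(authority, user), Renewer::new);
        if (!r.clients.contains(clientName)) {
            r.clients.add(clientName);
        }
        return r;
    }
}
```

Two clients of the same user talking to the same NameNode share one renewer, so a single background thread can renew for both.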

Next, put is called to pass the file's inode ID, the corresponding output stream, and the DFSClient instance to the LeaseRenewer:

synchronized void put(final long inodeId, final DFSOutputStream out,
    final DFSClient dfsc) {
  if (dfsc.isClientRunning()) {
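    // Check whether the daemon is running, or whether dfsclients has been
    // empty for longer than gracePeriod.
    // If the daemon is not running, or the empty period exceeded gracePeriod,
    // start a new daemon thread.
    if (!isRunning() || isRenewerExpired()) {
      //start a new daemon with a new id.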
      final int id = ++currentId;
      daemon = new Daemon(new Runnable() {
        @Override
        public void run() {
          try {
            if (LOG.isDebugEnabled()) {
              LOG.debug("Lease renewer daemon for " + clientsString()
                  + " with renew id " + id + " started");
            }
            LeaseRenewer.this.run(id);
          } catch(InterruptedException e) {
            if (LOG.isDebugEnabled()) {
              LOG.debug(LeaseRenewer.this.getClass().getSimpleName()
                  + " is interrupted.", e);
            }
          } finally {
            synchronized(LeaseRenewer.this) {
              Factory.INSTANCE.remove(LeaseRenewer.this);
            }
            if (LOG.isDebugEnabled()) {
              LOG.debug("Lease renewer daemon for " + clientsString()
                  + " with renew id " + id + " exited");
            }
          }
        }
        
        @Override
        public String toString() {
          return String.valueOf(LeaseRenewer.this);
        }
      });
      daemon.start();
    }
    dfsc.putFileBeingWritten(inodeId, out);
    emptyTime = Long.MAX_VALUE;
  }
}

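put starts a daemon thread, which calls LeaseRenewer.run to check the lease and then renew it; the check here is against softLimit. A new daemon thread is created only when no daemon is running or when dfsclients has been empty for longer than gracePeriod.
LeaseRenewer.this.run(id); calls run on the enclosing LeaseRenewer.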

private void run(final int id) throws InterruptedException {
  for(long lastRenewed = Time.now(); !Thread.interrupted();
      Thread.sleep(getSleepPeriod())) {
    final long elapsed = Time.now() - lastRenewed;
    // Has at least half of softLimit elapsed?
    if (elapsed >= getRenewalTime()) {
      try {
        // Renew the lease
        renew();
        ...
        // Record the time of this renewal
        lastRenewed = Time.now();
      } catch (SocketTimeoutException ie) {
        ...
        break;
      } catch (IOException ie) {
        ...
      }
    }
    ...
  }
}

run calls renew() to renew. The renewal covers all DFSClient instances of the current user, that is, all of the current user's leases.
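
The timing of the loop can be simulated without threads. This is a sketch under assumptions: RenewalTiming and renewalTimes are hypothetical names, time is simulated as a counter in milliseconds, and the renewal threshold and sleep period are passed in as parameters rather than taken from getRenewalTime() and getSleepPeriod().

```java
import java.util.ArrayList;
import java.util.List;

class RenewalTiming {
    /**
     * Simulates the loop: wake up every sleepMs and renew once at least
     * renewalMs has passed since the last renewal. Returns the renewal times.
     */
    static List<Long> renewalTimes(long renewalMs, long sleepMs, long totalMs) {
        List<Long> times = new ArrayList<>();
        long lastRenewed = 0;
        for (long now = 0; now <= totalMs; now += sleepMs) {
            if (now - lastRenewed >= renewalMs) {
                times.add(now);      // stands in for the renew() call
                lastRenewed = now;
            }
        }
        return times;
    }
}
```

With a 30 s threshold (half of a 60 s soft limit) and a 1 s sleep, the simulated client renews at 30 s, 60 s, and 90 s, so the lease never gets close to the soft limit as long as the calls succeed.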

private void renew() throws IOException {
  final List<DFSClient> copies;
  synchronized(this) {
    copies = new ArrayList<DFSClient>(dfsclients);
  }
  //sort the client names for finding out repeated names.
  Collections.sort(copies, new Comparator<DFSClient>() {
    @Override
    public int compare(final DFSClient left, final DFSClient right) {
      return left.getClientName().compareTo(right.getClientName());
    }
  });
  String previousName = "";
  for(int i = 0; i < copies.size(); i++) {
    final DFSClient c = copies.get(i);
    //skip if current client name is the same as the previous name.
    if (!c.getClientName().equals(previousName)) {
      // Renew the lease
      if (!c.renewLease()) {
        if (LOG.isDebugEnabled()) {
          LOG.debug("Did not renew lease for client " +
              c);
        }
        continue;
      }
      previousName = c.getClientName();
      ...
    }
  }
}

renew first sorts the DFSClient instances in dfsclients so that duplicate clientName values end up adjacent; only one client per clientName is renewed, by calling c.renewLease:
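
The sort-then-skip-duplicates step can be isolated as follows. This is a simplified sketch: RenewOncePerName and namesToRenew are hypothetical names, plain strings replace DFSClient objects, and the failure path (where a client that did not renew leaves previousName unchanged) is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RenewOncePerName {
    /** Returns the names that would actually be renewed: one per distinct client name. */
    static List<String> namesToRenew(List<String> clientNames) {
        List<String> copies = new ArrayList<>(clientNames);
        Collections.sort(copies);            // equal names become adjacent
        List<String> renewed = new ArrayList<>();
        String previousName = "";
        for (String name : copies) {
            // skip if the current name equals the previous one
            if (!name.equals(previousName)) {
                renewed.add(name);
                previousName = name;
            }
        }
        return renewed;
    }
}
```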

boolean renewLease() throws IOException {
  if (clientRunning && !isFilesBeingWrittenEmpty()) {
    try {
      // RPC that reaches LeaseManager.renewLease on the NameNode
      namenode.renewLease(clientName);
      updateLastLeaseRenewal();
      return true;
    } catch (IOException e) {
      // Abort if the lease has already expired. 
      final long elapsed = Time.now() - getLastLeaseRenewal();
      if (elapsed > HdfsConstants.LEASE_HARDLIMIT_PERIOD) {
        LOG.warn("Failed to renew lease for " + clientName + " for "
            + (elapsed/1000) + " seconds (>= hard-limit ="
            + (HdfsConstants.LEASE_HARDLIMIT_PERIOD/1000) + " seconds.) "
            + "Closing all files being written ...", e);
        closeAllFilesBeingWritten(true);
      } else {
        // Let the lease renewer handle it and retry.
        throw e;
      }
    }
  }
  return false;
}

renewLease makes a remote call that ends in LeaseManager.renewLease. The call chain is NameNodeRpcServer.renewLease --> FSNamesystem.renewLease --> LeaseManager.renewLease(holder):

// LeaseManager.class
synchronized void renewLease(String holder) {
  renewLease(getLease(holder));
}
synchronized void renewLease(Lease lease) {
  if (lease != null) {
    sortedLeases.remove(lease);
    lease.renew();
    sortedLeases.add(lease);
  }
}
// LeaseManager.Lease.class
private void renew() {
  this.lastUpdate = monotonicNow();
}

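The client, through LeaseRenewer, calls LeaseManager.renewLease to renew. The logic is: get the lease for clientName from leases, remove it from sortedLeases, call lease.renew to update its lastUpdate, and put it back into sortedLeases. Because sortedLeases is ordered by lastUpdate, the lease has to be removed before the update and re-inserted afterwards to keep the ordering correct. This completes the client renewal flow.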
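
The remove, update, re-insert sequence matters because a sorted set locates elements by their sort key. The following sketch shows the pattern on a plain TreeSet; ReorderOnRenew, its nested Lease class, and the comparator are hypothetical and only model the behavior described above.

```java
import java.util.Comparator;
import java.util.TreeSet;

class ReorderOnRenew {
    /** Simplified, hypothetical stand-in for LeaseManager.Lease. */
    static final class Lease {
        final String holder;
        long lastUpdate;
        Lease(String holder, long lastUpdate) {
            this.holder = holder;
            this.lastUpdate = lastUpdate;
        }
    }

    // Oldest lastUpdate first; the holder name breaks ties.
    private final TreeSet<Lease> sortedLeases = new TreeSet<>(Comparator
        .comparingLong((Lease l) -> l.lastUpdate)
        .thenComparing(l -> l.holder));

    void add(Lease lease) {
        sortedLeases.add(lease);
    }

    /** Remove while the old key is still valid, change the key, then re-insert. */
    void renew(Lease lease, long now) {
        sortedLeases.remove(lease);
        lease.lastUpdate = now;
        sortedLeases.add(lease);
    }

    String oldestHolder() {
        return sortedLeases.first().holder;
    }

    int size() {
        return sortedLeases.size();
    }
}
```

If lastUpdate were changed while the lease was still in the set, the tree would keep it at its old position and a later remove could fail to find it.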

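Lease recovery

If the client fails and cannot renew its lease, lease recovery takes place.

  • While a file is being written, the lease limit is 60 s (not configurable)
  • On failure, the lease limit is 60 min (not configurable), after which the expired lease is removed.
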
/**
   * States, which a block can go through while it is under construction.
   */
  static public enum BlockUCState {
    /**
     * Block construction completed.<br>
     * The block has at least the configured minimal replication number
     * of {@link ReplicaState#FINALIZED} replica(s), and is not going to be
     * modified.
     * NOTE, in some special cases, a block may be forced to COMPLETE state,
     * even if it doesn't have required minimal replications.
     */
    COMPLETE,
    /**
     * The block is under construction.<br>
     * It has been recently allocated for write or append.
     */
    UNDER_CONSTRUCTION,
    /**
     * The block is under recovery.<br>
     * When a file lease expires its last block may not be {@link #COMPLETE}
     * and needs to go through a recovery procedure, 
     * which synchronizes the existing replicas contents.
     */
    UNDER_RECOVERY,
    /**
     * The block is committed.<br>
     * The client reported that all bytes are written to data-nodes
     * with the given generation stamp and block length, but no 
     * {@link ReplicaState#FINALIZED} 
     * replicas has yet been reported by data-nodes themselves.
     */
    COMMITTED;
  }