Note: this walkthrough is based on the Hadoop 2.7 source code.
1. The DataNode class comment
The class-level Javadoc, with a short gloss after each paragraph:
/**********************************************************
* DataNode is a class (and program) that stores a set of
* blocks for a DFS deployment. A single deployment can
* have one or many DataNodes. Each DataNode communicates
* regularly with a single NameNode. It also communicates
* with client code and other DataNodes from time to time.
*
 * Gloss 1:
 * DataNode is both a class and a program; it stores blocks for HDFS. A deployment can have one or many DataNodes.
 * Each DataNode communicates periodically with a single NameNode; it also talks to clients and to other DataNodes from time to time.
*
* DataNodes store a series of named blocks. The DataNode
* allows client code to read these blocks, or to write new
* block data. The DataNode may also, in response to instructions
* from its NameNode, delete blocks or copy blocks to/from other
* DataNodes.
*
 * Gloss 2:
 * DataNodes store a series of named blocks, and a DataNode lets clients read and write them.
 * A DataNode also acts on NameNode instructions, e.g. deleting blocks or copying blocks to/from other DataNodes.
*
* The DataNode maintains just one critical table:
* block-> stream of bytes (of BLOCK_SIZE or less)
*
 * Gloss 3:
 * The DataNode maintains just one critical table:
 * block -> stream of bytes (at most BLOCK_SIZE), plus the per-block metadata kept alongside it
*
* This info is stored on a local disk. The DataNode
* reports the table's contents to the NameNode upon startup
* and every so often afterwards.
*
 * Gloss 4:
 * This table lives on local disk; the DataNode reports its contents to the NameNode at startup and periodically afterwards.
*
* DataNodes spend their lives in an endless loop of asking
* the NameNode for something to do. A NameNode cannot connect
* to a DataNode directly; a NameNode simply returns values from
* functions invoked by a DataNode.
*
 * Gloss 5:
 * DataNodes spend their lives asking the NameNode what to do, via heartbeats.
 * The NameNode answers a heartbeat with commands, which the DataNode then executes locally.
 * So the NameNode never connects to a DataNode directly; it piggybacks commands on the return values of RPCs that the DataNode itself invokes.
*
* DataNodes maintain an open server socket so that client code
* or other DataNodes can read/write data. The host/port for
* this server is reported to the NameNode, which then sends that
* information to clients or other DataNodes that might be interested.
*
 * Gloss 6:
 * A DataNode keeps a server socket open so that clients and other DataNodes can read/write data.
 * At startup the DataNode reports its host/port to the NameNode.
 * A client (or another DataNode) that wants to reach some DataNode first asks the NameNode for that DataNode's host/port.
*
**********************************************************/
2. Starting from DataNode's main() method
2.1 Creating a DataNode
public static void main(String args[]) {
if (DFSUtil.parseHelpArgument(args, DataNode.USAGE, System.out, true)) {
System.exit(0);
}
// TODO 2.1.1 the core startup method
secureMain(args, null);
}
//2.1.1 the core startup method
public static void secureMain(String args[], SecureResources resources) {
int errorCode = 0;
try {
StringUtils.startupShutdownMessage(DataNode.class, args, LOG);
//TODO 2.1.2 create a DataNode instance
DataNode datanode = createDataNode(args, null, resources);
if (datanode != null) {
datanode.join();
} else {
errorCode = 1;
}
} catch (Throwable e) {
LOG.fatal("Exception in secureMain", e);
terminate(1, e);
} finally {
// We need to terminate the process here because either shutdown was called
// or some disk related conditions like volumes tolerated or volumes required
// condition was not met. Also, In secure mode, control will go to Jsvc
// and Datanode process hangs if it does not exit.
LOG.warn("Exiting Datanode");
terminate(errorCode);
}
}
// 2.1.2 Create a DataNode instance: createDataNode() calls instantiateDataNode() below and then starts the daemon threads via runDatanodeDaemon()
public static DataNode instantiateDataNode(String args [], Configuration conf,
SecureResources resources) throws IOException {
if (conf == null)
conf = new HdfsConfiguration();
if (args != null) {
// parse generic hadoop options
GenericOptionsParser hParser = new GenericOptionsParser(conf, args);
args = hParser.getRemainingArgs();
}
if (!parseArguments(args, conf)) {
printUsage(System.err);
return null;
}
Collection<StorageLocation> dataLocations = getStorageLocations(conf);
UserGroupInformation.setConfiguration(conf);
SecurityUtil.login(conf, DFS_DATANODE_KEYTAB_FILE_KEY,
DFS_DATANODE_KERBEROS_PRINCIPAL_KEY);
//TODO 2.1.3 create the DataNode instance
return makeInstance(dataLocations, conf, resources);
}
// 2.1.3 Create the DataNode instance
static DataNode makeInstance(Collection<StorageLocation> dataDirs,
Configuration conf, SecureResources resources) throws IOException {
LocalFileSystem localFS = FileSystem.getLocal(conf);
FsPermission permission = new FsPermission(
conf.get(DFS_DATANODE_DATA_DIR_PERMISSION_KEY,
DFS_DATANODE_DATA_DIR_PERMISSION_DEFAULT));
DataNodeDiskChecker dataNodeDiskChecker =
new DataNodeDiskChecker(permission);
List<StorageLocation> locations =
checkStorageLocations(dataDirs, localFS, dataNodeDiskChecker);
DefaultMetricsSystem.initialize("DataNode");
assert locations.size() > 0 : "number of data directories should be > 0";
//TODO 2.2 see the DataNode constructor
return new DataNode(conf, locations, resources);
}
2.2 Starting the DataNode
2.2.1 The DataNode constructor
DataNode(final Configuration conf,
final List<StorageLocation> dataDirs,
final SecureResources resources) throws IOException {
// ...
try {
hostName = getHostName(conf);
LOG.info("Configured hostname is " + hostName);
//2.2.1 start the DataNode
startDataNode(conf, dataDirs, resources);
} catch (IOException ie) {
shutdown();
throw ie;
}
final int dncCacheMaxSize =
conf.getInt(DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_KEY,
DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_DEFAULT) ;
datanodeNetworkCounts =
CacheBuilder.newBuilder()
.maximumSize(dncCacheMaxSize)
.build(new CacheLoader<String, Map<String, Long>>() {
@Override
public Map<String, Long> load(String key) throws Exception {
final Map<String, Long> ret = new HashMap<String, Long>();
ret.put("networkErrors", 0L);
return ret;
}
});
}
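The Guava CacheBuilder/CacheLoader pair above builds a size-bounded, load-on-miss cache of per-host network error counters. Its behavior can be approximated with the JDK alone; the class below (`NetworkCountsCache`, a made-up name) is an illustrative sketch, not Hadoop code:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for the Guava LoadingCache used by DataNode:
// bounded in size, loading a default entry on first access.
class NetworkCountsCache {
    private final int maxSize;
    // access-ordered LinkedHashMap evicts the least-recently-used entry
    private final LinkedHashMap<String, Map<String, Long>> cache;

    NetworkCountsCache(int maxSize) {
        this.maxSize = maxSize;
        this.cache = new LinkedHashMap<String, Map<String, Long>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Map<String, Long>> eldest) {
                return size() > NetworkCountsCache.this.maxSize;
            }
        };
    }

    // like CacheLoader.load(): create the per-host counter map on a miss
    synchronized Map<String, Long> get(String host) {
        return cache.computeIfAbsent(host, k -> {
            Map<String, Long> ret = new HashMap<>();
            ret.put("networkErrors", 0L);
            return ret;
        });
    }
}
```

Like the real cache, evicted hosts simply get a fresh zeroed counter map on their next access.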
//2.2.1 start the DataNode
void startDataNode(Configuration conf,
List<StorageLocation> dataDirs,
SecureResources resources
) throws IOException {
// ...
// TODO create the DataStorage object
storage = new DataStorage();
// global DN settings
registerMXBean();
// TODO 2.2.2 initialize the DataXceiverServer
initDataXceiver(conf);
// TODO 2.2.3 start the HTTP info server
startInfoServer(conf);
pauseMonitor = new JvmPauseMonitor(conf);
pauseMonitor.start();
// BlockPoolTokenSecretManager is required to create ipc server.
this.blockPoolTokenSecretManager = new BlockPoolTokenSecretManager();
// Login is done by now. Set the DN user name.
dnUserName = UserGroupInformation.getCurrentUser().getShortUserName();
LOG.info("dnUserName = " + dnUserName);
LOG.info("supergroup = " + supergroup);
//TODO 2.2.4 initialize the RPC server
initIpcServer(conf);
metrics = DataNodeMetrics.create(conf, getDisplayName());
metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);
/**
 * TODO create a BlockPoolManager
 * A block pool: normally one cluster has exactly one BlockPool.
 * With federation there are multiple NameNodes, and each federated nameservice has its own BlockPool.
 * Suppose a cluster has 4 NameNodes forming 2 federated nameservices:
 * nameservice 1: namenode1 (active), namenode2 (standby) --> namenode1 and namenode2 share one block pool
 * nameservice 2: namenode3 (active), namenode4 (standby) --> namenode3 and namenode4 share one block pool
 */
blockPoolManager = new BlockPoolManager(this);
//TODO 2.2.5 initialize the BlockPoolManager:
// (1) register with the NameNode(s) (2) start heartbeating to them
blockPoolManager.refreshNamenodes(conf);
// Create the ReadaheadPool from the DataNode context so we can
// exit without having to explicitly shutdown its thread pool.
readaheadPool = ReadaheadPool.getInstance();
saslClient = new SaslDataTransferClient(dnConf.conf,
dnConf.saslPropsResolver, dnConf.trustedChannelResolver);
saslServer = new SaslDataTransferServer(dnConf, blockPoolTokenSecretManager);
}
2.2.2 Initializing the DataXceiverServer
// 2.2.2 initialize the DataXceiverServer
private void initDataXceiver(Configuration conf) throws IOException {
// find free port or use privileged port provided
TcpPeerServer tcpPeerServer;
if (secureResources != null) {
tcpPeerServer = new TcpPeerServer(secureResources);
} else {
tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout,
DataNode.getStreamingAddr(conf));
}
tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
streamingAddr = tcpPeerServer.getStreamingAddr();
LOG.info("Opened streaming server at " + streamingAddr);
this.threadGroup = new ThreadGroup("dataXceiverServer");
//TODO instantiate a DataXceiverServer, the service that receives data streamed in from clients and other DataNodes
xserver = new DataXceiverServer(tcpPeerServer, conf, this);
//run it as a daemon thread
this.dataXceiverServer = new Daemon(threadGroup, xserver);
this.threadGroup.setDaemon(true); // auto destroy when empty
if (conf.getBoolean(DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_KEY,
DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_DEFAULT) ||
conf.getBoolean(DFSConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC,
DFSConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC_DEFAULT)) {
DomainPeerServer domainPeerServer =
getDomainPeerServer(conf, streamingAddr.getPort());
if (domainPeerServer != null) {
this.localDataXceiverServer = new Daemon(threadGroup,
new DataXceiverServer(domainPeerServer, conf, this));
LOG.info("Listening on UNIX domain socket: " +
domainPeerServer.getBindPath());
}
}
this.shortCircuitRegistry = new ShortCircuitRegistry(conf);
}
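The wiring above — bind a listening socket (an ephemeral port when none is fixed), then run the accept loop on a daemon thread inside a daemon ThreadGroup — can be sketched with plain JDK sockets. All names below (`MiniPeerServer` etc.) are invented for illustration, not Hadoop's:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Minimal stand-in for the TcpPeerServer + DataXceiverServer wiring:
// bind a streaming address, then serve accepts from a daemon thread.
class MiniPeerServer {
    final ServerSocket serverSocket;
    final Thread acceptThread;

    MiniPeerServer(int port) throws IOException {
        // port 0 asks the OS for a free port, like "find free port" above
        serverSocket = new ServerSocket(port);
        ThreadGroup group = new ThreadGroup("dataXceiverServer");
        group.setDaemon(true); // auto destroy when empty
        acceptThread = new Thread(group, () -> {
            while (!serverSocket.isClosed()) {
                try {
                    // the real server hands each accepted peer to a DataXceiver thread
                    serverSocket.accept().close();
                } catch (IOException e) {
                    return; // socket closed: end the accept loop
                }
            }
        }, "acceptor");
        acceptThread.setDaemon(true); // do not keep the JVM alive
        acceptThread.start();
    }

    InetSocketAddress streamingAddr() {
        return (InetSocketAddress) serverSocket.getLocalSocketAddress();
    }
}
```

The daemon flags mirror the source: the JVM may exit even while the accept loop is still parked in `accept()`.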
2.2.3 Starting the HTTP info server
// 2.2.3 start the HTTP info server
private void startInfoServer(Configuration conf)
throws IOException {
Configuration confForInfoServer = new Configuration(conf);
confForInfoServer.setInt(HttpServer2.HTTP_MAX_THREADS, 10);
HttpServer2.Builder builder = new HttpServer2.Builder()
.setName("datanode")
.setConf(conf).setACL(new AccessControlList(conf.get(DFS_ADMIN, " ")))
.addEndpoint(URI.create("http://localhost:0"))
.setFindPort(true);
// builder pattern: constructs an HttpServer2 instance
this.infoServer = builder.build();
// bind several servlets on the HTTP server
this.infoServer.addInternalServlet(null, "/streamFile/*", StreamFile.class);
this.infoServer.addInternalServlet(null, "/getFileChecksum/*",
FileChecksumServlets.GetServlet.class);
this.infoServer.setAttribute("datanode", this);
this.infoServer.setAttribute(JspHelper.CURRENT_CONF, conf);
this.infoServer.addServlet(null, "/blockScannerReport",
BlockScanner.Servlet.class);
// start the HTTP server
this.infoServer.start();
InetSocketAddress jettyAddr = infoServer.getConnectorAddress(0);
// SecureDataNodeStarter will bind the privileged port to the channel if
// the DN is started by JSVC, pass it along.
ServerSocketChannel httpServerChannel = secureResources != null ?
secureResources.getHttpServerChannel() : null;
this.httpServer = new DatanodeHttpServer(conf, jettyAddr, httpServerChannel);
httpServer.start();
if (httpServer.getHttpAddress() != null) {
infoPort = httpServer.getHttpAddress().getPort();
}
if (httpServer.getHttpsAddress() != null) {
infoSecurePort = httpServer.getHttpsAddress().getPort();
}
}
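The port-0 / `setFindPort(true)` idiom and the handler binding used by startInfoServer can be illustrated with the JDK's built-in `com.sun.net.httpserver` (a sketch, not HttpServer2):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Sketch of the info-server idea: bind to port 0 ("find a free port"),
// register handlers on paths, then read back the actual bound address.
class MiniInfoServer {
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        // analogous to addServlet(null, "/blockScannerReport", ...)
        server.createContext("/blockScannerReport", exchange -> {
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

As in the source, the caller discovers the real port only after the bind, via the server's address.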
2.2.4 Initializing the RPC server
//2.2.4 initialize the RPC server
private void initIpcServer(Configuration conf) throws IOException {
InetSocketAddress ipcAddr = NetUtils.createSocketAddr(
conf.getTrimmed(DFS_DATANODE_IPC_ADDRESS_KEY));
// Add all the RPC protocols that the Datanode implements
RPC.setProtocolEngine(conf, ClientDatanodeProtocolPB.class,
ProtobufRpcEngine.class);
ClientDatanodeProtocolServerSideTranslatorPB clientDatanodeProtocolXlator =
new ClientDatanodeProtocolServerSideTranslatorPB(this);
//service handling requests from clients (ClientDatanodeProtocol)
BlockingService service = ClientDatanodeProtocolService
.newReflectiveBlockingService(clientDatanodeProtocolXlator);
//builder pattern: constructs the RPC server
ipcServer = new RPC.Builder(conf)
.setProtocol(ClientDatanodeProtocolPB.class)
.setInstance(service)
.setBindAddress(ipcAddr.getHostName())
.setPort(ipcAddr.getPort())
.setNumHandlers(
conf.getInt(DFS_DATANODE_HANDLER_COUNT_KEY,
DFS_DATANODE_HANDLER_COUNT_DEFAULT)).setVerbose(false)
.setSecretManager(blockPoolTokenSecretManager).build();
InterDatanodeProtocolServerSideTranslatorPB interDatanodeProtocolXlator =
new InterDatanodeProtocolServerSideTranslatorPB(this);
service = InterDatanodeProtocolService
.newReflectiveBlockingService(interDatanodeProtocolXlator);
DFSUtil.addPBProtocol(conf, InterDatanodeProtocolPB.class, service,
ipcServer);
TraceAdminProtocolServerSideTranslatorPB traceAdminXlator =
new TraceAdminProtocolServerSideTranslatorPB(this);
BlockingService traceAdminService = TraceAdminService
.newReflectiveBlockingService(traceAdminXlator);
DFSUtil.addPBProtocol(conf, TraceAdminProtocolPB.class, traceAdminService,
ipcServer);
LOG.info("Opened IPC server at " + ipcServer.getListenerAddress());
// set service-level authorization security policy
if (conf.getBoolean(
CommonConfigurationKeys.HADOOP_SECURITY_AUTHORIZATION, false)) {
ipcServer.refreshServiceAcl(conf, new HDFSPolicyProvider());
}
}
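Both initIpcServer and startInfoServer lean on the builder pattern: fluent setters accumulate configuration, and build() assembles the object. A toy builder with the same shape as RPC.Builder (all names invented for illustration):

```java
// Toy builder mirroring the shape of RPC.Builder: each setter returns
// the builder itself, and build() produces the configured server object.
class MiniRpcServer {
    final String bindAddress;
    final int port;
    final int numHandlers;

    private MiniRpcServer(Builder b) {
        this.bindAddress = b.bindAddress;
        this.port = b.port;
        this.numHandlers = b.numHandlers;
    }

    static class Builder {
        private String bindAddress = "0.0.0.0";
        private int port;
        private int numHandlers = 1;

        Builder setBindAddress(String addr) { this.bindAddress = addr; return this; }
        Builder setPort(int port) { this.port = port; return this; }
        Builder setNumHandlers(int n) { this.numHandlers = n; return this; }
        MiniRpcServer build() { return new MiniRpcServer(this); }
    }
}
```

The payoff is the same as in the source: many optional settings without telescoping constructors, and the built object can stay immutable.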
The RPC server is managed directly by the DataNode class; DataNode implements several protocol interfaces:
public class DataNode extends ReconfigurableBase implements InterDatanodeProtocol, ClientDatanodeProtocol,
TraceAdminProtocol, DataNodeMXBean
2.2.5 Initializing the BlockPoolManager
The most important job of BlockPoolManager is to hold references to all BPOfferService objects on the DataNode, and to offer several ways of looking them up: by nameservice id (nameserviceId), by block pool id (blockPoolId), and so on.
Member variables of the BlockPoolManager class:
/**
* Manages the BPOfferService objects for the data node.
* Creation, removal, starting, stopping, shutdown on BPOfferService
* objects must be done via APIs in this class.
*/
@InterfaceAudience.Private
class BlockPoolManager {
private static final Log LOG = DataNode.LOG;
//maps nameserviceId -> BPOfferService
private final Map<String, BPOfferService> bpByNameserviceId =
Maps.newHashMap();
//maps blockPoolId -> BPOfferService
private final Map<String, BPOfferService> bpByBlockPoolId =
Maps.newHashMap();
private final List<BPOfferService> offerServices =
Lists.newArrayList();
private final DataNode dn;
}
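Both maps index the same BPOfferService instances under different keys, which is what makes lookup by nameserviceId and by blockPoolId each O(1). A stripped-down sketch (`Bpos` and `MiniBlockPoolManager` are stand-ins; note that the block pool id only becomes known after the NameNode handshake, so the second index is filled in later):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in showing how one object is indexed twice, as in BlockPoolManager.
class MiniBlockPoolManager {
    static class Bpos {
        final String nsId;
        String bpId; // learned later, from the NameNode handshake
        Bpos(String nsId) { this.nsId = nsId; }
    }

    private final Map<String, Bpos> byNameserviceId = new HashMap<>();
    private final Map<String, Bpos> byBlockPoolId = new HashMap<>();

    void add(String nsId, Bpos bpos) {
        byNameserviceId.put(nsId, bpos);
    }

    // once the handshake reveals the block pool id, fill in the second index
    void setBlockPoolId(Bpos bpos, String bpId) {
        bpos.bpId = bpId;
        byBlockPoolId.put(bpId, bpos);
    }

    Bpos byNs(String nsId) { return byNameserviceId.get(nsId); }
    Bpos byBp(String bpId) { return byBlockPoolId.get(bpId); }
}
```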
The core method of BlockPoolManager: refreshNamenodes()
void refreshNamenodes(Configuration conf)
throws IOException {
LOG.info("Refresh request received for nameservices: " + conf.get
(DFSConfigKeys.DFS_NAMESERVICES));
Map<String, Map<String, InetSocketAddress>> newAddressMap = DFSUtil
.getNNServiceRpcAddressesForCluster(conf);
synchronized (refreshNamenodesLock) {
//2.2.5.1 TODO (1) register with the NameNodes (2) start heartbeats
doRefreshNamenodes(newAddressMap);
}
}
// 2.2.5.1 (1) register with the NameNodes (2) start heartbeats
private void doRefreshNamenodes(
Map<String, Map<String, InetSocketAddress>> addrMap) throws IOException {
assert Thread.holdsLock(refreshNamenodesLock);
Set<String> toRefresh = Sets.newLinkedHashSet();
Set<String> toAdd = Sets.newLinkedHashSet();
Set<String> toRemove;
synchronized (this) {
// Step 1. For each of the new nameservices, figure out whether
// it's an update of the set of NNs for an existing NS,
// or an entirely new nameservice.
// nameservices in an HA setup
/**
 * With federation there are multiple nameservices:
 * namenode1, namenode2 -> federated namespace 1 -> nameservice1
 * namenode3, namenode4 -> federated namespace 2 -> nameservice2
 */
for (String nameserviceId : addrMap.keySet()) {
if (bpByNameserviceId.containsKey(nameserviceId)) {
toRefresh.add(nameserviceId);
} else {
// toAdd is a Set<String> holding nameservice ids
toAdd.add(nameserviceId);
}
}
// Step 2. Any nameservices we currently have but are no longer present
// need to be removed.
toRemove = Sets.newHashSet(Sets.difference(
bpByNameserviceId.keySet(), addrMap.keySet()));
assert toRefresh.size() + toAdd.size() ==
addrMap.size() :
"toAdd: " + Joiner.on(",").useForNull("<default>").join(toAdd) +
" toRemove: " + Joiner.on(",").useForNull("<default>").join(toRemove) +
" toRefresh: " + Joiner.on(",").useForNull("<default>").join(toRefresh);
// Step 3. Start new nameservices
if (!toAdd.isEmpty()) {
LOG.info("Starting BPOfferServices for nameservices: " +
Joiner.on(",").useForNull("<default>").join(toAdd));
// TODO iterate over all nameservices; assume each one has two NameNodes (HA)
// toAdd is the set of new nameservice ids
for (String nsToAdd : toAdd) {
ArrayList<InetSocketAddress> addrs =
//get the NameNode addresses of this nameservice --> namenode1, namenode2
Lists.newArrayList(addrMap.get(nsToAdd).values());
/**
 * One nameservice maps to one BPOfferService,
 * and each NameNode inside that nameservice maps to one BPServiceActor.
 * For example:
 * namenode1, namenode2 -> federated namespace 1 -> nameservice1
 * namenode3, namenode4 -> federated namespace 2 -> nameservice2
 * nameservice1 gets one BPOfferService, and its two NameNodes get two BPServiceActors.
 */
BPOfferService bpos = createBPOS(addrs);
bpByNameserviceId.put(nsToAdd, bpos);
// add the BPOfferService to the List<BPOfferService> offerServices
offerServices.add(bpos);
}
}
// 2.2.6 the DataNode registers and heartbeats with the NameNodes
startAll();
}
// Step 4. Shut down old nameservices. This happens outside
// of the synchronized(this) lock since they need to call
// back to .remove() from another thread
if (!toRemove.isEmpty()) {
LOG.info("Stopping BPOfferServices for nameservices: " +
Joiner.on(",").useForNull("<default>").join(toRemove));
for (String nsToRemove : toRemove) {
BPOfferService bpos = bpByNameserviceId.get(nsToRemove);
bpos.stop();
bpos.join();
// they will call remove on their own
}
}
// Step 5. Update nameservices whose NN list has changed
if (!toRefresh.isEmpty()) {
LOG.info("Refreshing list of NNs for nameservices: " +
Joiner.on(",").useForNull("<default>").join(toRefresh));
for (String nsToRefresh : toRefresh) {
BPOfferService bpos = bpByNameserviceId.get(nsToRefresh);
ArrayList<InetSocketAddress> addrs =
Lists.newArrayList(addrMap.get(nsToRefresh).values());
bpos.refreshNNList(addrs);
}
}
}
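Steps 1 and 2 above are plain set arithmetic over nameservice ids: present in both the old and new maps → toRefresh; only in the new map → toAdd; only in the old map → toRemove. A self-contained sketch of that bookkeeping (`NameserviceDiff` is an invented name):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Reconcile an old and a new set of nameservice ids, as in doRefreshNamenodes.
class NameserviceDiff {
    final Set<String> toRefresh = new LinkedHashSet<>();
    final Set<String> toAdd = new LinkedHashSet<>();
    final Set<String> toRemove = new LinkedHashSet<>();

    NameserviceDiff(Set<String> existing, Set<String> incoming) {
        for (String nsId : incoming) {
            if (existing.contains(nsId)) {
                toRefresh.add(nsId); // known nameservice: its NN list may have changed
            } else {
                toAdd.add(nsId);     // brand-new nameservice: start a BPOfferService
            }
        }
        for (String nsId : existing) {
            if (!incoming.contains(nsId)) {
                toRemove.add(nsId);  // gone from the config: stop its BPOfferService
            }
        }
        // the same invariant the source asserts after Step 2
        assert toRefresh.size() + toAdd.size() == incoming.size();
    }
}
```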
2.2.6 The DataNode registers and heartbeats with the NameNode
// 2.2.6 register with the NameNodes and start heartbeats
synchronized void startAll() throws IOException {
try {
UserGroupInformation.getLoginUser().doAs(
new PrivilegedExceptionAction<Object>() {
@Override
public Object run() throws Exception {
//TODO iterate over all BPOfferServices, i.e. over all nameservices
for (BPOfferService bpos : offerServices) {
// TODO 2.2.6.1 start each nameservice's BPOfferService
bpos.start();
}
return null;
}
});
} catch (InterruptedException ex) {
IOException ioe = new IOException();
ioe.initCause(ex.getCause());
throw ioe;
}
}
//2.2.6.1 start each nameservice's BPOfferService (in the BPOfferService class)
//This must be called only by blockPoolManager
//BPOfferService.start() loops over its BPServiceActor threads and starts them, so that each BPServiceActor can register and heartbeat with its own NameNode.
void start() {
//TODO one BPOfferService (nameservice) owns multiple BPServiceActors (one per NameNode)
for (BPServiceActor actor : bpServices) {
//TODO 2.2.6.2 the DataNode registers and heartbeats
actor.start();
}
}
//2.2.6.2 the DataNode registers and heartbeats (BPServiceActor class)
//This must be called only by BPOfferService
void start() {
if ((bpThread != null) && (bpThread.isAlive())) {
//Thread is started already
return;
}
//2.2.6.3 the Runnable target passed in is 'this', so BPServiceActor implements Runnable
bpThread = new Thread(this, formatThreadName());
bpThread.setDaemon(true); // needed for JUnit testing
// 2.2.6.4 Thread.start() is called, so look at the class's run() method
bpThread.start();
}
// 2.2.6.3 indeed, BPServiceActor implements Runnable
class BPServiceActor implements Runnable {}
// 2.2.6.4 BPServiceActor's run() method
public void run() {
LOG.info(this + " starting to offer service");
//TODO registration + heartbeats
try {
while (true) {
// init stuff
try {
// setup storage
//2.2.6.5 TODO connect to the NameNode and perform the handshake
connectToNNAndHandshake();
break;
} catch (IOException ioe) {
// Initial handshake, storage recovery or registration failed
runningState = RunningState.INIT_FAILED;
if (shouldRetryInit()) {
// Retry until all namenode's of BPOS failed initialization
LOG.error("Initialization failed for " + this + " "
+ ioe.getLocalizedMessage());
// TODO on failure, sleep 5 seconds and retry
sleepAndLogInterrupts(5000, "initializing");
} else {
runningState = RunningState.FAILED;
LOG.fatal("Initialization failed for " + this + ". Exiting. ", ioe);
return;
}
}
}
runningState = RunningState.RUNNING;
while (shouldRun()) {
try {
//send heartbeats periodically, every 3 seconds by default
offerService();
} catch (Exception ex) {
LOG.error("Exception in BPOfferService for " + this, ex);
sleepAndLogInterrupts(5000, "offering service");
}
}
runningState = RunningState.EXITED;
} catch (Throwable ex) {
LOG.warn("Unexpected exception in block pool " + this, ex);
runningState = RunningState.FAILED;
} finally {
LOG.warn("Ending block pool service for: " + this);
cleanUp();
}
}
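The skeleton of run() is a retry-until-success loop around initialization, followed by a steady-state service loop. The retry half can be isolated as below (a sketch; the Callable stands in for connectToNNAndHandshake(), and the delay is a parameter where the source hardcodes 5000 ms):

```java
import java.util.concurrent.Callable;

// Sketch of BPServiceActor.run()'s retry structure: keep retrying the
// initial handshake with a fixed back-off until it succeeds or gives up.
class RetryingInit {
    static <T> T retryUntilSuccess(Callable<T> init, long sleepMillis, int maxAttempts)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return init.call();        // e.g. connectToNNAndHandshake()
            } catch (Exception e) {
                last = e;                  // INIT_FAILED: log, then retry
                Thread.sleep(sleepMillis); // sleepAndLogInterrupts(5000, "initializing")
            }
        }
        throw last;                        // all attempts failed -> RunningState.FAILED
    }
}
```

The real loop retries forever while shouldRetryInit() holds; the maxAttempts bound here just keeps the sketch finite.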
// 2.2.6.5 Obtain the NameNode proxy and register (a two-phase handshake)
// BPServiceActor.connectToNNAndHandshake(): registration
private void connectToNNAndHandshake() throws IOException {
//TODO get NN proxy: obtain the NameNode's proxy object
bpNamenode = dn.connectToNN(nnAddr);
//first phase of the handshake with the namenode: fetch the namespace info
NamespaceInfo nsInfo = retrieveNamespaceInfo();
//verify the namespace info
bpos.verifyAndSetNamespaceInfo(nsInfo);
// Second phase of the handshake with the NN.
// send the second-phase handshake to the NameNode: registration
// TODO 2.2.6.6 the DataNode registers with the NameNode
register(nsInfo);
}
// TODO 2.2.6.6 the DataNode registers with the NameNode
void register(NamespaceInfo nsInfo) throws IOException {
// The handshake() phase loaded the block pool storage
// off disk - so update the bpRegistration object from that info
// TODO create the registration info: wraps the hostname, StorageInfo, DatanodeID, etc.
bpRegistration = bpos.createRegistration();
LOG.info(this + " beginning handshake with NN");
while (shouldRun()) {
try {
//TODO bpNamenode is the NameNode's proxy object
//TODO 2.2.6.7 this call is served by NameNodeRpcServer
bpRegistration = bpNamenode.registerDatanode(bpRegistration);
bpRegistration.setNamespaceInfo(nsInfo);
break;
} catch(EOFException e) { // namenode might have just restarted
LOG.info("Problem connecting to server: " + nnAddr + " :"
+ e.getLocalizedMessage());
sleepAndLogInterrupts(1000, "connecting to server");
} catch(SocketTimeoutException e) { // namenode is busy
LOG.info("Problem connecting to server: " + nnAddr);
sleepAndLogInterrupts(1000, "connecting to server");
}
}
LOG.info("Block pool " + this + " successfully registered with NN");
bpos.registrationSucceeded(this, bpRegistration);
// random short delay - helps scatter the BR from all DNs
scheduleBlockReport(dnConf.initialBlockReportDelay);
}
// 2.2.6.7 served on the NameNode side by NameNodeRpcServer.registerDatanode()
public DatanodeRegistration registerDatanode(DatanodeRegistration nodeReg)
throws IOException {
//check that the NameNode has finished starting up
checkNNStartup();
verifySoftwareVersion(nodeReg); //software version check
//TODO 2.2.6.8 register the DataNode
namesystem.registerDatanode(nodeReg);
return nodeReg;
}
//TODO 2.2.6.8 register the DataNode
void registerDatanode(DatanodeRegistration nodeReg) throws IOException {
writeLock();
try {
//fetch the DatanodeManager from the BlockManager and call its registerDatanode()
//nodeReg is the registration info sent over by the datanode
// 2.2.6.9 registerDatanode
getBlockManager().getDatanodeManager().registerDatanode(nodeReg);
checkSafeMode();
} finally {
writeUnlock();
}
}
// 2.2.6.9 registerDatanode
public void registerDatanode(DatanodeRegistration nodeReg)
throws DisallowedDatanodeException, UnresolvedTopologyException {
InetAddress dnAddress = Server.getRemoteIp(); //get the IP of the remotely registering DataNode from the RPC context
if (dnAddress != null) {
// Mostly called inside an RPC, update ip and peer hostname
String hostname = dnAddress.getHostName();
String ip = dnAddress.getHostAddress();
//...
nodeReg.setIpAddr(ip);
nodeReg.setPeerHostName(hostname);
}
//...
//1. check that the DataNode runs the same HDFS version as the NameNode
//2. check dfs.hosts (hosts allowed to connect to the namenode) and dfs.hosts.exclude (hosts not allowed to connect)
//3. from datanodeMap [StorageID -> DatanodeDescriptor], look up the DatanodeDescriptor by the DataNode's uuid; call it nodeS
DatanodeDescriptor nodeS = getDatanode(nodeReg.getDatanodeUuid());
//4. from host2DatanodeMap /** Host names to datanode descriptors mapping. */, look up the DatanodeDescriptor by ip and transfer port; call it nodeN
DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr(nodeReg.getIpAddr(), nodeReg.getXferPort());
//5. if nodeN != null and nodeN != nodeS, the two lookups disagree (that address previously served a different storage): remove the stale datanode info and set nodeN to null
if (nodeN != null && nodeN != nodeS) {
NameNode.LOG.info("BLOCK* registerDatanode: " + nodeN);
// nodeN previously served a different data storage,
// which is not served by anybody anymore.
removeDatanode(nodeN);
// physically remove node from datanodeMap
wipeDatanode(nodeN);
nodeN = null;
}
//6. if nodeS exists, this DataNode has registered before: update the NetworkTopology (remove nodeS, update it, call resolveNetworkLocation to re-resolve its position)
//7. if nodeS does not exist (empty storageID), this is a brand-new datanode: allocate a global storageID and create a DatanodeDescriptor
//8. FSNamesystem maps the DataNode to the right rack by its IP and records the StorageID -> DatanodeDescriptor mapping, yielding the new nodeS
// register the DataNode
addDatanode(nodeDescr);
// add it to heartbeat management; the HeartbeatMonitor background thread will keep checking whether this DataNode is alive
heartbeatManager.addDatanode(nodeDescr);
}
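Steps 3-8 boil down to reconciling two indexes over the same descriptors: datanodeMap keyed by storage uuid (giving nodeS) and host2DatanodeMap keyed by transfer address (giving nodeN). A simplified sketch of that bookkeeping (all types here are invented; the real method also handles topology, version, and host-list checks):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of DatanodeManager.registerDatanode's index reconciliation.
class MiniDatanodeManager {
    static class Descriptor {
        final String uuid;
        Descriptor(String uuid) { this.uuid = uuid; }
    }

    final Map<String, Descriptor> datanodeMap = new HashMap<>();      // uuid -> descriptor
    final Map<String, Descriptor> host2DatanodeMap = new HashMap<>(); // ip:port -> descriptor

    Descriptor register(String uuid, String xferAddr) {
        Descriptor nodeS = datanodeMap.get(uuid);          // lookup by storage uuid
        Descriptor nodeN = host2DatanodeMap.get(xferAddr); // lookup by transfer address
        if (nodeN != null && nodeN != nodeS) {
            // that address previously served a different storage: wipe the stale entry
            datanodeMap.remove(nodeN.uuid);
            host2DatanodeMap.remove(xferAddr);
        }
        if (nodeS == null) {
            // brand-new DataNode: create a descriptor and index it
            nodeS = new Descriptor(uuid);
            datanodeMap.put(uuid, nodeS);
        }
        host2DatanodeMap.put(xferAddr, nodeS); // re-registration refreshes the address
        return nodeS;
    }
}
```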
3. Summary
3.1 Classes involved in DataNode startup and how they relate
3.2 Registration flow summary
- Create a DataNode object
- Start the DataNode and run its initialization:
- Initialize the DataStorage
- Initialize the DataXceiverServer
- Start the HTTP info server and register its servlets
- Initialize the RPC server, which handles requests from clients and other DataNodes
- Create a BlockPoolManager (one BPOfferService per nameservice) and call start() on each BPServiceActor inside it, so the DataNode registers and heartbeats with every NameNode in the cluster
- Obtain the NameNode proxy and call registerDatanode; NameNodeRpcServer fetches the BlockManager, the BlockManager fetches the DatanodeManager and calls its registerDatanode, which adds the DataNode to the NameNode's in-memory state and puts it under heartbeat management