The DataNode code lives in the org.apache.hadoop.hdfs.server.datanode package under hadoop-hdfs-project. Start with the class itself and its Javadoc, which roughly says:
The DataNode is the block-storage component. It communicates with the NameNode, and also with clients and other DataNodes. A DataNode manages a set of blocks and allows clients to read and write them; it also carries out NameNode requests to delete blocks and to replicate blocks between DataNodes. The DataNode keeps block metadata on local disk; on startup it reports its local blocks to the NameNode, and it keeps sending periodic reports afterwards. Throughout its lifetime the DataNode continually initiates requests to the NameNode; the NameNode never contacts the DataNode directly to assign work, but instead returns commands in its heartbeat replies. The DataNode maintains a server socket so it can serve requests from clients and other DataNodes, and it reports this host/port to the NameNode; clients and other DataNodes obtain the DataNode's service address from the NameNode.
Find the main method:
public static void main(String args[]) {
if (DFSUtil.parseHelpArgument(args, DataNode.USAGE, System.out, true)) {
System.exit(0);
}
secureMain(args, null);
}
Step into the secureMain(args, null) method:
public static void secureMain(String args[], SecureResources resources) {
int errorCode = 0;
try {
StringUtils.startupShutdownMessage(DataNode.class, args, LOG);
//create the DataNode instance
DataNode datanode = createDataNode(args, null, resources);
if (datanode != null) {
//join its threads and wait for them to finish
datanode.join();
} else {
errorCode = 1;
}
} catch (Throwable e) {
LOG.error("Exception in secureMain", e);
terminate(1, e);
} finally {
// We need to terminate the process here because either shutdown was called
// or some disk related conditions like volumes tolerated or volumes required
// condition was not met. Also, In secure mode, control will go to Jsvc
// and Datanode process hangs if it does not exit.
LOG.warn("Exiting Datanode");
terminate(errorCode);
}
}
The key call is obviously DataNode datanode = createDataNode(args, null, resources);. Follow it into the method:
public static DataNode createDataNode(String args[], Configuration conf,
SecureResources resources) throws IOException {
//instantiate the DataNode
DataNode dn = instantiateDataNode(args, conf, resources);
if (dn != null) {
//start the DataNode
dn.runDatanodeDaemon();
}
return dn;
}
There are clearly two important steps here:
1. Instantiate the DataNode.
2. Start the DataNode.
Start with the first step, instantiating the DataNode:
public static DataNode instantiateDataNode(String args [], Configuration conf,
SecureResources resources) throws IOException {
//load the DataNode configuration
if (conf == null)
conf = new HdfsConfiguration();
if (args != null) {
// parse generic hadoop options
GenericOptionsParser hParser = new GenericOptionsParser(conf, args);
args = hParser.getRemainingArgs();
}
if (!parseArguments(args, conf)) {
printUsage(System.err);
return null;
}
//all directories configured for DataNode block storage
Collection<StorageLocation> dataLocations = getStorageLocations(conf);
UserGroupInformation.setConfiguration(conf);
SecurityUtil.login(conf, DFS_DATANODE_KEYTAB_FILE_KEY,
DFS_DATANODE_KERBEROS_PRINCIPAL_KEY, getHostName(conf));
return makeInstance(dataLocations, conf, resources);
}
Step into the last statement, return makeInstance(dataLocations, conf, resources);:
static DataNode makeInstance(Collection<StorageLocation> dataDirs,
Configuration conf, SecureResources resources) throws IOException {
List<StorageLocation> locations;
//set up the check parameters, e.g. how many volume failures can be tolerated before it is treated as fatal
StorageLocationChecker storageLocationChecker =
new StorageLocationChecker(conf, new Timer());
try {
//check the data directories against the configured thresholds and return only the healthy ones
locations = storageLocationChecker.check(conf, dataDirs);
} catch (InterruptedException ie) {
//if the check fails, startup fails with an exception
throw new IOException("Failed to instantiate DataNode", ie);
}
//initialize the metrics system
DefaultMetricsSystem.initialize("DataNode");
assert locations.size() > 0 : "number of data directories should be > 0";
//construct the DataNode object
return new DataNode(conf, locations, storageLocationChecker, resources);
}
This method first checks the configured data directories and aborts startup if the check fails; a conceptual sketch of that tolerance check is given just below. Once the check passes, the last statement constructs the DataNode object: return new DataNode(conf, locations, storageLocationChecker, resources);
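The gist of the directory check can be illustrated with a small, purely conceptual sketch. The names below (filterHealthy, looksUsable) are hypothetical stand-ins for the real StorageLocationChecker logic, which probes each configured directory and aborts startup if more volumes fail than dfs.datanode.failed.volumes.tolerated allows:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class VolumeCheckSketch {
  // Return only the usable directories; fail if too many are broken.
  static List<String> filterHealthy(List<String> configuredDirs,
      int failedVolumesTolerated) throws IOException {
    List<String> healthy = new ArrayList<>();
    int failed = 0;
    for (String dir : configuredDirs) {
      if (looksUsable(dir)) {   // hypothetical health probe (exists, writable, ...)
        healthy.add(dir);
      } else {
        failed++;
      }
    }
    // Mirrors the behavior described above: too many failed volumes aborts startup.
    if (failed > failedVolumesTolerated || healthy.isEmpty()) {
      throw new IOException("Too many failed volumes: " + failed
          + " (tolerated: " + failedVolumesTolerated + ")");
    }
    return healthy;
  }

  private static boolean looksUsable(String dir) {
    File f = new File(dir);
    return f.isDirectory() && f.canWrite();
  }
}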
/**
* Create the DataNode given a configuration, an array of dataDirs,
* and a namenode proxy.
*/
DataNode(final Configuration conf,
final List<StorageLocation> dataDirs,
final StorageLocationChecker storageLocationChecker,
final SecureResources resources) throws IOException {
//initialize configuration: the parent constructor stores the Configuration object (configuration files loaded into memory)
super(conf);
this.tracer = createTracer(conf);
this.tracerConfigurationManager =
new TracerConfigurationManager(DATANODE_HTRACE_PREFIX, conf);
this.fileIoProvider = new FileIoProvider(conf, this);
this.blockScanner = new BlockScanner(this);
this.lastDiskErrorCheck = 0;
this.maxNumberOfBlocksToLog = conf.getLong(DFS_MAX_NUM_BLOCKS_TO_LOG_KEY,
DFS_MAX_NUM_BLOCKS_TO_LOG_DEFAULT);
this.usersWithLocalPathAccess = Arrays.asList(
conf.getTrimmedStrings(DFSConfigKeys.DFS_BLOCK_LOCAL_PATH_ACCESS_USER_KEY));
this.connectToDnViaHostname = conf.getBoolean(
DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME,
DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT);
this.supergroup = conf.get(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY,
DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT);
this.isPermissionEnabled = conf.getBoolean(
DFSConfigKeys.DFS_PERMISSIONS_ENABLED_KEY,
DFSConfigKeys.DFS_PERMISSIONS_ENABLED_DEFAULT);
this.pipelineSupportECN = conf.getBoolean(
DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED,
DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED_DEFAULT);
confVersion = "core-" +
conf.get("hadoop.common.configuration.version", "UNSPECIFIED") +
",hdfs-" +
conf.get("hadoop.hdfs.configuration.version", "UNSPECIFIED");
this.volumeChecker = new DatasetVolumeChecker(conf, new Timer());
//initialize the xferService thread pool
this.xferService =
HadoopExecutors.newCachedThreadPool(new Daemon.DaemonFactory());
// Determine whether we should try to pass file descriptors to clients.
if (conf.getBoolean(HdfsClientConfigKeys.Read.ShortCircuit.KEY,
HdfsClientConfigKeys.Read.ShortCircuit.DEFAULT)) {
String reason = DomainSocket.getLoadingFailureReason();
if (reason != null) {
LOG.warn("File descriptor passing is disabled because {}", reason);
this.fileDescriptorPassingDisabledReason = reason;
} else {
LOG.info("File descriptor passing is enabled.");
this.fileDescriptorPassingDisabledReason = null;
}
} else {
this.fileDescriptorPassingDisabledReason =
"File descriptor passing was not configured.";
LOG.debug(this.fileDescriptorPassingDisabledReason);
}
this.socketFactory = NetUtils.getDefaultSocketFactory(conf);
try {
hostName = getHostName(conf);
LOG.info("Configured hostname is {}", hostName);
//initialize and start the DataNode's internal components
startDataNode(dataDirs, resources);
} catch (IOException ie) {
shutdown();
throw ie;
}
final int dncCacheMaxSize =
conf.getInt(DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_KEY,
DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_DEFAULT) ;
datanodeNetworkCounts =
CacheBuilder.newBuilder()
.maximumSize(dncCacheMaxSize)
.build(new CacheLoader<String, Map<String, Long>>() {
@Override
public Map<String, Long> load(String key) throws Exception {
final Map<String, Long> ret = new HashMap<String, Long>();
ret.put("networkErrors", 0L);
return ret;
}
});
initOOBTimeout();
this.storageLocationChecker = storageLocationChecker;
}
The most important call here is startDataNode(dataDirs, resources);:
void startDataNode(List<StorageLocation> dataDirectories,
SecureResources resources
) throws IOException {
// settings global for all BPs in the Data Node
this.secureResources = resources;
synchronized (this) {
this.dataDirs = dataDirectories;
}
this.dnConf = new DNConf(this);
checkSecureConfig(dnConf, getConf(), resources);
if (dnConf.maxLockedMemory > 0) {
if (!NativeIO.POSIX.getCacheManipulator().verifyCanMlock()) {
throw new RuntimeException(String.format(
"Cannot start datanode because the configured max locked memory" +
" size (%s) is greater than zero and native code is not available.",
DFS_DATANODE_MAX_LOCKED_MEMORY_KEY));
}
if (Path.WINDOWS) {
NativeIO.Windows.extendWorkingSetSize(dnConf.maxLockedMemory);
} else {
long ulimit = NativeIO.POSIX.getCacheManipulator().getMemlockLimit();
if (dnConf.maxLockedMemory > ulimit) {
throw new RuntimeException(String.format(
"Cannot start datanode because the configured max locked memory" +
" size (%s) of %d bytes is more than the datanode's available" +
" RLIMIT_MEMLOCK ulimit of %d bytes.",
DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
dnConf.maxLockedMemory,
ulimit));
}
}
}
LOG.info("Starting DataNode with maxLockedMemory = {}",
dnConf.maxLockedMemory);
int volFailuresTolerated = dnConf.getVolFailuresTolerated();
int volsConfigured = dnConf.getVolsConfigured();
if (volFailuresTolerated < MAX_VOLUME_FAILURE_TOLERATED_LIMIT
|| volFailuresTolerated >= volsConfigured) {
throw new HadoopIllegalArgumentException("Invalid value configured for "
+ "dfs.datanode.failed.volumes.tolerated - " + volFailuresTolerated
+ ". Value configured is either less than -1 or >= "
+ "to the number of configured volumes (" + volsConfigured + ").");
}
//construct the DataStorage
storage = new DataStorage();
// global DN settings
registerMXBean();
//initialize the DataXceiver server
initDataXceiver();
//start the info (HTTP) server
startInfoServer();
pauseMonitor = new JvmPauseMonitor();
pauseMonitor.init(getConf());
pauseMonitor.start();
// BlockPoolTokenSecretManager is required to create ipc server.
this.blockPoolTokenSecretManager = new BlockPoolTokenSecretManager();
// Login is done by now. Set the DN user name.
dnUserName = UserGroupInformation.getCurrentUser().getUserName();
LOG.info("dnUserName = {}", dnUserName);
LOG.info("supergroup = {}", supergroup);
//initialize the IPC server
initIpcServer();
metrics = DataNodeMetrics.create(getConf(), getDisplayName());
peerMetrics = dnConf.peerStatsEnabled ?
DataNodePeerMetrics.create(getDisplayName(), getConf()) : null;
metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);
ecWorker = new ErasureCodingWorker(getConf(), this);
blockRecoveryWorker = new BlockRecoveryWorker(this);
//initialize the BlockPoolManager
blockPoolManager = new BlockPoolManager(this);
blockPoolManager.refreshNamenodes(getConf());
// Create the ReadaheadPool from the DataNode context so we can
// exit without having to explicitly shutdown its thread pool.
readaheadPool = ReadaheadPool.getInstance();
saslClient = new SaslDataTransferClient(dnConf.getConf(),
dnConf.saslPropsResolver, dnConf.trustedChannelResolver);
saslServer = new SaslDataTransferServer(dnConf, blockPoolTokenSecretManager);
startMetricsLogger();
if (dnConf.diskStatsEnabled) {
diskMetrics = new DataNodeDiskMetrics(this,
dnConf.outliersReportIntervalMs);
}
}
startDataNode performs several important operations:
1. Construct the DataStorage: storage = new DataStorage();
2. Initialize the DataXceiver server: initDataXceiver();
3. Start the info server: startInfoServer();
4. Initialize the IPC server: initIpcServer();
5. Start the threads that communicate with the NameNodes (BPOfferService): blockPoolManager.refreshNamenodes(conf);
1. storage = new DataStorage();
DataStorage() {
super(NodeType.DATA_NODE);
trashEnabledBpids = Collections.newSetFromMap(
new ConcurrentHashMap<String, Boolean>());
}
The superclass constructors it chains to (overloads of StorageInfo) initialize a few more fields, for example:
public StorageInfo(int layoutV, int nsID, String cid, long cT, NodeType type) {
layoutVersion = layoutV;
clusterID = cid;
namespaceID = nsID;
cTime = cT;
storageType = type;
}
2.initDataXceiver()
private void initDataXceiver(Configuration conf) throws IOException {
// find free port or use privileged port provided
// construct a TcpPeerServer instance; it implements the PeerServer interface and wraps a ServerSocket
TcpPeerServer tcpPeerServer;
if (secureResources != null) {// if secureResources are available, build tcpPeerServer from them
tcpPeerServer = new TcpPeerServer(secureResources);
} else {// otherwise build tcpPeerServer from the configuration
tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout,
DataNode.getStreamingAddr(conf));
}
// set the receive buffer size, 128KB by default
tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
// get the InetSocketAddress and store it in the DataNode field streamingAddr
streamingAddr = tcpPeerServer.getStreamingAddr();
LOG.info("Opened streaming server at " + streamingAddr);
// create a thread group named dataXceiverServer
this.threadGroup = new ThreadGroup("dataXceiverServer");
// construct the DataXceiverServer instance xserver, passing in tcpPeerServer
xserver = new DataXceiverServer(tcpPeerServer, conf, this);
// wrap xserver in a daemon thread and add it to threadGroup
this.dataXceiverServer = new Daemon(threadGroup, xserver);
// mark the whole thread group as daemon so it is destroyed automatically when empty
this.threadGroup.setDaemon(true); // auto destroy when empty
// if dfs.client.read.shortcircuit is true (default false),
// or dfs.client.domain.socket.data.traffic is true (default false),
// also listen on a UNIX domain socket
if (conf.getBoolean(DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_KEY,
DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_DEFAULT) ||
conf.getBoolean(DFSConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC,
DFSConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC_DEFAULT)) {
DomainPeerServer domainPeerServer =
getDomainPeerServer(conf, streamingAddr.getPort());
if (domainPeerServer != null) {
this.localDataXceiverServer = new Daemon(threadGroup,
new DataXceiverServer(domainPeerServer, conf, this));
LOG.info("Listening on UNIX domain socket: " +
domainPeerServer.getBindPath());
}
}
// construct the short-circuit registry
this.shortCircuitRegistry = new ShortCircuitRegistry(conf);
}
DataXceiverServer is a background worker thread on the DataNode that accepts block read/write requests and spawns a separate thread to handle each request.
/**
* Server used for receiving/sending a block of data.
* This is created to listen for requests from clients or
* other DataNodes. This small server does not use the
* Hadoop IPC mechanism.
*/
class DataXceiverServer implements Runnable
DataXceiverServer is a thread; first look at its member fields:
// PeerServer is an interface; its TcpPeerServer implementation wraps a ServerSocket and provides server-side Java socket functionality
private final PeerServer peerServer;
// the DataNode instance this DataXceiverServer belongs to
private final DataNode datanode;
// map from each Peer to the thread serving it
private final HashMap<Peer, Thread> peers = new HashMap<Peer, Thread>();
// map from each Peer to its DataXceiver
private final HashMap<Peer, DataXceiver> peersXceiver = new HashMap<Peer, DataXceiver>();
// flag indicating whether this DataXceiverServer has been closed
private boolean closed = false;
/**
* Maximal number of concurrent xceivers per node.
* Enforcing the limit is required in order to avoid data-node
* running out of memory.
*
* The default value is 4096.
*/
int maxXceiverCount =
DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT;
// throttler (balanceThrottler) for block moves during cluster balancing
final BlockBalanceThrottler balanceThrottler;
/**
* We need an estimate of the block size to check whether a disk partition has enough space.
* Newer clients pass the expected block size to the DataNode.
* For older clients we simply use the server-side default block size.
*/
final long estimateBlockSize;
The peerServer field of type PeerServer is really the heart of DataXceiverServer's functionality: the object passed in at construction time is a TcpPeerServer, which implements the PeerServer interface and internally wraps a ServerSocket, providing server-side Java socket functionality for listening to read/write requests from clients or other DataNodes. DataXceiverServer also holds a reference to its hosting DataNode instance, datanode, so the thread can query the DataNode's state and services at any time. peers and peersXceiver are the two Peer-related data structures inside DataXceiverServer: one maps each Peer to the thread serving it, the other maps each Peer to its DataXceiver; both are HashMaps. What is a Peer? Essentially a wrapper around a socket. closed is the flag marking whether the DataXceiverServer has been shut down. maxXceiverCount is the maximum number of concurrent DataXceivers per DataNode; enforcing this limit is necessary to keep the DataNode from running out of memory. balanceThrottler is the throttler used when blocks are moved for cluster balancing; it limits both the bandwidth and the number of concurrent block moves.
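To make the throttling idea concrete, here is a simplified sketch of a throttler that only caps the number of concurrent block moves. It is not the real BlockBalanceThrottler (which also throttles bandwidth); the class name and fields are made up for illustration:

class SimpleMoveThrottlerSketch {
  private final int maxConcurrentMoves;
  private int activeMoves = 0;

  SimpleMoveThrottlerSketch(int maxConcurrentMoves) {
    this.maxConcurrentMoves = maxConcurrentMoves;
  }

  // Returns true if the caller may start another block move, false if the limit is hit.
  synchronized boolean acquire() {
    if (activeMoves >= maxConcurrentMoves) {
      return false;
    }
    activeMoves++;
    return true;
  }

  // Called when a block move finishes, freeing a slot for the next mover.
  synchronized void release() {
    activeMoves--;
    notifyAll();
  }
}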
Now look at its constructor:
DataXceiverServer(PeerServer peerServer, Configuration conf,
DataNode datanode) {
this.peerServer = peerServer;
this.datanode = datanode;
// set maxXceiverCount, the maximum number of DataXceivers in this DataNode
// taken from dfs.datanode.max.transfer.threads, default 4096 if not configured
this.maxXceiverCount =
conf.getInt(DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_KEY,
DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT);
//estimated block size: dfs.blocksize, default 128*1024*1024 (128MB)
this.estimateBlockSize = conf.getLongBytes(DFSConfigKeys.DFS_BLOCK_SIZE_KEY,
DFSConfigKeys.DFS_BLOCK_SIZE_DEFAULT);
//set up parameter for cluster balancing
// bandwidth comes from dfs.datanode.balance.bandwidthPerSec, default 1024*1024
// max concurrent moves comes from dfs.datanode.balance.max.concurrent.moves, default 5
this.balanceThrottler = new BlockBalanceThrottler(
conf.getLong(DFSConfigKeys.DFS_DATANODE_BALANCE_BANDWIDTHPERSEC_KEY,
DFSConfigKeys.DFS_DATANODE_BALANCE_BANDWIDTHPERSEC_DEFAULT),
conf.getInt(DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_KEY,
DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_DEFAULT));
}
As shown above, DataXceiverServer implements Runnable, so the code it actually executes is in the run method:
@Override
public void run() {
Peer peer = null;
//shouldRun: DataNode running-state flag, declared volatile for visibility
//shutdownForUpgrade: flag set to true when the DataNode is shutting down for an upgrade/restart
while (datanode.shouldRun && !datanode.shutdownForUpgrade) {
try {
//blocking call: wait for a connection from a client or another DataNode
peer = peerServer.accept();
// Make sure the xceiver count is not exceeded
//the current xceiver count must not exceed dfs.datanode.max.transfer.threads, default 4096
int curXceiverCount = datanode.getXceiverCount();
if (curXceiverCount > maxXceiverCount) {
throw new IOException("Xceiver count " + curXceiverCount
+ " exceeds the limit of concurrent xcievers: "
+ maxXceiverCount);
}
//start a DataXceiver thread to handle this connection
new Daemon(datanode.threadGroup,
DataXceiver.create(peer, datanode, this))
.start();
} catch (SocketTimeoutException ignored) {
// wake up to see if should continue to run
} catch (AsynchronousCloseException ace) {
// another thread closed our listener socket - that's expected during shutdown,
// but not in other circumstances
if (datanode.shouldRun && !datanode.shutdownForUpgrade) {
LOG.warn(datanode.getDisplayName() + ":DataXceiverServer: ", ace);
}
} catch (IOException ie) {
IOUtils.cleanup(null, peer);
LOG.warn(datanode.getDisplayName() + ":DataXceiverServer: ", ie);
} catch (OutOfMemoryError ie) {
IOUtils.cleanup(null, peer);
// DataNode can run out of memory if there is too many transfers.
// Log the event, Sleep for 30 seconds, other transfers may complete by
// then.
LOG.warn("DataNode is out of memory. Will retry in 30 seconds.", ie);
try {
Thread.sleep(30 * 1000);
} catch (InterruptedException e) {
// ignore
}
} catch (Throwable te) {
LOG.error(datanode.getDisplayName()
+ ":DataXceiverServer: Exiting due to: ", te);
datanode.shouldRun = false;
}
}
// Close the server to stop reception of more requests.
try {
peerServer.close();
closed = true;
} catch (IOException ie) {
LOG.warn(datanode.getDisplayName()
+ " :DataXceiverServer: close exception", ie);
}
// if in restart prep stage, notify peers before closing them.
if (datanode.shutdownForUpgrade) {
restartNotifyPeers();
// Each thread needs some time to process it. If a thread needs
// to send an OOB message to the client, but blocked on network for
// long time, we need to force its termination.
LOG.info("Shutting down DataXceiverServer before restart");
// Allow roughly up to 2 seconds.
for (int i = 0; getNumPeers() > 0 && i < 10; i++) {
try {
Thread.sleep(200);
} catch (InterruptedException e) {
// ignore
}
}
}
// Close all peers.
closeAllPeers();
}
In short, this starts a blocking server that waits for incoming connections; whenever a request from a client (or another DataNode) arrives, a DataXceiver thread is started to handle it.
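To make the pattern concrete, here is a minimal sketch of the same accept loop using only the plain java.net API (no HDFS classes; in the real code the accepted socket is wrapped in a Peer and the worker is a DataXceiver):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

class AcceptLoopSketch implements Runnable {
  private final ServerSocket serverSocket;
  private volatile boolean shouldRun = true;

  AcceptLoopSketch(int port) throws IOException {
    this.serverSocket = new ServerSocket(port);
  }

  @Override
  public void run() {
    while (shouldRun) {
      try {
        Socket peer = serverSocket.accept();        // blocks until a connection arrives
        Thread worker = new Thread(() -> handle(peer));
        worker.setDaemon(true);                     // analogous to Daemon(threadGroup, xceiver)
        worker.start();
      } catch (IOException e) {
        // the real loop logs the error and, on fatal errors, flips shouldRun to false
      }
    }
  }

  private void handle(Socket peer) {
    // in HDFS this is where the op code is read and the read/write request is served
    try {
      peer.close();
    } catch (IOException ignored) {
    }
  }
}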
3. startInfoServer()
This mainly builds an HttpServer2 instance and maps the relevant servlets onto their request paths. It is not central to startup, so it is not analyzed in detail here.
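Just to show the pattern, here is a minimal sketch, assuming the org.apache.hadoop.http.HttpServer2.Builder API; the servlet name and class (status, StatusServlet) are hypothetical and only illustrate how a servlet is mapped onto a path:

import java.io.IOException;
import java.net.URI;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.http.HttpServer2;

public class InfoServerSketch {
  public static class StatusServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws IOException {
      resp.getWriter().println("datanode is alive");
    }
  }

  public static void main(String[] args) throws Exception {
    HttpServer2 server = new HttpServer2.Builder()
        .setName("datanode")
        .addEndpoint(URI.create("http://localhost:0"))  // port 0 = pick any free port
        .setFindPort(true)
        .build();
    server.addServlet("status", "/status", StatusServlet.class);
    server.start();
    System.out.println("Info server listening at " + server.getConnectorAddress(0));
  }
}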
4. Initialize the IPC server: initIpcServer();
private void initIpcServer(Configuration conf) throws IOException {
InetSocketAddress ipcAddr = NetUtils.createSocketAddr(
conf.get(DFS_DATANODE_IPC_ADDRESS_KEY));
// Add all the RPC protocols that the Datanode implements
RPC.setProtocolEngine(conf, ClientDatanodeProtocolPB.class,
ProtobufRpcEngine.class);
ClientDatanodeProtocolServerSideTranslatorPB clientDatanodeProtocolXlator =
new ClientDatanodeProtocolServerSideTranslatorPB(this);
BlockingService service = ClientDatanodeProtocolService
.newReflectiveBlockingService(clientDatanodeProtocolXlator);
ipcServer = new RPC.Builder(conf)
.setProtocol(ClientDatanodeProtocolPB.class)
.setInstance(service)
.setBindAddress(ipcAddr.getHostName())
.setPort(ipcAddr.getPort())
.setNumHandlers(
conf.getInt(DFS_DATANODE_HANDLER_COUNT_KEY,
DFS_DATANODE_HANDLER_COUNT_DEFAULT)).setVerbose(false)
.setSecretManager(blockPoolTokenSecretManager).build();
InterDatanodeProtocolServerSideTranslatorPB interDatanodeProtocolXlator =
new InterDatanodeProtocolServerSideTranslatorPB(this);
service = InterDatanodeProtocolService
.newReflectiveBlockingService(interDatanodeProtocolXlator);
DFSUtil.addPBProtocol(conf, InterDatanodeProtocolPB.class, service,
ipcServer);
TraceAdminProtocolServerSideTranslatorPB traceAdminXlator =
new TraceAdminProtocolServerSideTranslatorPB(this);
BlockingService traceAdminService = TraceAdminService
.newReflectiveBlockingService(traceAdminXlator);
DFSUtil.addPBProtocol(conf, TraceAdminProtocolPB.class, traceAdminService,
ipcServer);
LOG.info("Opened IPC server at " + ipcServer.getListenerAddress());
// set service-level authorization security policy
if (conf.getBoolean(
CommonConfigurationKeys.HADOOP_SECURITY_AUTHORIZATION, false)) {
ipcServer.refreshServiceAcl(conf, new HDFSPolicyProvider());
}
}
Like the NameNode, the DataNode also needs to expose some RPC functionality. This method mainly sets up two RPC protocols, ClientDatanodeProtocolPB and InterDatanodeProtocolPB; as the names suggest, they serve client-to-DataNode and DataNode-to-DataNode communication respectively (a TraceAdminProtocolPB service is registered as well).
5.blockPoolManager.refreshNamenodes(conf);
This calls the refreshNamenodes method of the BlockPoolManager class. First, look at BlockPoolManager's Javadoc:
It manages the BPOfferService objects of the DataNode; creating, removing, starting and stopping BPOfferService instances all go through this class.
In other words, blockPoolManager mainly manages BPOfferService objects. Since BPOfferService has come up, look at its Javadoc as well:
Each DataNode has one or more BPOfferService instances. A BPOfferService manages the active and standby NameNodes this DataNode must heartbeat to; internally it holds BPServiceActor objects. BPServiceActor is a thread, one per NameNode (active or standby), and BPOfferService also tracks active/standby failover among those NameNodes.
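The actor pattern can be sketched roughly as follows; NamenodeEndpoint and the command handling are hypothetical stand-ins, not HDFS classes, and only illustrate the "one heartbeat thread per NameNode, commands ride back on the reply" idea:

import java.util.List;

class HeartbeatActorSketch implements Runnable {
  // Hypothetical stand-in for the NameNode RPC interface used by a BPServiceActor.
  interface NamenodeEndpoint {
    List<Runnable> sendHeartbeat(String datanodeId);  // reply carries commands to execute
  }

  private final NamenodeEndpoint namenode;
  private final String datanodeId;
  private final long heartbeatIntervalMs;
  private volatile boolean shouldRun = true;

  HeartbeatActorSketch(NamenodeEndpoint nn, String dnId, long intervalMs) {
    this.namenode = nn;
    this.datanodeId = dnId;
    this.heartbeatIntervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (shouldRun) {
      // The NameNode never calls the DataNode directly; work arrives as commands
      // piggybacked on the heartbeat response.
      for (Runnable command : namenode.sendHeartbeat(datanodeId)) {
        command.run();
      }
      try {
        Thread.sleep(heartbeatIntervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }
}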
The "one or more BPOfferService instances" refers to federation: each NameNode pair (active plus standby), i.e. each nameservice, gets its own BPOfferService. With BlockPoolManager and BPOfferService roughly understood, return to the refreshNamenodes method:
void refreshNamenodes(Configuration conf)
throws IOException {
LOG.info("Refresh request received for nameservices: " + conf.get
(DFSConfigKeys.DFS_NAMESERVICES));
Map<String, Map<String, InetSocketAddress>> newAddressMap = DFSUtil
.getNNServiceRpcAddressesForCluster(conf);
synchronized (refreshNamenodesLock) {
doRefreshNamenodes(newAddressMap);
}
}
The call Map<String, Map<String, InetSocketAddress>> newAddressMap = DFSUtil.getNNServiceRpcAddressesForCluster(conf);
builds the data structure holding the NameNode addresses this DataNode must talk to. The lookup itself is fairly involved, but essentially it collects the addresses from the various configuration keys. The resulting Map<String, Map<String, InetSocketAddress>> is keyed as Map<nameserviceId, Map<namenodeId, NameNode RPC address>>.
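For illustration, here is a hypothetical example of that structure for a federated cluster with an HA nameservice ns1 (nn1/nn2) and a second nameservice ns2; the host names and ports are made up:

import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

class NnAddressMapSketch {
  static Map<String, Map<String, InetSocketAddress>> example() {
    Map<String, Map<String, InetSocketAddress>> byNameservice = new HashMap<>();

    Map<String, InetSocketAddress> ns1 = new HashMap<>();
    ns1.put("nn1", new InetSocketAddress("nn1.example.com", 8020));
    ns1.put("nn2", new InetSocketAddress("nn2.example.com", 8020));
    byNameservice.put("ns1", ns1);

    Map<String, InetSocketAddress> ns2 = new HashMap<>();
    ns2.put("nn3", new InetSocketAddress("nn3.example.com", 8020));
    byNameservice.put("ns2", ns2);

    return byNameservice;  // Map<nameserviceId, Map<namenodeId, rpc address>>
  }
}

This address map is then handed to doRefreshNamenodes: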
private void doRefreshNamenodes(
Map<String, Map<String, InetSocketAddress>> addrMap) throws IOException {
//assert that the current thread holds the refresh lock
assert Thread.holdsLock(refreshNamenodesLock);
//three sets for the nameservices to refresh, to add, and to remove
Set<String> toRefresh = Sets.newLinkedHashSet();
Set<String> toAdd = Sets.newLinkedHashSet();
Set<String> toRemove;
synchronized (this) {
// Step 1. For each of the new nameservices, figure out whether
// it's an update of the set of NNs for an existing NS,
// or an entirely new nameservice.
//a nameservice that appears both in the new configuration and in bpByNameserviceId goes into the refresh set; otherwise it is added
for (String nameserviceId : addrMap.keySet()) {
if (bpByNameserviceId.containsKey(nameserviceId)) {
toRefresh.add(nameserviceId);
} else {
toAdd.add(nameserviceId);
}
}
// Step 2. Any nameservices we currently have but are no longer present
// need to be removed.
//remove nameservices that exist in bpByNameserviceId but are no longer in the configuration (addrMap)
toRemove = Sets.newHashSet(Sets.difference(
bpByNameserviceId.keySet(), addrMap.keySet()));
assert toRefresh.size() + toAdd.size() ==
addrMap.size() :
"toAdd: " + Joiner.on(",").useForNull("<default>").join(toAdd) +
" toRemove: " + Joiner.on(",").useForNull("<default>").join(toRemove) +
" toRefresh: " + Joiner.on(",").useForNull("<default>").join(toRefresh);
// Step 3. Start new nameservices
if (!toAdd.isEmpty()) {
LOG.info("Starting BPOfferServices for nameservices: " +
Joiner.on(",").useForNull("<default>").join(toAdd));
//for each new nameservice (NameNode pair), create a BPOfferService and record it in the bpByNameserviceId map
for (String nsToAdd : toAdd) {
ArrayList<InetSocketAddress> addrs =
Lists.newArrayList(addrMap.get(nsToAdd).values());
BPOfferService bpos = createBPOS(addrs);
bpByNameserviceId.put(nsToAdd, bpos);
//offerServices holds all BPOfferService objects of this DataNode
offerServices.add(bpos);
}
}
//start the BPServiceActor threads of all BPOfferService instances
startAll();
}
// Step 4. Shut down old nameservices. This happens outside
// of the synchronized(this) lock since they need to call
// back to .remove() from another thread
//stop all removed (old) nameservices
if (!toRemove.isEmpty()) {
LOG.info("Stopping BPOfferServices for nameservices: " +
Joiner.on(",").useForNull("<default>").join(toRemove));
for (String nsToRemove : toRemove) {
BPOfferService bpos = bpByNameserviceId.get(nsToRemove);
bpos.stop();
bpos.join();
// they will call remove on their own
}
}
// Step 5. Update nameservices whose NN list has changed
//update nameservices whose NameNode list has changed
if (!toRefresh.isEmpty()) {
LOG.info("Refreshing list of NNs for nameservices: " +
Joiner.on(",").useForNull("<default>").join(toRefresh));
for (String nsToRefresh : toRefresh) {
BPOfferService bpos = bpByNameserviceId.get(nsToRefresh);
ArrayList<InetSocketAddress> addrs =
Lists.newArrayList(addrMap.get(nsToRefresh).values());
bpos.refreshNNList(addrs);
}
}
}
This method mainly updates BlockPoolManager's in-memory bookkeeping of the NameNodes this DataNode reports to: it records every nameservice (NameNode pair) for this DN and starts a BPServiceActor thread to communicate with each NameNode.
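The add/remove/refresh classification at the heart of doRefreshNamenodes can be summarized with a small sketch using the same Guava Sets utilities; oldNameservices and newNameservices are hypothetical stand-ins for bpByNameserviceId.keySet() and addrMap.keySet():

import java.util.LinkedHashSet;
import java.util.Set;
import com.google.common.collect.Sets;

class RefreshDiffSketch {
  static void classify(Set<String> oldNameservices, Set<String> newNameservices) {
    // present only in the new config -> start a new BPOfferService
    Set<String> toAdd = new LinkedHashSet<>(Sets.difference(newNameservices, oldNameservices));
    // no longer present in the new config -> stop and remove
    Set<String> toRemove = new LinkedHashSet<>(Sets.difference(oldNameservices, newNameservices));
    // present in both -> refresh the NameNode list of the existing BPOfferService
    Set<String> toRefresh = new LinkedHashSet<>(Sets.intersection(oldNameservices, newNameservices));

    System.out.println("toAdd=" + toAdd + " toRemove=" + toRemove + " toRefresh=" + toRefresh);
  }
}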
At this point startDataNode has finished. Back in createDataNode(String args[], Configuration conf, SecureResources resources), there is still dn.runDatanodeDaemon(); to execute:
public void runDatanodeDaemon() throws IOException {
blockPoolManager.startAll();
// start dataXceiveServer
dataXceiverServer.start();
if (localDataXceiverServer != null) {
localDataXceiverServer.start();
}
ipcServer.start();
startPlugins(conf);
}
First, blockPoolManager.startAll(): this ultimately starts the BPServiceActor threads described above. Those threads were in fact already started during the earlier steps, so this call is effectively redundant here.
Next, dataXceiverServer.start(): this server, analyzed above, accepts data-transfer requests from clients and other DataNodes; it was initialized earlier and is simply started here. The IPC server was likewise initialized in the earlier steps, and this is where it gets started.
With that, DataNode startup is essentially complete. To summarize:
main()
->secureMain(args, null);
->createDataNode(args, null, resources);
->instantiateDataNode(args, conf, resources); initialize configuration and threads
->makeInstance(dataLocations, conf, resources);
->new DataNode(conf, locations, resources);
->startDataNode(conf, dataDirs, resources);
->storage = new DataStorage();
->registerMXBean();
->initDataXceiver(conf);
->startInfoServer(conf);
->pauseMonitor.start();
->blockPoolManager.refreshNamenodes(conf);
->doRefreshNamenodes
->start the BPServiceActor threads
->dn.runDatanodeDaemon(); start dataXceiverServer and the IPC server