DataNode source code analysis:
1. Overview: the DataNode is the basic unit of file storage. It stores Blocks in the local file system, keeps each Block's metadata,
and periodically reports all of its Blocks to the NameNode.
2. Starting the DataNode from main
2.1: starting the DataNode via the shell scripts
|-->hadoop/bin/start-all.sh
|-->start-dfs.sh
|-->"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
2.2: walking through main()
|-->StringUtils.startupShutdownMessage(DataNode.class, args, LOG); |logs the startup banner and arranges the shutdown banner
|-->toStartupShutdownString()
|-->Runtime.getRuntime().addShutdownHook() |a shutdown hook writes the closing log message
|-->DataNode datanode = createDataNode(args, null); |see 2.3
|-->datanode.join(); |the main thread waits for the datanode thread to finish
2.3 createDataNode(args, null) |creates a DataNode instance and starts the datanode thread
|-->DataNode dn = instantiateDataNode(args, conf);
|-->runDatanodeDaemon(dn);
2.3.1 instantiateDataNode(args, conf) |instantiates the DataNode
|-->parseArguments(args, conf) |parses args and loads the values into conf
|-->String[] dataDirs = conf.getStrings("dfs.data.dir"); |local storage directories of the datanode
|-->makeInstance(dataDirs, conf);
2.3.2 makeInstance(dataDirs, conf); |creates the DataNode instance
|-->for (int i = 0; i < dataDirs.length; i++)
|-->dirs.add(data);
|-->return new DataNode(conf, dirs); |returns the DataNode instance
2.3.3 runDatanodeDaemon(dn); |runs the DataNode
|-->dn.register(); |registers with the namenode, which later delivers commands to the datanode via heartbeat replies
|-->dn.dataNodeThread = new Thread(dn, dnThreadName);
|-->dn.dataNodeThread.setDaemon(true);
|-->dn.dataNodeThread.start();
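runDatanodeDaemon() above wraps the DataNode in a daemon thread before starting it. A minimal stand-alone sketch of that pattern (the class and method names here are illustrative, not Hadoop's):

```java
public class DaemonStart {
    // Start a runnable as a daemon thread, mirroring what runDatanodeDaemon does
    // with dn.dataNodeThread: setDaemon(true) means the JVM will not wait for this
    // thread on exit, which is exactly why main() must call datanode.join().
    static Thread startDaemon(Runnable r, String name) {
        Thread t = new Thread(r, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startDaemon(() -> System.out.println("datanode thread running"), "dn");
        t.join(); // the caller keeps the JVM alive by joining the daemon thread
    }
}
```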
3. DataNode instantiation, performed by startDataNode(conf, dataDirs)
|-->set machineName
|-->machineName = conf.get("slave.host.name");
|-->machineName = DNS.getDefaultHost()
|-->nameNodeAddr = NameNode.getAddress(conf); |address of the NameNode
|-->set the socket timeouts
|-->this.socketTimeout = conf.getInt("dfs.socket.timeout", HdfsConstants.READ_TIMEOUT);
|-->this.socketWriteTimeout = conf.getInt("dfs.datanode.socket.write.timeout",
HdfsConstants.WRITE_TIMEOUT);
|-->this.writePacketSize = conf.getInt("dfs.write.packet.size", 64*1024); |write packet size, 64KB by default
|-->String address = NetUtils.getServerAddress( |resolve the bind address
conf,
"dfs.datanode.bindAddress",
"dfs.datanode.port",
"dfs.datanode.address");
|-->InetSocketAddress socAddr = NetUtils.createSocketAddr(address); |builds the local socket address
|-->int tmpPort = socAddr.getPort(); |the port number
|-->storage = new DataStorage(); |DataStorage holds the storage-related state
|-->this.dnRegistration = new DatanodeRegistration(machineName + ":" + tmpPort); |builds the registration object
|-->this.namenode = (DatanodeProtocol) RPC.waitForProxy(); |namenode proxy obtained via dynamic proxy
|-->getProxy() in RPC.class
|-->VersionedProtocol proxy = (VersionedProtocol) Proxy.newProxyInstance(
protocol.getClassLoader(), new Class[] { protocol },
new Invoker(addr, ticket, conf, factory));
|-->NamespaceInfo nsInfo = handshake(); |compares buildVersion and distributedUpgradeVersion for compatibility
|-->nsInfo = namenode.versionRequest();
|-->return namesystem.getNamespaceInfo();
|-->boolean simulatedFSDataset =
conf.getBoolean("dfs.datanode.simulateddatastorage", false);
|-->if (simulatedFSDataset) |simulated storage is for testing only; the normal path is analyzed here
|-->else
|-->storage.recoverTransitionRead(nsInfo, dataDirs, startOpt);
|-->this.dnRegistration.setStorageInfo(storage); |records the storage info in the registration
|-->this.data = new FSDataset(storage, conf); |FSDataset, built from storage and conf, performs the block operations
|-->ServerSocket ss = (socketWriteTimeout > 0) ? |create the server socket, NIO-backed or classic IO
ServerSocketChannel.open().socket() : new ServerSocket();
|-->Server.bind(ss, socAddr, 0);
|-->ss.setReceiveBufferSize(DEFAULT_DATA_SOCKET_SIZE); |receive buffer size (DEFAULT_DATA_SOCKET_SIZE)
|-->selfAddr = new InetSocketAddress(ss.getInetAddress().getHostAddress(),
tmpPort);
|-->this.dataXceiverServer = new Daemon(threadGroup, new DataXceiverServer |DataXceiverServer accepts data-transfer connections
(ss, conf, this));
|-->set the intervals |block report interval and heartbeat interval
|-->blockReportInterval
|-->heartBeatInterval
|-->blockScanner = new DataBlockScanner(this, (FSDataset)data, conf); |periodically scans and verifies the block files
|-->this.infoServer = new HttpServer("datanode", infoHost, tmpInfoPort, |HttpServer (Jetty-based) for web monitoring
tmpInfoPort == 0, conf);
|-->ipcServer = RPC.getServer(this, ipcAddr.getHostName(), ipcAddr.getPort(), |local IPC server, handling requests from
conf.getInt("dfs.datanode.handler.count", 3), false, conf); clients and other datanodes
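The server-socket choice at the start of this section (an NIO channel when a write timeout is configured, a plain blocking socket otherwise) can be sketched as a stand-alone factory; this is a simplified illustration, not the Hadoop code itself:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;

public class DataSocketFactory {
    // Mirror of: ServerSocket ss = (socketWriteTimeout > 0)
    //     ? ServerSocketChannel.open().socket() : new ServerSocket();
    // A channel-backed server socket yields accepted sockets with channels,
    // which enables timed writes and transferTo-style zero-copy sends;
    // the plain ServerSocket is classic blocking IO.
    static ServerSocket create(long socketWriteTimeout) throws IOException {
        return socketWriteTimeout > 0
            ? ServerSocketChannel.open().socket()
            : new ServerSocket();
    }
}
```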
4. The DataNode thread's run() method
|-->dataXceiverServer.start(); |starts the dataXceiverServer
|-->new Daemon(datanode.threadGroup, |one DataXceiver per accepted socket, see 4.1
new DataXceiver(s, datanode, this)).start();
|-->startDistributedUpgradeIfNeeded();
|-->offerService(); |heartbeats with the namenode and handles the commands it returns, see 4.2
4.1 DataXceiver's run() method
|-->in = new DataInputStream( |wraps the input stream from the client or upstream datanode
new BufferedInputStream(NetUtils.getInputStream(s),
SMALL_BUFFER_SIZE));
|-->short version = in.readShort(); |reads the protocol version
|-->boolean local = s.getInetAddress().equals(s.getLocalAddress()) |is the peer a local address?
|-->byte op = in.readByte(); |reads the operation code, one of:
|-->DataTransferProtocol.OP_READ_BLOCK |read a block
|-->DataTransferProtocol.OP_WRITE_BLOCK |write a block
|-->DataTransferProtocol.OP_READ_METADATA |read block metadata
|-->DataTransferProtocol.OP_REPLACE_BLOCK |replace a block
|-->DataTransferProtocol.OP_COPY_BLOCK |copy a block
|-->DataTransferProtocol.OP_BLOCK_CHECKSUM |checksum a block
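The version-then-opcode framing that DataXceiver.run() reads can be sketched like this; the opcode values below are illustrative stand-ins, the real constants live in DataTransferProtocol:

```java
import java.io.DataInputStream;
import java.io.IOException;

public class OpDispatch {
    // Illustrative opcode values, standing in for DataTransferProtocol's constants
    static final byte OP_WRITE_BLOCK = 80, OP_READ_BLOCK = 81;

    // Read the protocol version first, then the one-byte operation code,
    // and dispatch on it, as DataXceiver.run() does.
    static String dispatch(DataInputStream in) throws IOException {
        short version = in.readShort(); // would be checked against DATA_TRANSFER_VERSION
        byte op = in.readByte();
        switch (op) {
            case OP_READ_BLOCK:  return "readBlock";
            case OP_WRITE_BLOCK: return "writeBlock";
            default: throw new IOException("unknown opcode " + op + " (version " + version + ")");
        }
    }
}
```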
4.1.1 OP_READ_BLOCK --> readBlock(DataInputStream in) |serves a block read
|-->first, read the block description
|-->long blockId = in.readLong();
|-->Block block = new Block( blockId, 0 , in.readLong());
|-->long startOffset = in.readLong();
|-->long length = in.readLong();
|-->String clientName = Text.readString(in); |clientName, read as a UTF-8 string
|-->create the output streams
|-->OutputStream baseStream = NetUtils.getOutputStream(s,
datanode.socketWriteTimeout);
|-->DataOutputStream out = new DataOutputStream(
new BufferedOutputStream(baseStream, SMALL_BUFFER_SIZE));
|-->blockSender = new BlockSender(block, startOffset, length,
true, true, false, datanode, clientTraceFmt);
|-->out.writeShort(DataTransferProtocol.OP_STATUS_SUCCESS);
|-->long read = blockSender.sendBlock(out, baseStream, null); |sends the block data
|-->if (blockSender.isBlockReadFully()) |if the whole block was read, record it as client-verified
|-->datanode.blockScanner.verifiedByClient(block);
|-->datanode.myMetrics.bytesRead.inc((int) read);
|-->datanode.myMetrics.blocksRead.inc();
|-->close the streams
|-->IOUtils.closeStream(out);
|-->IOUtils.closeStream(blockSender);
4.1.1.1 sendBlock(out, baseStream, null) |streams the block data for a read
|-->this.throttler = throttler; |the throttler balances transfer speed against available bandwidth
|-->write the header
|-->checksum.writeHeader(out);
|-->out.writeLong( offset );
|-->out.flush();
|-->set the packet size
|-->int pktSize = DataNode.PKT_HEADER_LEN + SIZE_OF_INTEGER; |initial size, then adjusted by stream type
|-->ByteBuffer pktBuf = ByteBuffer.allocate(pktSize);
|-->while (endOffset > offset) |loop until the requested range has been sent
|-->long len = sendChunks(pktBuf, maxChunksPerPacket,
streamForSendChunks);
|-->out.writeInt(0); |a trailing zero marks the end of the transfer
|-->return totalRead;
4.1.1.2 sendChunks(), in three parts
|-->1: checksum data
|-->write the packet header
|-->pkt.putInt(packetLen)
|-->pkt.putLong(offset)
|-->pkt.putLong(seqno);
|-->pkt.put((byte)
|-->pkt.putInt(len);
|-->checksumIn.readFully(buf, checksumOff, checksumLen);
|-->2: read the block data
|-->int dataOff = checksumOff + checksumLen;
|-->IOUtils.readFully(blockIn, buf, dataOff, len); |reads the block data from blockIn
|-->for (int i=0; i<numChunks; i++) |updates the checksum for each chunk
|-->checksum.update(buf, dOff, dLen);
|-->3: write the data out
|-->if (blockInPosition >= 0) |if blockInPosition >= 0, out is a SocketOutputStream
|-->SocketOutputStream sockOut = (SocketOutputStream)out;
|-->sockOut.write(buf, 0, dataOff);
|-->sockOut.transferToFully(((FileInputStream)blockIn).getChannel(),
blockInPosition, len);
|-->else
|-->out.write(buf, 0, dataOff + len);
|-->throttler.throttle(packetLen); |throttles bandwidth usage
|-->return len; |returns the number of bytes sent
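The per-chunk checksumming in sendChunks, and the checksum-region sizing that reappears in receivePacket, can be sketched with plain CRC32. Chunk size and checksum size are parameters here; HDFS uses io.bytes.per.checksum (512 by default) and a 4-byte CRC:

```java
import java.util.zip.CRC32;

public class ChunkChecksums {
    // Bytes of checksum data covering `len` bytes of payload, mirroring
    // checksumLen = ((len + bytesPerChecksum - 1) / bytesPerChecksum) * checksumSize
    static int checksumLen(int len, int bytesPerChecksum, int checksumSize) {
        return ((len + bytesPerChecksum - 1) / bytesPerChecksum) * checksumSize;
    }

    // One CRC32 per chunk, as the checksum.update(buf, dOff, dLen) loop computes
    static long[] chunkCrcs(byte[] data, int bytesPerChecksum) {
        int numChunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] crcs = new long[numChunks];
        for (int i = 0; i < numChunks; i++) {
            int off = i * bytesPerChecksum;
            int n = Math.min(bytesPerChecksum, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, n);
            crcs[i] = crc.getValue();
        }
        return crcs;
    }
}
```

For example, a 1000-byte payload with 512-byte chunks and 4-byte CRCs needs ceil(1000/512) = 2 checksums, i.e. 8 bytes of checksum data.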
4.1.1.3 writeBlock() |writes a block; more involved than a read, since it coordinates with the upstream and downstream datanodes
1: read the header
|-->Block block = new Block(in.readLong(),
dataXceiverServer.estimateBlockSize, in.readLong());
|-->int pipelineSize = in.readInt();
|-->boolean isRecovery = in.readBoolean();
|-->String client = Text.readString(in)
|-->boolean hasSrcDataNode = in.readBoolean()
|-->srcDataNode.readFields(in); |srcDataNode, the datanode that issued the transfer
|-->int numTargets = in.readInt(); |number of remaining downstream targets in the pipeline
|-->DatanodeInfo targets[] = new DatanodeInfo[numTargets];
|-->for (int i = 0; i < targets.length; i++) |reads each DatanodeInfo from the stream
|-->tmp.readFields(in);
|-->targets[i] = tmp;
2: create the input/output streams and the socket
|-->mirrorOut = new DataOutputStream( |output stream to the next node
new BufferedOutputStream(
NetUtils.getOutputStream(mirrorSock, writeTimeout),
SMALL_BUFFER_SIZE));
|-->mirrorIn = new DataInputStream(NetUtils.getInputStream(mirrorSock)); |input stream from the next node
|-->replyOut = new DataOutputStream( |output stream for replies to the previous node
NetUtils.getOutputStream(s, datanode.socketWriteTimeout));
|-->Socket mirrorSock |socket to the next node
|-->BlockReceiver blockReceiver = new BlockReceiver(block, in, |creates the block receiver, which writes the block data
s.getRemoteSocketAddress().toString(),
s.getLocalSocketAddress().toString(),
isRecovery, client, srcDataNode, datanode);
3. forward the data
|-->mirrorNode = targets[0].getName();
|-->mirrorTarget = NetUtils.createSocketAddr(mirrorNode);
|-->mirrorSock = datanode.newSocket();
|-->NetUtils.connect(mirrorSock, mirrorTarget, timeoutValue); |connects to the next datanode
|-->write the version and header fields to the next node
|-->mirrorOut.writeShort( DataTransferProtocol.DATA_TRANSFER_VERSION );
|-->mirrorOut.write( DataTransferProtocol.OP_WRITE_BLOCK );
|-->mirrorOut.writeLong( block.getBlockId() );
|-->mirrorOut.writeLong( block.getGenerationStamp() );
|-->mirrorOut.writeInt( pipelineSize );
|-->mirrorOut.writeBoolean( isRecovery );
|-->Text.writeString( mirrorOut, client );
|-->mirrorOut.writeBoolean(hasSrcDataNode);
|-->srcDataNode.write(mirrorOut); |only if hasSrcDataNode
|-->mirrorOut.writeInt( targets.length - 1 );
|-->for ( int i = 1; i < targets.length; i++ )
|-->targets[i].write( mirrorOut );
|-->blockReceiver.writeChecksumHeader(mirrorOut); |writes the checksum header
|-->mirrorOut.flush();
|-->if (client.length() != 0)
|-->firstBadLink = Text.readString(mirrorIn); |for client-issued writes, reads the pipeline ack
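The downstream header written to mirrorOut in step 3 is plain DataOutputStream serialization. A simplified sketch of the leading fields, in the order shown above (the constant values are illustrative, writeUTF stands in for Text.writeString, and srcDataNode plus the target list are omitted):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class PipelineHeader {
    static final short DATA_TRANSFER_VERSION = 14; // illustrative value
    static final byte OP_WRITE_BLOCK = 80;         // illustrative value

    // Serialize the leading fields of the write-block header that is
    // forwarded to the next datanode in the pipeline.
    static byte[] build(long blockId, long genStamp, int pipelineSize,
                        boolean isRecovery, String client) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeShort(DATA_TRANSFER_VERSION);
        out.write(OP_WRITE_BLOCK);
        out.writeLong(blockId);
        out.writeLong(genStamp);
        out.writeInt(pipelineSize);
        out.writeBoolean(isRecovery);
        out.writeUTF(client); // stand-in for Text.writeString(mirrorOut, client)
        out.flush();
        return bos.toByteArray();
    }
}
```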
4. receive the block data and mirror it downstream
|-->blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
mirrorAddr, null, targets.length);
|-->datanode.notifyNamenodeReceivedBlock(block, DataNode.EMPTY_DEL_HINT);
|-->datanode.blockScanner.addBlock(block);
4.1: receiving the block, receiveBlock()
|-->BlockMetadataHeader.writeHeader(checksumOut, checksum);
|-->responder = new Daemon(datanode.threadGroup,
new PacketResponder(this, block, mirrIn,
replyOut, numTargets));
|-->responder.start();
|-->while (receivePacket() > 0) {} |receives packets and writes them to disk, one packet at a time
|-->mirrorOut.writeInt(0);
|-->((PacketResponder)responder.getRunnable()).close();
|-->if (clientName.length() == 0)
|-->block.setNumBytes(offsetInBlock);
|-->datanode.data.finalizeBlock(block);
4.2: receivePacket() |keeps reading packets into buf until the payload length is 0
|-->int payloadLen = readNextPacket(); |reads the next packet; processing and forwarding follow
|-->read the packet header, then rewind to the start
|-->buf.mark();
|-->buf.getInt()
|-->offsetInBlock = buf.getLong()
|-->long seqno = buf.getLong()
|-->lastPacketInBlock = (buf.get() != 0)
|-->int endOfHeader = buf.position(); |position right after the header
|-->buf.reset();
|-->setBlockPosition(offsetInBlock);
|-->forward the packet to the next datanode
|-->mirrorOut.write(buf.array(), buf.position(), buf.remaining()); |forwards the whole packet downstream;
position and remaining delimit it
|-->mirrorOut.flush(); |flushes it out
|-->buf.position(endOfHeader); |continue processing just past the header
|-->int len = buf.getInt(); |length of the data payload
|-->offsetInBlock += len; |advances the offset within the block
|-->checksumLen = ((len + bytesPerChecksum - 1)/bytesPerChecksum)*checksumSize |length of the checksum region
|-->int checksumOff = buf.position(); |the buffer now points at the checksum region
|-->int dataOff = checksumOff + checksumLen; |offset where the payload data starts
|-->byte pktBuf[] = buf.array();
|-->buf.position(buf.limit()); |moves to the end of the data
|-->verifyChunks(pktBuf, dataOff, len, pktBuf, checksumOff); |verifies the chunks against their checksums
|-->out.write(pktBuf, dataOff, len); |writes the data to local disk
|-->handle a partial chunk at the end of the packet
|-->partialCrc.update(pktBuf, dataOff, len);
|-->checksumOut.write(pktBuf, checksumOff, checksumLen);
|-->flush();
|-->checksumOut.flush()
|-->out.flush()
|-->responder.getRunnable()).enqueue(seqno, lastPacketInBlock) |queues the packet for acknowledgement
|-->throttler.throttle(payloadLen);
|-->return payloadLen;
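The mark/reset dance above (parse the header, rewind, forward the untouched packet, then continue past the header) can be sketched as follows; field names and the on-wire order match the outline, while the helper itself is illustrative:

```java
import java.nio.ByteBuffer;

public class PacketParse {
    // Parse the packet header fields in the order receivePacket reads them,
    // then reset() so the whole packet can be forwarded to mirrorOut untouched.
    // Returned array: {packetLen, offsetInBlock, seqno, lastPacketInBlock(0/1), dataLen}
    static long[] parseAndRewind(ByteBuffer buf) {
        buf.mark();                           // buf.mark()
        long packetLen = buf.getInt();        // buf.getInt()
        long offsetInBlock = buf.getLong();   // offsetInBlock = buf.getLong()
        long seqno = buf.getLong();           // long seqno = buf.getLong()
        long last = (buf.get() != 0) ? 1 : 0; // lastPacketInBlock = (buf.get() != 0)
        long dataLen = buf.getInt();          // int len = buf.getInt() (read after the reset in the real code)
        buf.reset();                          // rewind for mirrorOut.write(buf.array(), ...)
        return new long[]{packetLen, offsetInBlock, seqno, last, dataLen};
    }
}
```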
5. close the streams and the socket
|-->IOUtils.closeStream(mirrorOut);
|-->IOUtils.closeStream(mirrorIn);
|-->IOUtils.closeStream(replyOut);
|-->IOUtils.closeSocket(mirrorSock);
|-->IOUtils.closeStream(blockReceiver);