Getting block locations and related information from the NameNode
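Before the client opens a streaming TCP connection to a datanode and reads file data, it has to find out where that data lives. It therefore starts in DFSClient.callGetBlockLocations(), which invokes the remote method ClientProtocol.getBlockLocations(). The call returns a LocatedBlocks object holding a list of LocatedBlock instances, and from these the client knows which datanodes to fetch the data from. On the NameNode side the request is handled by NameNode.getBlockLocations(), which delegates the actual work to the method of the same name in FSNamesystem. FSNamesystem has three overloads of it; the code is as follows: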
LocatedBlocks getBlockLocations(String clientMachine, String src,
    long offset, long length) throws IOException {
  LocatedBlocks blocks = getBlockLocations(src, offset, length, true, true,
      true);
  if (blocks != null) {
    // blocks is not null, so sort the datanodes that hold each block
    // In some deployment cases, cluster is with separation of task tracker
    // and datanode which means client machines will not always be recognized
    // as known data nodes, so here we should try to get node (but not
    // datanode only) for locality based sort.
    Node client = host2DataNodeMap.getDatanodeByHost(
        clientMachine);
    if (client == null) {
      List<String> hosts = new ArrayList<String> (1);
      hosts.add(clientMachine);
      String rName = dnsToSwitchMapping.resolve(hosts).get(0);
      if (rName != null)
        client = new NodeBase(clientMachine, rName);
    }
    DFSUtil.StaleComparator comparator = null;
    if (avoidStaleDataNodesForRead) {
      comparator = new DFSUtil.StaleComparator(staleInterval);
    }
    // Note: the last block is also included and sorted
    for (LocatedBlock b : blocks.getLocatedBlocks()) {
      clusterMap.pseudoSortByDistance(client, b.getLocations());
      if (avoidStaleDataNodesForRead) {
        Arrays.sort(b.getLocations(), comparator);
      }
    }
  }
  return blocks;
}
/**
 * Get block locations within the specified range.
 * @see ClientProtocol#getBlockLocations(String, long, long)
 */
public LocatedBlocks getBlockLocations(String src, long offset, long length
    ) throws IOException {
  return getBlockLocations(src, offset, length, false, true, true);
}
/**
 * Get block locations within the specified range.
 * @see ClientProtocol#getBlockLocations(String, long, long)
 */
public LocatedBlocks getBlockLocations(String src, long offset, long length,
    boolean doAccessTime, boolean needBlockToken, boolean checkSafeMode)
    throws IOException {
  if (isPermissionEnabled) {
    // check read permission on the path
    FSPermissionChecker pc = getPermissionChecker();
    checkPathAccess(pc, src, FsAction.READ);
  }
  if (offset < 0) {
    throw new IOException("Negative offset is not supported. File: " + src );
  }
  if (length < 0) {
    throw new IOException("Negative length is not supported. File: " + src );
  }
  final LocatedBlocks ret = getBlockLocationsInternal(src,
      offset, length, Integer.MAX_VALUE, doAccessTime, needBlockToken);
  if (auditLog.isInfoEnabled() && isExternalInvocation()) {
    logAuditEvent(UserGroupInformation.getCurrentUser(),
        Server.getRemoteIp(),
        "open", src, null, null);
  }
  if (checkSafeMode && isInSafeMode()) {
    for (LocatedBlock b : ret.getLocatedBlocks()) {
      // if safemode & no block locations yet then throw safemodeException
      if ((b.getLocations() == null) || (b.getLocations().length == 0)) {
        throw new SafeModeException("Zero blocklocations for " + src,
            safeMode);
      }
    }
  }
  return ret;
}
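In the first overload, the distance-based ordering is followed by Arrays.sort() with a stale-node comparator when avoidStaleDataNodesForRead is set. The article does not show the body of DFSUtil.StaleComparator, so the following is only a minimal sketch of the idea, under the assumption that a node counts as stale once its last heartbeat is older than the stale interval. ReplicaNode, StaleLastSketch and staleLast are hypothetical names, not Hadoop classes. Because Arrays.sort() on object arrays is stable, nodes of equal staleness keep the order that the distance-based pass gave them.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for a datanode descriptor; not a Hadoop class.
final class ReplicaNode {
    final String name;
    final long lastHeartbeatMillis;

    ReplicaNode(String name, long lastHeartbeatMillis) {
        this.name = name;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
}

final class StaleLastSketch {
    // Assumption: "stale" means the last heartbeat is older than staleIntervalMillis.
    // Fresh nodes sort before stale ones; nodes of equal staleness compare as equal,
    // so a stable sort leaves their relative order untouched.
    static Comparator<ReplicaNode> staleLast(final long nowMillis, final long staleIntervalMillis) {
        return new Comparator<ReplicaNode>() {
            @Override
            public int compare(ReplicaNode a, ReplicaNode b) {
                boolean aStale = nowMillis - a.lastHeartbeatMillis > staleIntervalMillis;
                boolean bStale = nowMillis - b.lastHeartbeatMillis > staleIntervalMillis;
                if (aStale == bStale) {
                    return 0;
                }
                return aStale ? 1 : -1;
            }
        };
    }

    // Joins node names with commas so the resulting order is easy to inspect.
    static String names(ReplicaNode[] nodes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(nodes[i].name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Already ordered by distance: local node, same-rack node, remote node.
        ReplicaNode[] nodes = {
            new ReplicaNode("local", 0L),      // last heartbeat long ago: stale
            new ReplicaNode("sameRack", 95L),
            new ReplicaNode("remote", 99L)
        };
        Arrays.sort(nodes, staleLast(100L, 30L));
        System.out.println(names(nodes)); // prints "sameRack,remote,local"
    }
}
```

The point of the second pass is that a nearby but unresponsive replica should not be the client's first choice, while the locality order among the healthy replicas is preserved.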
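As the three FSNamesystem overloads show, the first two both delegate to the third. After obtaining the blocks, the first overload (the one that takes clientMachine) additionally "sorts" each block's datanode list relative to the client. The "sort" is done by the following method: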
public void pseudoSortByDistance( Node reader, Node[] nodes ) {
  int tempIndex = 0;
  if (reader != null ) {
    int localRackNode = -1;
    //scan the array to find the local node & local rack node
    for(int i=0; i<nodes.length; i++) {
      //walk nodes and check whether reader is one of them
      if(tempIndex == 0 && reader == nodes[i]) {
        //local node
        //swap the local node and the node at position 0
        //datanode i is the same machine as the client
        if( i != 0 ) {
          swap(nodes, tempIndex, i);
        }
        tempIndex=1;
        if(localRackNode != -1 ) {
          if(localRackNode == 0) {
            //localRackNode==0 means that before the swap the node at position 0 was
            //the rack-local node; the swap has just moved it to position i
            localRackNode = i;
          }
          break;//reader is at position 0 and a rack-local node is known, so stop scanning
        }
      } else if(localRackNode == -1 && isOnSameRack(reader, nodes[i])) {
        //local rack: node i is on the same rack as reader
        localRackNode = i;
        if(tempIndex != 0 ) break;//tempIndex != 0 means reader is in nodes
      }
    }
    //After the loop tempIndex is 1 if reader is in nodes and 0 otherwise. If localRackNode != -1,
    //it indexes a node on reader's rack, which is swapped into position tempIndex: with reader in
    //nodes, position 0 holds reader and position 1 the rack-local node; with reader absent,
    //position 0 holds the rack-local node. Otherwise a random node is picked below.
    if(localRackNode != -1 && localRackNode != tempIndex ) {
      swap(nodes, tempIndex, localRackNode);
      tempIndex++;
    }
  }
  //tempIndex == 0: no local or rack-local node was moved to the front,
  //so put a randomly chosen node at position 0
  if(tempIndex == 0 && nodes.length != 0) {
    swap(nodes, 0, r.nextInt(nodes.length));
  }
}
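The "sorting" rule is as follows. If the reader is itself in nodes, it is moved to position 0. If nodes contains a node on the same rack as the reader (localRackNode), that node is placed right after the reader; when the reader is not in nodes, think of it as sitting at position -1, so the rack-local node lands at position 0. If nodes contains neither the reader nor a rack-local node, a randomly chosen node is placed at position 0. Only the first one or two slots are arranged and the rest of the array is left as it was, which is why the method is called a pseudo sort.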
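The rule can be tried out in isolation with a small, self-contained sketch. It is not the Hadoop implementation: SimpleNode, PseudoSortSketch and their fields are hypothetical stand-ins, nodes are matched by name instead of by reference, and the work is done in two simple passes rather than the single pass of the listing, so when several rack-local nodes exist it may promote a different one than the original would.

```java
import java.util.Random;

// Hypothetical stand-in for a topology node: a host name plus a rack path.
final class SimpleNode {
    final String name;
    final String rack;

    SimpleNode(String name, String rack) {
        this.name = name;
        this.rack = rack;
    }
}

final class PseudoSortSketch {
    static void swap(SimpleNode[] nodes, int i, int j) {
        SimpleNode tmp = nodes[i];
        nodes[i] = nodes[j];
        nodes[j] = tmp;
    }

    // Arranges only the front of the array: reader first, then one rack-local node;
    // if neither exists, a random node is moved to position 0.
    static void pseudoSort(SimpleNode reader, SimpleNode[] nodes, Random random) {
        int next = 0; // next front slot to fill
        if (reader != null) {
            // Pass 1: move the reader itself to the front if it holds a replica.
            for (int i = 0; i < nodes.length; i++) {
                if (nodes[i].name.equals(reader.name)) {
                    swap(nodes, next, i);
                    next++;
                    break;
                }
            }
            // Pass 2: move one node on the reader's rack right behind it.
            for (int i = next; i < nodes.length; i++) {
                if (nodes[i].rack.equals(reader.rack)) {
                    swap(nodes, next, i);
                    next++;
                    break;
                }
            }
        }
        // Nothing was promoted: spread the load by picking a random first node.
        if (next == 0 && nodes.length > 0) {
            swap(nodes, 0, random.nextInt(nodes.length));
        }
    }

    // Joins node names with commas so the resulting order is easy to inspect.
    static String names(SimpleNode[] nodes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(nodes[i].name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleNode reader = new SimpleNode("d3", "/rack1");
        SimpleNode[] nodes = {
            new SimpleNode("d1", "/rack2"),
            new SimpleNode("d2", "/rack1"),
            new SimpleNode("d3", "/rack1")
        };
        pseudoSort(reader, nodes, new Random());
        System.out.println(names(nodes)); // prints "d3,d2,d1"
    }
}
```

In the example the reader d3 holds a replica, so it moves to position 0, the rack-local d2 follows it, and the off-rack d1 ends up last.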
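The third overload of FSNamesystem.getBlockLocations() calls FSNamesystem.getBlockLocationsInternal(), which does the actual work of finding the blocks that belong to the file in the NameNode's directory tree. Its code is as follows: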
private synchronized LocatedBlocks getBlockLocationsInternal(String src,
    long offset,
    long length,
    int nrBlocksToReturn,
    boolean doAccessTime,
    boolean needBlockToken)
    throws IOException {
  // get the last node on the src path, i.e. the file's inode
  INodeFile inode = dir.getFileINode(src);
  if(inode == null) {
    return null;
  }
  if (doAccessTime && isAccessTimeSupported()) {
    // update the last access time
    dir.setTimes(src, inode, -1, now(), false);
  }
  // get the file's blocks
  Block[] blocks = inode.getBlocks();
  if (blocks == null) {
r