We previously covered the read functions in the FSDataInputStream class (four posts in total: read(1), read(2), read(3), and read(4)). Those read functions call the corresponding read functions in the DFSInputStream class, which are:
// First read function
public synchronized int read()
// Second read function
public synchronized int read(final byte buf[], int off, int len)
// Third read function
public synchronized int read(final ByteBuffer buf)
// Fourth read function
public int read(long position, byte[] buffer, int offset, int length)
// Fifth read function
public synchronized ByteBuffer read(ByteBufferPool bufferPool, int maxLength, EnumSet<ReadOption> opts)
All five of these read functions create a BlockReaderFactory object and call its build function. The code of that function is as follows:
/**
* Build a BlockReader with the given options.
*
* This function will do the best it can to create a block reader that meets
* all of our requirements. We prefer short-circuit block readers
* (BlockReaderLocal and BlockReaderLocalLegacy) over remote ones, since the
* former avoid the overhead of socket communication. If short-circuit is
* unavailable, our next fallback is data transfer over UNIX domain sockets,
* if dfs.client.domain.socket.data.traffic has been enabled. If that doesn't
* work, we will try to create a remote block reader that operates over TCP
* sockets.
*
* There are a few caches that are important here.
*
* The ShortCircuitCache stores file descriptor objects which have been passed
* from the DataNode.
*
* The DomainSocketFactory stores information about UNIX domain socket paths
* that we have not been able to use in the past, so that we don't waste time
* retrying them over and over. (Like all the caches, it does have a timeout,
* though.)
*
* The PeerCache stores peers that we have used in the past. If we can reuse
* one of these peers, we avoid the overhead of re-opening a socket. However,
* if the socket has been timed out on the remote end, our attempt to reuse
* the socket may end with an IOException. For that reason, we limit our
* attempts at socket reuse to dfs.client.cached.conn.retry times. After
* that, we create new sockets. This avoids the problem where a thread tries
* to talk to a peer that it hasn't talked to in a while, and has to clean out
* every entry in a socket cache full of stale entries.
*
* @return The new BlockReader. We will not return null.
*
* @throws InvalidToken
* If the block token was invalid.
* InvalidEncryptionKeyException
* If the encryption key was invalid.
* Other IOException
* If there was another problem.
*/
public BlockReader build() throws IOException {
  BlockReader reader = null;
  Preconditions.checkNotNull(configuration);
  // If short-circuit reads are allowed
  if (conf.shortCircuitLocalReads && allowShortCircuitLocalReads) {
    // Check whether the legacy (HDFS-2246) short-circuit read is in use.
    // In that mode the client obtains the file path from the datanode over
    // RPC and then reads the data directly through that path. Because this
    // lets the client browse all of the file data, it is not very secure.
    if (clientContext.getUseLegacyBlockReaderLocal()) {
      // Get a BlockReaderLocalLegacy object
      reader = getLegacyBlockReaderLocal();
      if (reader != null) {
        if (LOG.isTraceEnabled()) {
          LOG.trace(this + ": returning new legacy block reader local.");
        }
        return reader;
      }
    } else {
      // The legacy mode is not in use, so try the new (HDFS-347)
      // short-circuit read
      reader = getBlockReaderLocal();
      if (reader != null) {
        if (LOG.isTraceEnabled()) {
          LOG.trace(this + ": returning new block reader local.");
        }
        return reader;
      }
    }
  }
  if (conf.domainSocketDataTraffic) {
    reader = getRemoteBlockReaderFromDomain();
    if (reader != null) {
      if (LOG.isTraceEnabled()) {
        LOG.trace(this + ": returning new remote block reader using " +
            "UNIX domain socket on " + pathInfo.getPath());
      }
      return reader;
    }
  }
  Preconditions.checkState(!DFSInputStream.tcpReadsDisabledForTesting,
      "TCP reads were disabled for testing, but we failed to " +
      "do a non-TCP read.");
  return getRemoteBlockReaderFromTcp();
}
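The order of preference in build can be summarized with a small standalone sketch. This is not Hadoop code: SketchReader, ReaderFallbackSketch, and the three suppliers are hypothetical names used only for illustration, and each supplier returns null to model a transport that is unavailable. It only aims to show the "try local, then domain socket, then TCP" fallback shape.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a block reader; it only remembers its transport.
final class SketchReader {
    final String transport;

    SketchReader(String transport) {
        this.transport = transport;
    }
}

final class ReaderFallbackSketch {
    /**
     * Tries the readers in the same order of preference as build():
     * short-circuit local read first, then a UNIX domain socket, then TCP.
     * A supplier that returns null models "this transport did not work".
     */
    static SketchReader build(boolean shortCircuitEnabled,
                              boolean domainSocketDataTraffic,
                              Supplier<SketchReader> local,
                              Supplier<SketchReader> domain,
                              Supplier<SketchReader> tcp) {
        if (shortCircuitEnabled) {
            SketchReader reader = local.get();
            if (reader != null) {
                return reader;
            }
        }
        if (domainSocketDataTraffic) {
            SketchReader reader = domain.get();
            if (reader != null) {
                return reader;
            }
        }
        // Last resort: the TCP reader is always attempted.
        return tcp.get();
    }

    public static void main(String[] args) {
        // The short-circuit read is enabled but fails, so the domain socket wins.
        SketchReader reader = build(true, true,
                () -> null,
                () -> new SketchReader("domain-socket"),
                () -> new SketchReader("tcp"));
        System.out.println(reader.transport); // prints "domain-socket"
    }
}
```

One design point the sketch makes visible: a failed attempt at a faster transport is not an error. It simply falls through to the next option, which is why only the last step may not return null.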
The build function creates an object of a class that implements the BlockReader interface. Specifically:
1. getLegacyBlockReaderLocal returns a BlockReaderLocalLegacy object;
2. getBlockReaderLocal returns a BlockReaderLocal object;
3. getRemoteBlockReaderFromDomain and getRemoteBlockReaderFromTcp both return a RemoteBlockReader2 object.
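The objects returned by the first and second functions are used for short-circuit reads. The first is the older version and is now deprecated. A short-circuit read applies when the client and the datanode are on the same server: in that case there is no need to go through the network, and the client reads the data directly. The two versions differ as follows.
The old version (HDFS-2246) works like this:
The client obtains, over RPC, the absolute paths of the data file and its checksum file on the datanode, and then reads both files directly through those paths.
The new version (HDFS-347) works like this:
It uses UNIX domain sockets, an inter-process communication mechanism that lets two processes on the same machine talk to each other in socket style and also pass file descriptors between processes. The datanode process passes the file descriptors of the data file and its checksum file to the client process over a domain socket, and the client then reads the data through those descriptors.
The drawback of the old version is that it hands the absolute file paths to the client, which allows outside code to write to the files on the datanode and is therefore a security problem. The new version does not have this problem: the domain socket can pass read-only file descriptors to the client, so outside modification of the files is prevented.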
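The effect of handing out a read-only handle instead of a path can be illustrated with plain Java. This is only a sketch under stated assumptions: it runs inside a single process and does not pass a descriptor over a domain socket, and ReadOnlyHandleSketch and its methods are hypothetical names, not Hadoop code. It shows that a channel opened with StandardOpenOption.READ can read the data but rejects writes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.NonWritableChannelException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ReadOnlyHandleSketch {
    /** Reads the whole file through a channel opened for reading only. */
    static String readAll(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or EOF is reached
            }
            return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
        }
    }

    /** Returns true if a write through a read-only channel is rejected. */
    static boolean writeIsRejected(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            try {
                ch.write(ByteBuffer.wrap(new byte[] {1}));
                return false;
            } catch (NonWritableChannelException e) {
                return true;
            }
        }
    }

    /** Creates a temporary "block file", reads it, then tries to write to it. */
    static String demo() {
        try {
            Path file = Files.createTempFile("blk_sketch", ".data");
            try {
                Files.write(file, "block data".getBytes(StandardCharsets.UTF_8));
                return readAll(file) + " / write rejected: " + writeIsRejected(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "block data / write rejected: true"
    }
}
```

By contrast, a client that is given the path can open the file in whatever mode the file system permissions allow, which is the weakness described for the old version.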
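The third item covers the remote readers, which read the data file over a domain socket and over TCP respectively.
In the end, every file read lands on the read functions implemented by BlockReaderLocalLegacy, BlockReaderLocal, and RemoteBlockReader2. We will analyze each of these read functions in later posts.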