How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop

One of the key principles behind Apache Hadoop is that moving computation is cheaper than moving data: we prefer to move the computation to the data whenever possible, rather than the other way around. Because of this, the Hadoop Distributed File System (HDFS) typically handles many “local reads”: reads where the reader is on the same node as the data.

Initially, local reads in HDFS were handled the same way as remote reads: the client connected to the DataNode via a TCP socket and transferred the data via DataTransferProtocol. This approach was simple, but it had some downsides. For example, the DataNode had to keep a thread and a TCP socket open for each client that was reading a block. There was the overhead of the TCP protocol in the kernel, as well as the overhead of DataTransferProtocol itself. There was room to optimize.

In this post, you’ll learn about an important new optimization for HDFS called secure short-circuit local reads, the benefits of its implementation, and how it can speed up your applications.

Short-Circuit Local Reads with HDFS-2246

In HDFS-2246, Andrew Purtell, Suresh Srinivas, Jitendra Nath Pandey, and Benoy Antony added an optimization called “short-circuit local reads”.

The key idea behind short-circuit local reads is this: because the client and the data are on the same node, there is no need for the DataNode to be in the data path. Rather, the client itself can simply read the data from the local disk. This performance optimization made it into CDH, Cloudera’s distribution of Hadoop and related projects, in CDH3u3.

The implementation of short-circuit local reads found in HDFS-2246, although a good start, came with a number of configuration headaches. System administrators had to change the permissions on the DataNode’s data directories to allow the clients to open the relevant files.  They had to specifically whitelist the users who were able to use short-circuit local reads — no other users would be allowed. Typically, those users also had to be placed in a special UNIX group.
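To make those headaches concrete, here is a rough sketch of the kind of hdfs-site.xml settings the HDFS-2246-era implementation relied on. Treat it as an illustration rather than a recommended configuration: the whitelisted user and the permission bits are examples only.

```xml
<!-- hdfs-site.xml: legacy (HDFS-2246-style) short-circuit local reads.
     The whitelisted user and the permission value are examples only. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <!-- comma-separated whitelist of users allowed to open block files directly -->
  <value>hbase</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <!-- relaxed from the usual setting so whitelisted clients (typically via a
       shared UNIX group) can read the DataNode's data directories -->
  <value>750</value>
</property>
```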


Unfortunately, those permission changes opened up a security hole: users with the permissions necessary to read the DataNode’s files could browse through everything, not just the blocks they were supposed to have access to. This was a little bit like making the user a super-user! That might be acceptable for a few users, such as the “hbase” user, but in general it presented problems. So although a few dedicated administrators enabled short-circuit local reads, it was not a common choice.

HDFS-347: Making Short-Circuit Local Reads Secure

The main problem with HDFS-2246 was that it opened up all of the DataNode’s data directories to the client. Instead, what we really want is to share only a few carefully chosen files.

Luckily, UNIX has a mechanism for doing just that, called “file descriptor passing.” HDFS-347 uses this mechanism to implement secure short-circuit local reads. Instead of passing the directory name to the client, the DataNode opens the block file and the metadata file and passes their file descriptors directly to the client. Because the file descriptors are read-only, the client cannot modify the files it was passed. And because the client has no access to the block directories themselves, it cannot read anything it is not supposed to have access to.

Windows has a similar mechanism for passing file descriptors between processes.  Although Cloudera doesn’t support this yet in Hadoop, in the meantime, Windows users can use the legacy block reader by setting dfs.client.use.legacy.blockreader.local to true.
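For reference, selecting the legacy reader is a single client-side property. A minimal sketch, assuming the HDFS-2246-style prerequisites described above (whitelisted users, relaxed directory permissions) are already in place:

```xml
<!-- hdfs-site.xml (client side): use the legacy HDFS-2246 block reader,
     for example on Windows, where file descriptor passing is not yet supported. -->
<property>
  <name>dfs.client.use.legacy.blockreader.local</name>
  <value>true</value>
</property>
```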

Caching File Descriptors

HDFS clients often read the same block file many times. (This is particularly true for HBase.) To speed up this case, the old short-circuit local read implementation, HDFS-2246, had a block path cache. This cache allowed the client to reopen a block file that it had already read recently without asking the DataNode for its path.

Instead of a path cache, the new-style short-circuit implementation includes a file descriptor cache named FileInputStreamCache. This is better than a path cache, since it doesn’t require the client to re-open the file to re-read the block. We found that this approach improved performance over the old short-circuit local read implementation.

The size of the cache can be tuned with dfs.client.read.shortcircuit.streams.cache.size, whereas the cache timeout is controlled by dfs.client.read.shortcircuit.streams.cache.expiry.ms. The cache can also be turned off by setting its size to 0. Most of the time, the defaults are a good choice. However, if you have an unusually large working set and a high file descriptor limit, you could try increasing the cache size.
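A minimal sketch of that tuning is below; the values are illustrative rather than recommendations, so check the defaults shipped with your version before changing them.

```xml
<!-- hdfs-site.xml (client side): FileInputStreamCache tuning. Values are examples only. -->
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <!-- maximum number of cached short-circuit streams; 0 disables the cache -->
  <value>256</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
  <!-- how long an unused entry may sit in the cache before it is closed, in milliseconds -->
  <value>300000</value>
</property>
```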

HDFS-347 Configuration

With the new-style short-circuit local reads introduced in HDFS-347, any HDFS user can make use of short-circuit reads, not just specifically configured ones. There is also no need to modify which UNIX group the users are in or change the ownership of the DataNode directories. However, because the Java standard library does not include facilities for file descriptor passing, HDFS-347 requires a JNI component in order to function. You will need to have libhadoop.so installed to use it.


HDFS-347 also requires a UNIX domain socket path to be configured via dfs.domain.socket.path. This path must be secure to prevent unprivileged processes from performing a man-in-the-middle attack. Every path component of the socket path must be owned either by root or by the user who started the DataNode; world-writable or group-writable paths cannot be used.
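Putting those two requirements together, here is a minimal hdfs-site.xml sketch for the new-style reads. The socket path is an example only; every component of it must meet the ownership rules described above, and the client also needs libhadoop.so on its native library path.

```xml
<!-- hdfs-site.xml: new-style (HDFS-347) short-circuit local reads.
     The socket path is an example; its parent directories must be owned by root
     or by the user running the DataNode and must not be group- or world-writable. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn_socket</value>
</property>
```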

Luckily, if you install a Cloudera parcel, RPM, or deb, it will create a secure UNIX domain socket path for you in the default location. It will also install libhadoop.so in the right place.

For more information about configuring short-circuit local reads, see the upstream documentation.

Performance

So, how fast is this new implementation? I used a program called hio_bench to get some performance statistics. The code for hio_bench is available at https://github.com/cmccabe/hio_test.

These tests were run on an 8-core 2.13GHz Intel Xeon with 12 hard drives. I used CDH 4.3.1 with an underlying filesystem of ext4. Each number is the average of three runs.

HDFS-347 is the fastest implementation in all cases, probably due to the FileInputStreamCache. In contrast, the HDFS-2246 implementation ends up re-opening the ext4 block file many times, and open is an expensive operation.

The short-circuit implementations have a bigger relative advantage in the random read test than in the sequential read test. This is partly because readahead has not been implemented for short-circuit local reads yet. (See HDFS-4697 for a discussion.) 

Conclusion

Short-circuit local reads are a great example of an optimization enabled by Hadoop’s model of bringing the computation to the data. They’re also a good example of how, having tackled the challenges of scaling head-on, Cloudera is now tackling the challenges of getting more performance out of each node in the cluster.

If you are using CDH 4.2 or later, give the new implementation a try!

Colin McCabe is a Software Engineer on the Platform team, and a Hadoop Committer.

Ref: http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/
