8x Faster Than ls? Tricks for Traversing a Directory with a Million Files

Latest recommended article published on 2024-05-27 19:06:01

gt9000

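1. Background
On Linux, working with a directory that holds only a few files is quick: running ls to list them returns almost instantly. But what happens when a directory holds more than a million files? To find out, we ran the following experiment (the storage device was an NVMe SSD):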

[root@localhost /data1/test_ls]# for i in {1..1000000}; do echo 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' > $i.txt ; done
[root@localhost /data1/test_ls]# time ls -l | wc -l
1000001

real 0m5.802s
user 0m2.544s
sys 0m3.328s
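As shown, counting the entries of a directory containing 1,000,000 small files took nearly 6 seconds (the extra line in the count is the "total" line printed by ls -l). So why does a large number of files make ls slow? Let's look at it in detail.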

2. Analysis
strace is the standard tool for analyzing system calls, so we used it to trace ls in the large directory. The following part of the output caught our attention:

…
getdents(3, /* 1024 entries */, 32768) = 32768
getdents(3, /* 1024 entries */, 32768) = 32768
getdents(3, /* 1024 entries */, 32768) = 32768
getdents(3, /* 1024 entries */, 32768) = 32768
brk(0) = 0x12e8000
brk(0x1309000) = 0x1309000
getdents(3, /* 1024 entries */, 32768) = 32768
mremap(0x7f93b6246000, 2461696, 4919296, MREMAP_MAYMOVE) = 0x7f93b5d95000
getdents(3, /* 1024 entries */, 32768) = 32768
getdents(3, /* 1024 entries */, 32768) = 32768
getdents(3, /* 1024 entries */, 32768) = 32768
brk(0) = 0x1309000
brk(0x132a000) = 0x132a000
…
As shown, ls calls the getdents system call over and over in a large directory. The reason is visible in the ls.c source in coreutils:

static void
print_dir (const char *name, const char *realname)
{
  register DIR *dirp;
  register struct dirent *next;
  register uintmax_t total_blocks = 0;
  static int first = 1;

  errno = 0;
  dirp = opendir (name);
  …
  while (1)
    {
      /* Set errno to zero so we can distinguish between a readdir failure
         and when readdir simply finds that there are no more entries.  */
      errno = 0;
      if ((next = readdir (dirp)) == NULL)
        {
          if (errno)
            {
              /* Save/restore errno across closedir call.  */
              int e = errno;
              closedir (dirp);
              errno = e;

              /* Arrange to give a diagnostic after exiting this loop.  */
              dirp = NULL;
            }
          break;
        }

…
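ls first calls opendir to open the directory, then calls the glibc function readdir in a loop until it reaches the end of the directory stream, that is, until every directory entry (dentry) has been read. The man page declares readdir as follows: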

struct dirent *readdir(DIR *dirp);
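The loop in print_dir can be reduced to a small counting routine. The sketch below is not code from ls: count_entries_readdir is a hypothetical helper that only counts entries, using the same opendir/readdir/closedir calls and the same errno convention as the excerpt.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper (not part of ls): count the entries of a directory
 * with opendir/readdir. The count includes "." and "..".
 * Returns -1 if the directory cannot be opened or a read fails. */
long count_entries_readdir(const char *path)
{
    assert(path != NULL);

    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    long count = 0;
    for (;;) {
        /* Clear errno so that end-of-stream (NULL with errno == 0) can be
         * told apart from a read failure (NULL with errno != 0). */
        errno = 0;
        struct dirent *next = readdir(dirp);
        if (next == NULL)
            break;
        count++;
    }

    /* Remember the readdir result before closedir can change errno. */
    int failed = (errno != 0);
    closedir(dirp);
    return failed ? -1 : count;
}
```

Calling count_entries_readdir(".") returns the number of entries in the current directory, including the two special entries.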
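readdir returns a pointer to a dirent structure representing the next directory entry in the directory stream dirp, so each iteration of the loop in print_dir takes one entry from the stream and assigns it to next. Since we have mentioned the directory stream, here is how glibc defines it: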

#define __dirstream DIR

struct __dirstream
{
  int fd;                     /* File descriptor.  */

  __libc_lock_define (, lock) /* Mutex lock for this structure.  */

  size_t allocation;          /* Space allocated for the block.  */
  size_t size;                /* Total valid data in the block.  */
  size_t offset;              /* Current offset into the block.  */

  off_t filepos;              /* Position of next entry to read.  */

  /* Directory block.  */
  char data[0] __attribute__ ((aligned (__alignof__ (void*))));
};
As the definition shows, a directory stream is essentially a buffer whose size is given by allocation. When is allocation set? During opendir, whose call path is:

__opendir -> __opendirat -> __alloc_dir
In __alloc_dir:

DIR *
internal_function
__alloc_dir (int fd, bool close_fd, int flags, const struct stat64 *statp)
{
  …
  const size_t default_allocation = (4 * BUFSIZ < sizeof (struct dirent64)
                                     ? sizeof (struct dirent64) : 4 * BUFSIZ);
  size_t allocation = default_allocation;
  …
  DIR *dirp = (DIR *) malloc (sizeof (DIR) + allocation);
  …

  dirp->fd = fd;
  ...
  dirp->allocation = allocation;
  dirp->size = 0;
  dirp->offset = 0;
  dirp->filepos = 0;

  return dirp;
}
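__alloc_dir allocates sizeof(DIR) + allocation bytes and finally stores allocation in the allocation field of the directory stream dirp. The default value of allocation is the larger of 4*BUFSIZ and sizeof(struct dirent64) (the latter is smaller than 32768). BUFSIZ is defined through the following headers: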

stdio.h: #define BUFSIZ _IO_BUFSIZ
libio.h: #define _IO_BUFSIZ _G_BUFSIZ
_G_config.h: #define _G_BUFSIZ 8192
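So BUFSIZ is 8192 and allocation defaults to 4 * 8192 = 32768, which is exactly the third argument and the return value of getdents in the strace output.
With the buffer size of the directory stream settled, let's return to the glibc implementation of readdir.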

DIRENT_TYPE *
__READDIR (DIR *dirp)
{
  DIRENT_TYPE *dp;
  …
  do
    {
      size_t reclen;

      if (dirp->offset >= dirp->size)
        {
          /* We've emptied out our buffer.  Refill it.  */
          size_t maxread;
          ssize_t bytes;

#ifndef _DIRENT_HAVE_D_RECLEN
          /* Fixed-size struct; must read one at a time (see below).  */
          maxread = sizeof *dp;
#else
          maxread = dirp->allocation;
#endif
          bytes = __GETDENTS (dirp->fd, dirp->data, maxread);
          …
          dirp->size = (size_t) bytes;

          /* Reset the offset into the buffer.  */
          dirp->offset = 0;
        }

      dp = (DIRENT_TYPE *) &dirp->data[dirp->offset];

#ifdef _DIRENT_HAVE_D_RECLEN
      reclen = dp->d_reclen;
#else
      assert (sizeof dp->d_name > 1);
      reclen = sizeof *dp;
      dp->d_name[sizeof dp->d_name] = '\0';
#endif
      dirp->offset += reclen;

#ifdef _DIRENT_HAVE_D_OFF
      dirp->filepos = dp->d_off;
#else
      dirp->filepos += reclen;
#endif

      /* Skip deleted files.  */
    } while (dp->d_ino == 0);
  ...
  return dp;
}
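The logic is fairly clear. The function first checks whether the offset into the directory stream has reached the amount of valid data in the buffer (dirp->size). If it has, everything in the buffer has been consumed and getdents must be called again. One getdents call reads at most dirp->allocation bytes when _DIRENT_HAVE_D_RECLEN is defined, which is 32768 by default; the data is written to dirp->data, the number of bytes read is stored in dirp->size, and the offset is reset to 0. The entry is then read starting at dirp->offset, and the offset is advanced by reclen bytes so that it points to the start of the next entry. reclen comes from the d_reclen field of the dirent structure and is the length of the current entry. The dirent (DIRENT_TYPE) structure is defined as follows: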

struct dirent
{
  __ino_t d_ino;               /* inode number */
  __off_t d_off;               /* offset to the next dirent */
  unsigned short int d_reclen; /* length of this record */
  unsigned char d_type;        /* type of file */
  char d_name[256];            /* filename */
};
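The way readdir walks a buffer of variable-length records can be shown on a toy format. The sketch below is an illustration only: struct mini_stream, append_record and next_name are hypothetical names, and the record layout (a 2-byte length followed by a NUL-terminated name) is a simplification of the real dirent layout. What it keeps is the size/offset bookkeeping and the "advance by the record length" step.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the buffer part of a DIR stream. */
struct mini_stream {
    const char *data;   /* buffer holding packed records */
    size_t size;        /* number of valid bytes in data */
    size_t offset;      /* position of the next record to return */
};

/* Append one record (2-byte length + NUL-terminated name) at *size.
 * Returns 0 on success, -1 if the record does not fit. */
int append_record(char *buf, size_t capacity, size_t *size, const char *name)
{
    size_t reclen = sizeof(unsigned short) + strlen(name) + 1;
    if (reclen > USHRT_MAX || *size > capacity || reclen > capacity - *size)
        return -1;
    unsigned short stored = (unsigned short) reclen;
    memcpy(buf + *size, &stored, sizeof stored);
    memcpy(buf + *size + sizeof stored, name, strlen(name) + 1);
    *size += reclen;
    return 0;
}

/* Return the name stored in the next record and advance the offset by the
 * record length. Returns NULL once the buffer is exhausted, which is the
 * point where glibc's readdir would call getdents to refill it. */
const char *next_name(struct mini_stream *s)
{
    assert(s != NULL);
    if (s->offset >= s->size)
        return NULL;

    unsigned short reclen;
    memcpy(&reclen, s->data + s->offset, sizeof reclen);
    if (reclen <= sizeof reclen || reclen > s->size - s->offset)
        return NULL;                      /* malformed record: stop */

    const char *name = s->data + s->offset + sizeof reclen;
    s->offset += reclen;                  /* same step as offset += reclen */
    return name;
}
```

Because each record stores its own length, short names take less space than long ones, which is why the real structure carries d_reclen instead of using a fixed stride.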
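To summarize: each readdir call made by ls takes one directory entry from the directory stream. When the stream's buffer has been consumed, readdir calls getdents to refill it and continues from the start of the new buffer. The buffer is 32 KB by default, so for a directory with a very large number of entries (the total size of the entries can be checked with ls -dl), ls ends up calling getdents very frequently, and the more files the directory contains, the longer ls takes.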
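A rough estimate of how many getdents calls are involved can be worked out from the numbers above. The sketch below is back-of-the-envelope arithmetic, not a measurement: the function names are hypothetical, the 32-byte record length is inferred from the strace output (32768 bytes per 1024 entries), and every record is assumed to have that same length.

```c
#include <assert.h>
#include <stddef.h>

/* Buffer size chosen as in the __alloc_dir excerpt:
 * the larger of 4 * BUFSIZ and sizeof(struct dirent64). */
size_t default_allocation(size_t bufsiz, size_t dirent64_size)
{
    size_t four_bufsiz = 4 * bufsiz;
    return four_bufsiz < dirent64_size ? dirent64_size : four_bufsiz;
}

/* Estimated number of getdents calls that return data, assuming every
 * record is record_len bytes long (an assumption made for this sketch). */
size_t estimated_getdents_calls(size_t entries, size_t record_len,
                                size_t buffer_len)
{
    assert(record_len > 0 && buffer_len >= record_len);
    size_t per_call = buffer_len / record_len;     /* records per call */
    return (entries + per_call - 1) / per_call;    /* round up */
}
```

Under these assumptions, 1,000,000 entries need about 977 calls with the default 32768-byte buffer (1024 records per call), while a 5 MB buffer would need about 7.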

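3. Solution
Since the buffer size used by glibc's readdir is out of our control, why not bypass readdir and call getdents directly? With the system call we choose the buffer size ourselves. Here is a simple example, listdir.c: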

#define _GNU_SOURCE
#include <dirent.h>     /* Defines DT_* constants */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

/* Record layout returned by the getdents system call */
struct linux_dirent {
    long           d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    char           d_name[];
};

#define BUF_SIZE (1024*1024*5)

int
main(int argc, char *argv[])
{
    int fd, nread;
    static char buf[BUF_SIZE];   /* static: keeps the 5 MB buffer off the stack */
    struct linux_dirent *d;
    int bpos;
    char d_type;

    fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        handle_error("open");

    for ( ; ; ) {
        /* Each call fills buf with as many records as fit */
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1)
            handle_error("getdents");

        if (nread == 0)          /* end of directory */
            break;

        printf("--------------- nread=%d ---------------\n", nread);
        printf("inode#    file type  d_reclen  d_off   d_name\n");
        for (bpos = 0; bpos < nread;) {
            d = (struct linux_dirent *) (buf + bpos);
            printf("%8ld  ", d->d_ino);
            /* d_type is stored in the last byte of each record */
            d_type = *(buf + bpos + d->d_reclen - 1);
            printf("%-10s ", (d_type == DT_REG) ?  "regular" :
                             (d_type == DT_DIR) ?  "directory" :
                             (d_type == DT_FIFO) ? "FIFO" :
                             (d_type == DT_SOCK) ? "socket" :
                             (d_type == DT_LNK) ?  "symlink" :
                             (d_type == DT_BLK) ?  "block dev" :
                             (d_type == DT_CHR) ?  "char dev" : "???");
            printf("%4d %10lld  %s\n", d->d_reclen,
                    (long long) d->d_off, d->d_name);
            bpos += d->d_reclen;
        }
    }

    exit(EXIT_SUCCESS);
}
This code sets the getdents buffer to 5 MB. Compiling and running it gives the following result:

[root@localhost /data1]# time ./listdir test_rm | wc -l
1000016

real 0m0.755s
user 0m0.432s
sys 0m0.320s
Counting the files in the directory dropped from 5.802s with ls to 0.755s, roughly a 7.7x speedup.

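Summary
ls is not the only command affected: others, such as rm -r, also rely on glibc's readdir. So when you have to work on a directory with millions of files (putting that many files in one directory is of course not recommended in practice), consider calling getdents directly and adding your own logic on top. That way you can reproduce what the standard command does while getting performance it cannot offer.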
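As one example of "your own logic", the sketch below counts entries without printing anything. It is an illustration under stated assumptions, not a drop-in replacement for any command: count_entries_getdents and struct raw_dirent64 are hypothetical names, the code is Linux-specific, and it uses the getdents64 system call (the 64-bit variant of getdents) with a caller-chosen buffer size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Record layout returned by getdents64 (declared locally, as the C library
 * does not export it; the struct name is hypothetical). */
struct raw_dirent64 {
    unsigned long long d_ino;
    long long          d_off;
    unsigned short     d_reclen;   /* length of this record */
    unsigned char      d_type;
    char               d_name[];
};

/* Hypothetical helper: count directory entries (including "." and "..")
 * by calling getdents64 with a buffer of buf_size bytes.
 * Returns -1 on error. */
long count_entries_getdents(const char *path, size_t buf_size)
{
    assert(path != NULL && buf_size >= 512 && buf_size <= INT_MAX);

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;

    char *buf = malloc(buf_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long count = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, buf_size);
        if (nread == -1) {          /* read error */
            count = -1;
            break;
        }
        if (nread == 0)             /* end of directory */
            break;

        /* Walk the packed records, stepping by each record's own length. */
        for (long bpos = 0; bpos < nread; ) {
            struct raw_dirent64 *d = (struct raw_dirent64 *) (buf + bpos);
            count++;
            bpos += d->d_reclen;
        }
    }

    free(buf);
    close(fd);
    return count;
}
```

A call such as count_entries_getdents("/data1/test_ls", 5 * 1024 * 1024) would use a 5 MB buffer; the result is the same for any valid buffer size, only the number of system calls changes.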