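Background
On Linux, calling close() to close a file descriptor can fail. A careful programmer may check the return value and retry on failure, but the retry itself can cause a double-close problem.
The man page for close() says: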
Retrying the close() after a failure return is the wrong thing to do, since this may
cause a reused file descriptor from another thread to be closed. This can occur because
the Linux kernel always releases the file descriptor early in the close operation, freeing
it for reuse; the steps that may return an error, such as flushing data to the filesystem
or device, occur only later in the close operation.
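Retrying close() is the wrong choice. Even when close() returns failure, the file descriptor has in fact already been closed (other systems do not necessarily behave this way) and the system has reclaimed it. By then another thread may have been handed the same descriptor value (it is just an integer), so closing it again may close a file descriptor that another thread opened.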
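The rule from the man page can be sketched as a small helper: call close() once, report any error, and never retry. The name close_once and the convention of resetting the variable to -1 are assumptions made for illustration; they are not part of the original demo.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): closes *fd exactly once
 * and marks the variable invalid.
 * Returns 0 on success or if there was nothing to close, -1 if close()
 * reported an error. */
int close_once(int *fd)
{
    if (fd == NULL || *fd < 0) {
        return 0;                /* nothing to close */
    }
    int to_close = *fd;
    *fd = -1;                    /* invalidate first, so this variable cannot be closed twice */
    if (close(to_close) == -1) {
        /* On Linux the descriptor number has already been released here.
         * Report the error, but do NOT call close() again. */
        fprintf(stderr, "close(%d) failed: %s\n", to_close, strerror(errno));
        return -1;
    }
    return 0;
}
```

This only guards against closing the same variable twice. Copies of the integer held elsewhere, or two threads sharing one variable without synchronization, are not protected by it.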
Practice
Let's see what happens when we double-close on Android.
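#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

// print_log() is the demo's own logging helper (defined in the demo source, not shown here).

// Thread 1: opens a file, closes it, then closes the same descriptor again
void* threadFunc1(void* p)
{
    int fd = open("/data/data/com.test.test_double_close/cache/test1.tmp", O_CREAT|O_RDWR, S_IRWXU);
    if (fd == -1)
    {
        print_log("thread 1 open file failed %d", errno);
        return NULL;
    }
    print_log("thread 1 open file %d", fd);
    close(fd);          // first close: the descriptor number is now free for reuse
    usleep(100*1000);   // wait 100 ms so other threads have time to reuse the number
    close(fd);          // deliberate double-close: may close a descriptor owned by another thread
    return NULL;
}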
// Thread 2: opens a file between thread 1's two close() calls, then writes to it
void* threadFunc2(void* p)
{
    usleep(50*1000);    // start after thread 1's first close, before its second
    int fd = open("/data/data/com.test.test_double_close/cache/test2.tmp", O_CREAT|O_RDWR, S_IRWXU);
    if (fd == -1)
    {
        print_log("thread 2 open file failed %d", errno);
        return NULL;
    }
    print_log("thread 2 open file %d", fd);
    usleep(100*1000);   // by now thread 1 has issued its second close
    int ret = write(fd, "123", 3);
    if (ret == -1)
    {
        print_log("thread 2 write file failed %d", errno);
    }
    else
    {
        print_log("thread 2 write file len %d", ret);
    }
    close(fd);
    return NULL;
}
void testDoubleClose()
{
    pthread_t hThread1;
    pthread_t hThread2;
    if (pthread_create(&hThread1, NULL, &threadFunc1, NULL) != 0)
    {
        print_log("create thread 1 failed\n");
        return;
    }
    // Thread 2 is left disabled: thread 1 alone is enough to crash the app.
    // if (pthread_create(&hThread2, NULL, &threadFunc2, NULL) != 0)
    // {
    //     print_log("create thread 2 failed\n");
    //     return;
    // }
}
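After running the program, it turned out that thread 1 alone is enough to crash the app. Thread 1 closes the file it opened and then closes the same descriptor again 100 ms later. By then the second close may hit a file descriptor that belongs to some other thread in the app, which leads to abnormal behavior.
Demo source code: Learning-Android/double-close at master · ChriFang/Learning-Android (github.com)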
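The reuse that makes the second close() dangerous can also be reproduced without threads. The following is a minimal single-threaded sketch, not the demo's code: the function name stale_close_hits_new_fd is hypothetical, and /dev/null stands in for the files opened by the two threads. It relies on the POSIX rule that open() returns the lowest-numbered free descriptor, and it assumes no other thread opens or closes descriptors while it runs.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical illustration: returns 1 if a stale second close() ended up
 * closing an unrelated, newly opened descriptor, 0 if the number was not
 * reused, -1 if setup failed. */
int stale_close_hits_new_fd(void)
{
    int fd_a = open("/dev/null", O_WRONLY);
    if (fd_a == -1) {
        return -1;
    }
    close(fd_a);                 /* first, legitimate close */

    /* Stands in for "another thread opens a file". open() hands out the
     * lowest free number, so here it gets the value fd_a just had. */
    int fd_b = open("/dev/null", O_WRONLY);
    if (fd_b == -1) {
        return -1;
    }
    if (fd_b != fd_a) {          /* number was not reused; nothing to show */
        close(fd_b);
        return 0;
    }

    close(fd_a);                 /* the bug: stale second close hits fd_b */

    /* The owner of fd_b now finds its descriptor gone. */
    ssize_t n = write(fd_b, "123", 3);
    return (n == -1 && errno == EBADF) ? 1 : 0;
}
```

In a real app the victim is usually not a scratch file but whatever descriptor the process opens next (a socket, a database file, a graphics fence), which is why the symptom tends to show up far from the faulty close().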
Defense
To defend against this problem, Android 10 and later include the fdsan mechanism; see docs/fdsan.md (googlesource.com).
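As a rough idea of how fdsan ownership tagging might be used from C, here is a sketch that has not been run on a device. It assumes the bionic functions android_fdsan_exchange_owner_tag and android_fdsan_close_with_tag from <android/fdsan.h> (API level 29 and up); check the header in your NDK before relying on the signatures. The struct owned_fd and the functions owned_fd_tag, owned_fd_open and owned_fd_close are hypothetical names. On other platforms the sketch falls back to a plain close().

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#if defined(__ANDROID__) && __ANDROID_API__ >= 29
#include <android/fdsan.h>
#define USE_FDSAN 1
#else
#define USE_FDSAN 0
#endif

/* Hypothetical owner type. Its own address is used as the fdsan tag, so the
 * struct must not be copied or moved while it holds a descriptor. */
struct owned_fd {
    int fd;
};

uint64_t owned_fd_tag(const struct owned_fd *o)
{
    return (uint64_t)(uintptr_t)o;
}

/* Opens path and, where fdsan is available, records o as the owner.
 * Returns 0 on success, -1 on failure. */
int owned_fd_open(struct owned_fd *o, const char *path, int flags)
{
    o->fd = open(path, flags);
    if (o->fd == -1) {
        return -1;
    }
#if USE_FDSAN
    /* Expected tag 0 means "currently unowned". */
    android_fdsan_exchange_owner_tag(o->fd, 0, owned_fd_tag(o));
#endif
    return 0;
}

/* Closes the descriptor once, presenting the owner's tag to fdsan. */
int owned_fd_close(struct owned_fd *o)
{
    if (o->fd == -1) {
        return 0;                /* already closed through this owner */
    }
    int fd = o->fd;
    o->fd = -1;
#if USE_FDSAN
    /* A stray close(fd) elsewhere would not carry this tag, which is what
     * lets fdsan report it. */
    return android_fdsan_close_with_tag(fd, owned_fd_tag(o));
#else
    return close(fd);
#endif
}
```

With a descriptor tagged this way, a stale close() like the second one in thread 1 no longer silently closes someone else's file; fdsan can flag the mismatch at the faulty call site instead.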
I'll try it out in practice when I have time.