[1003.1(2008)/Issue 7 0000411]: adding atomic FD_CLOEXEC support
Austin Group Bug Tracker (noreply-Us27lTmNgZAcWVvVuXF20w@xxxxx.org)
2011-08-04 00:54
A NOTE has been added to this issue.
======================================================================
http://austingroupbugs.net/view.php?id=411
======================================================================
Reported By: eblake
Assigned To: ajosey
======================================================================
Project: 1003.1(2008)/Issue 7
Issue ID: 411
Category: System Interfaces
Type: Enhancement Request
Severity: Objection
Priority: normal
Status: Under Review
Name: Eric Blake
Organization: Red Hat
User Reference: ebb.cloexec
Section: various - see desired action
Page Number: various - see desired action
Line Number: various - see desired action
Interp Status: ---
Final Accepted Text:
======================================================================
Date Submitted: 2011-04-20 17:01 UTC
Last Modified: 2011-08-03 21:38 UTC
======================================================================
Summary: adding atomic FD_CLOEXEC support
======================================================================
Relationships ID Summary
----------------------------------------------------------------------
related to 0000149 Add fdwalk system interface
related to 0000368 Hidden file descriptors should be requi...
related to 0000456 mandate binary mode of fmemopen
======================================================================
----------------------------------------------------------------------
(0000917) eblake (manager) - 2011-08-03 21:38
http://austingroupbugs.net/view.php?id=411#c917
----------------------------------------------------------------------
Replace the original "Desired Action" for popen (affecting lines 46095
through 46167) with:
At lines 45748-45763 [XSH pclose RATIONALE], replace the pclose( )
sample implementation with:
See the RATIONALE for popen( ) for a sample implementation of
pclose( ).
At line 46093 [XSH popen DESCRIPTION], change:
The popen( ) function shall ensure that any streams from previous
popen( ) calls that remain open in the parent process are closed in
the new child process.
to:
The popen( ) function shall ensure that any streams from previous
popen( ) calls that remain open in the parent process are closed in
the new child process, regardless of the FD_CLOEXEC status of the file
descriptor underlying those streams.
At lines 46095 and 46099 [XSH popen DESCRIPTION], change "If mode is"
to "If mode starts with".
After line 46102 [XSH popen DESCRIPTION], add the following:
3. If mode includes a second character of e, then the file descriptor
underlying the stream returned to the calling process by popen( )
shall have the FD_CLOEXEC flag atomically set. Additionally, if the
implementation creates the file descriptor for use by the child
process from within the parent process, then that file descriptor
shall have the FD_CLOEXEC flag atomically set within the parent
process. If the second character is not e, the FD_CLOEXEC flag of the
underlying file descriptor returned by popen( ) shall be clear.
At line 46103 [XSH popen DESCRIPTION], change "3." to "4.".
At line 46148 [XSH popen APPLICATION USAGE], change "r and w" to "r,
w, re, and we".
After line 46167 [XSH popen RATIONALE], add the following:
The e mode modifier to popen( ) is necessary to avoid a data race in
multi-threaded applications. Without it, the parent's file descriptor
is leaked into a second child process created by one thread in the
window between another thread creating the pipe via popen( ) then
using fileno( ) and fcntl( ) on the result. Also, if the popen( )
implementation temporarily has the child's file descriptor open within
the parent, then that file descriptor could also be leaked if it is
not atomically FD_CLOEXEC for the duration in which it is open in the
parent.
The standard only requires that the implementation atomically set
FD_CLOEXEC on file descriptors created in the parent process when the
e mode modifier is in effect; implementations may also do so when the
e modifier is not in use, provided that the FD_CLOEXEC bit is
eventually cleared before popen( ) completes, however, this is not
required because any application worried about the potential file
descriptor leak will already be using the e modifier.
Although the standard is clear that a conforming application should
not call popen( ) when file descriptor 0 or 1 is closed,
implementations are encouraged to handle these cases correctly.
The following two examples demonstrate possible implementations of
popen( ) using other standard functions. These examples are designed
to show FD_CLOEXEC handling rather than all aspects of thread safety,
and implementations are encouraged to improve the locking mechanism
around the state list to be more efficient. Also, remember that other
implementations are possible, including an implementation that uses an
implementation-specific means of creating a pipe between parent and
child where the parent process never has access to the child's end of
the pipe. Both of these examples make use of the following helper
functions, documented but not implemented here, to do the bookkeeping
necessary to properly close all file descriptors created by other
popen( ) calls regardless of their FD_CLOEXEC status:
/* Obtain mutual exclusion lock, so that no concurrent popen( ) or
pclose( ) calls are simultaneously modifying the list of tracked
children. */
static void popen_lock(void);
/* Release mutual exclusion lock, without changing errno. */
static void popen_unlock(void);
/* Add the pid and stream pair to the list of tracked children, prior
to any code that can clear FD_CLOEXEC on the file descriptor
associated with stream. To be used while holding the lock. */
static void popen_add_pair(FILE *stream, pid_t pid);
/* Given a stream, return the associated pid, or -1 with errno set if
the stream was not created by popen( ). To be used while holding
the lock. */
static pid_t popen_get_pid(FILE *stream);
/* Remove stream and its corresponding pid from the list of tracked
children. To be used while holding the lock. */
static void popen_remove(FILE *stream);
/* If stream is NULL, return the first tracked child; otherwise,
return the next tracked child. Return NULL if all tracked children
have been returned. To be used while holding the lock. */
static FILE *popen_next(FILE *stream);
The first example is based on fork( ):
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
FILE *popen(const char *command, const char *mode)
{
int fds[2];
pid_t pid;
FILE *stream;
int target = mode[0] == 'w'; /* index of fds used by parent */
/* Validate mode */
if ((mode[0] != 'w' && mode[0] != 'r') ||
mode[1 + (mode[1] == 'e')]) {
errno = EINVAL;
return NULL;
}
/* Create pipe and stream with FD_CLOEXEC set */
if (pipe2(fds, O_CLOEXEC) < 0)
return NULL;
stream = fdopen(fds[target], mode);
if (!stream) {
int saved = errno;
close(fds[0]);
close(fds[1]);
errno = saved;
return NULL;
}
/* Create child process */
popen_lock();
pid = fork();
if (pid < 0) {
int saved = errno;
close(fds[!target]);
fclose(stream);
popen_unlock();
errno = saved;
return NULL;
}
/* Child process. */
if (!pid) {
FILE *tracked = popen_next(NULL);
while (tracked) {
int fd = fileno(tracked);
if (fd < 0 || close(fd))
_exit(127);
tracked = popen_next(tracked);
}
target = mode[0] == 'r'; /* Opposite fd in the child */
/* Use dup2 or fcntl to clear FD_CLOEXEC on child's descriptor,
FD_CLOEXEC will take care of the rest of fds[]. */
if (fds[target] != target) {
if (dup2(fds[target], target) != target)
_exit(127);
} else {
int flags = fcntl(fds[target], F_GETFD);
if (flags < 0 ||
fcntl(fds[target], F_SETFD, flags | FD_CLOEXEC) < 0)
_exit(127);
}
execl("/bin/sh", "sh", "-c", command, NULL);
_exit(127);
}
/* Parent process. From here on out, the close and fcntl system
calls are assumed to pass, since all inputs are valid and do not
require allocating any fds or memory. Besides, excluding
failures due to undefined behavior (such as another thread
closing an fd it knows nothing about), cleanup from any defined
failures would require stopping and reaping the child process,
which may have worse consequences. */
close(fds[!target]);
popen_add_pair(stream, pid);
popen_unlock();
if (mode[1] != 'e') {
int flags = fcntl(fds[target], F_GETFD);
if (flags >= 0)
fcntl(fds[target], F_SETFD, flags & ~FD_CLOEXEC);
}
return stream;
}
The second example is based on posix_spawn( ):
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
#include <spawn.h>
extern char **environ;
FILE *popen(const char *command, const char *mode)
{
int fds[2];
pid_t pid;
FILE *stream;
int target = mode[0] == 'w'; /* index of fds used by parent */
const char *argv[] = { "sh", "-c", command, NULL };
posix_spawn_file_actions_t actions;
int saved;
FILE *tracked;
/* Validate mode */
if ((mode[0] != 'w' && mode[0] != 'r') ||
mode[1 + (mode[1] == 'e')]) {
errno = EINVAL;
return NULL;
}
/* Create pipe and stream with FD_CLOEXEC set */
if (pipe2(fds, O_CLOEXEC) < 0)
return NULL;
stream = fdopen(fds[target], mode);
if (!stream) {
saved = errno;
close(fds[0]);
close(fds[1]);
errno = saved;
return NULL;
}
/* Create child process */
if (posix_spawn_file_actions_init(&actions)) {
saved = errno;
goto spawnerr1;
}
popen_lock();
tracked = popen_next(NULL);
while (tracked) {
int fd = fileno(tracked);
if (fd < 0 || posix_spawn_file_actions_addclose(&actions, fd))
goto spawnerr2;
tracked = popen_next(tracked);
}
if (posix_spawn_file_actions_adddup2(&actions, fds[!target], !target))
goto spawnerr2;
if (posix_spawn(&pid, "/bin/sh", &actions, NULL, (char **)argv, environ))
{
spawnerr2:
saved = errno;
posix_spawn_file_actions_destroy(&actions);
popen_unlock();
spawnerr1:
close(fds[!target]);
fclose(stream);
errno = saved;
return NULL;
}
/* From here on out, system calls are assumed to pass, since all
inputs are valid and do not require allocating any fds or memory.
Besides, excluding failures due to undefined behavior (such as
another thread closing an fd it knows nothing about), cleanup
from any defined failures would require stopping and reaping the
child process, which may have worse consequences. */
posix_spawn_file_actions_destroy(&actions);
close(fds[!target]);
popen_add_pair(stream, pid);
popen_unlock();
if (mode[1] != 'e') {
int flags = fcntl(fds[target], F_GETFD);
if (flags >= 0)
fcntl(fds[target], F_SETFD, flags & ~FD_CLOEXEC);
}
return stream;
}
Both examples can share a common pclose( ) implementation.
int pclose(FILE *stream)
{
int status;
popen_lock();
pid_t pid = popen_get_pid(stream);
if (pid < 0) {
popen_unlock();
return -1;
}
popen_remove(stream);
popen_unlock();
fclose(stream); /* Ignore failure */
while (waitpid(pid, &status, 0) == -1) {
if (errno != EINTR) {
status = -1;
break;
}
}
return status;
}
Note that, while a particular implementation of popen( ) (such as the
two above) can assume a particular path for the shell, such a path is
not necessarily valid on another system. The above examples are not
portable, and are not intended to be.
Issue History
Date Modified Username Field Change
======================================================================
2011-04-20 17:01 eblake New Issue
2011-04-20 17:01 eblake Status New => Under Review
2011-04-20 17:01 eblake Assigned To => ajosey
2011-04-20 17:01 eblake Name => Eric Blake
2011-04-20 17:01 eblake Organization => Red Hat
2011-04-20 17:01 eblake User Reference => ebb.cloexec
2011-04-20 17:01 eblake Section => various - see
desired action
2011-04-20 17:01 eblake Page Number => various - see
desired action
2011-04-20 17:01 eblake Line Number => various - see
desired action
2011-04-20 17:01 eblake Interp Status => ---
2011-04-20 17:02 eblake Tag Attached: issue8
2011-04-20 17:03 eblake Relationship added related to 0000149
2011-04-20 17:03 eblake Relationship added related to 0000368
2011-04-28 16:21 eblake Description Updated
2011-04-28 16:21 eblake Desired Action Updated
2011-04-28 16:23 eblake Note Added: 0000773
2011-04-28 18:41 eblake Note Added: 0000774
2011-05-03 11:04 drepper Note Added: 0000776
2011-06-02 19:13 eblake Relationship added related to 0000456
2011-08-03 21:38 eblake Note Added: 0000917
======================================================================
> /* Use dup2 or fcntl to clear FD_CLOEXEC on child's descriptor,
> FD_CLOEXEC will take care of the rest of fds[]. */
> if (fds[target] != target) {
> if (dup2(fds[target], target) != target)
> _exit(127);
> } else {
> int flags = fcntl(fds[target], F_GETFD);
> if (flags < 0 ||
> fcntl(fds[target], F_SETFD, flags | FD_CLOEXEC) < 0)
This is supposed to be clearing the flag, so should be:
fcntl(fds[target], F_SETFD, flags & ~FD_CLOEXEC) < 0)
--
Geoff Clare <g.clare-7882/jkIBncuagvECLh61g@xxxxx.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Thanks for catching that. I edited note 917 in place with that fix.
--
Eric Blake eblake-H+wXaHxf7aLQT0dZR+AlfA@xxxxx.org +1-801-349-2682
Libvirt virtualization library http://libvirt.org