A Primer on Signals in the Solaris OS

Learn the Ins and Outs of Implementing Signals in the Solaris Operating Environment


Signals are a process event notification mechanism that has been part of the UNIX® system from the earliest days. The APIs and underlying behavioral characteristics of signals have evolved over the years, at times diverging between the BSD and SVR4 releases of UNIX. Fortunately, industry standards brought things together, and you now have a well-understood and consistent foundation for signals.

Rather than work through a tutorial on writing code with signals (W. Richard Stevens's Advanced Programming in the UNIX Environment (see Resources) is an outstanding source for learning to program with signals), this article opts instead to help you build a solid foundation around signals with detailed background and implementation discussions.

Signals are used to notify a process or thread of a particular event. Many engineers compare signals with hardware interrupts, which occur when a hardware subsystem such as a disk I/O interface (an SCSI host adapter, for example) generates an interrupt to a processor as a result of a completed I/O. This event in turn causes the processor to enter an interrupt handler, so subsequent processing can be done in the operating system based on the source and cause of the interrupt.

UNIX® guru W. Richard Stevens, however, aptly describes signals as software interrupts. When a signal is sent to a process or thread, a signal handler may be entered (depending on the current disposition of the signal), which is similar to the system entering an interrupt handler as the result of receiving an interrupt.

Signals have a long history, marked by design changes in the signal code across the various implementations of UNIX. These changes were due in part to some deficiencies in the early implementation of signals, as well as the parallel development work done on different versions of UNIX, primarily BSD UNIX and AT&T System V. W. Richard Stevens, James Cox, and Berny Goodheart (see Resources) cover these details in their respective books. What does warrant mention is that early implementations of signals were deemed unreliable. The unreliability stemmed from the fact that in the old days the kernel would reset the signal's disposition to its default when a process caught a signal and invoked its own handler, and the reset occurred before the handler was invoked. Attempts to address this issue in user code by having the signal handler first reinstall itself did not always solve the problem: successive occurrences of the same signal resulted in race conditions, where the default action was invoked before the user-defined handler was reinstalled. For signals that had a default action of terminating the process, this created severe problems. This problem (and some others) was addressed in 4.3BSD UNIX and SVR3 in the mid-'80s.

The implementation of reliable signals has been in place for many years now, where an installed signal handler remains persistent and is not reset by the kernel. The POSIX standards provided a fairly well-defined set of interfaces for using signals in code, and today the Solaris Operating Environment implementation of signals is fully POSIX-compliant. Note that reliable signals require the use of the newer sigaction(2) interface, as opposed to the traditional signal(3C) call.
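
As a rough illustration of the reliable semantics described above, the following sketch installs a handler with sigaction(2) and checks that it is still in place after several deliveries. The function names (on_usr1, install_reliable_handler, deliver_usr1, handler_still_installed) are invented for this example, and only POSIX interfaces are assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Number of times the handler has run since the last reset. */
static volatile sig_atomic_t usr1_count = 0;

/* Hypothetical handler: it only bumps a counter. */
static void on_usr1(int signo)
{
    (void)signo;
    usr1_count++;
}

/* Install on_usr1 for SIGUSR1 with sigaction(2). Returns 0 on success. */
int install_reliable_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESETHAND: the handler is not reset */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Send SIGUSR1 to ourselves n times; returns how often the handler ran. */
int deliver_usr1(int n)
{
    usr1_count = 0;
    for (int i = 0; i < n; i++)
        raise(SIGUSR1);         /* returns only after the handler has run */
    return (int)usr1_count;
}

/* Returns 1 if on_usr1 is still the installed handler, 0 otherwise. */
int handler_still_installed(void)
{
    struct sigaction cur;

    if (sigaction(SIGUSR1, NULL, &cur) != 0)
        return 0;
    return cur.sa_handler == on_usr1;
}
```

Calling install_reliable_handler() once and then deliver_usr1(3) should run the handler for every delivery, with no need for the handler to reinstall itself.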

The occurrence of a signal may be synchronous or asynchronous to the process or thread, depending on the source of the signal and the underlying reason or cause. Synchronous signals occur as a direct result of the executing instruction stream, where an unrecoverable error (such as an illegal instruction or illegal address reference) requires an immediate termination of the process. Such signals are directed to the thread whose execution stream caused the error. Because an error of this type causes a trap into a kernel trap handler, synchronous signals are sometimes referred to as traps. Asynchronous signals are external to (and in some cases unrelated to) the current execution context. One obvious example is the sending of a signal to a process from another process or thread, via a kill(2), _lwp_kill(2), or sigsend(2) system call, or a thr_kill(3T), pthread_kill(3T), or sigqueue(3R) library invocation. Asynchronous signals are also referred to as interrupts.

Every signal has a unique signal name, an abbreviation that begins with SIG (SIGINT for interrupt signal, for example) and a corresponding signal number. Additionally, for all possible signals, the system defines a default disposition, or action to take when a signal occurs. There are four possible default dispositions:

  • Exit: Forces the process to exit
  • Core: Forces the process to exit, and creates a core file
  • Stop: Stops the process
  • Ignore: Ignores the signal; no action taken

A signal's disposition within a process's context defines what action the system will take on behalf of the process when a signal is delivered. All threads and LWPs (lightweight processes) within a process share the signal disposition, which is processwide and cannot be unique among threads within the same process. The table below provides a complete list of signals, along with a description and default action.

Name        Number  Default action  Description
SIGHUP      1       Exit            Hangup (ref termio(7I))
SIGINT      2       Exit            Interrupt (ref termio(7I))
SIGQUIT     3       Core            Quit (ref termio(7I))
SIGILL      4       Core            Illegal Instruction
SIGTRAP     5       Core            Trace or breakpoint trap
SIGABRT     6       Core            Abort
SIGEMT      7       Core            Emulation trap
SIGFPE      8       Core            Arithmetic exception
SIGKILL     9       Exit            Kill
SIGBUS      10      Core            Bus error -- actually a misaligned address error
SIGSEGV     11      Core            Segmentation fault -- an address reference boundary error
SIGSYS      12      Core            Bad system call
SIGPIPE     13      Exit            Broken pipe
SIGALRM     14      Exit            Alarm clock
SIGTERM     15      Exit            Terminated
SIGUSR1     16      Exit            User defined signal 1
SIGUSR2     17      Exit            User defined signal 2
SIGCHLD     18      Ignore          Child process status changed
SIGPWR      19      Ignore          Power fail or restart
SIGWINCH    20      Ignore          Window size change
SIGURG      21      Ignore          Urgent socket condition
SIGPOLL     22      Exit            Pollable event (ref streamio(7I))
SIGSTOP     23      Stop            Stop (cannot be caught or ignored)
SIGTSTP     24      Stop            Stop (job control, e.g., ^Z)
SIGCONT     25      Ignore          Continued
SIGTTIN     26      Stop            Stopped -- tty input (ref termio(7I))
SIGTTOU     27      Stop            Stopped -- tty output (ref termio(7I))
SIGVTALRM   28      Exit            Virtual timer expired
SIGPROF     29      Exit            Profiling timer expired
SIGXCPU     30      Core            CPU time limit exceeded (ref getrlimit(2))
SIGXFSZ     31      Core            File size limit exceeded (ref getrlimit(2))
SIGWAITING  32      Ignore          Concurrency signal used by threads library
SIGLWP      33      Ignore          Inter-LWP signal used by threads library
SIGFREEZE   34      Ignore          Checkpoint suspend
SIGTHAW     35      Ignore          Checkpoint resume
SIGCANCEL   36      Ignore          Cancellation signal used by threads library
SIGLOST     37      Ignore          Resource lost
SIGRTMIN    38      Exit            Highest priority realtime signal
SIGRTMAX    45      Exit            Lowest priority realtime signal

Signal description and default action

Note that SIGLOST first appeared in Solaris release 2.6. Solaris 2.5 and 2.5.1 do not define this signal, and instead have SIGRTMIN and SIGRTMAX at signal numbers 37 and 44, respectively. The kernel defines MAXSIG (available for user code in /usr/include/sys/signal.h) as a symbolic constant used in various places in kernel signal support code. MAXSIG is 44 in Solaris 2.5 and 2.5.1, and 45 in Solaris 2.6 and 7.
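
Because the realtime signal numbers moved between releases, portable code is better off asking the system for the range than hard-coding 38 and 45. The sketch below does that with the standard SIGRTMIN and SIGRTMAX macros; the two function names are invented for this example, and the values printed will differ from one platform to another.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/*
 * Number of realtime signals on the running system. SIGRTMIN and SIGRTMAX
 * may expand to function calls, so they are evaluated at run time rather
 * than used as compile-time constants.
 */
int realtime_signal_count(void)
{
    return SIGRTMAX - SIGRTMIN + 1;
}

/* Print the realtime signal range of the running system to 'out'. */
void print_realtime_range(FILE *out)
{
    fprintf(out, "SIGRTMIN=%d SIGRTMAX=%d (%d realtime signals)\n",
            SIGRTMIN, SIGRTMAX, realtime_signal_count());
}
```

With the Solaris 2.6 numbering in the table above, the count would be 8 (signals 38 through 45).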

The disposition of a signal can be changed from its default, and a process can arrange to catch a signal and invoke a signal handling routine of its own, or ignore a signal that may not have a default disposition of Ignore. The only exceptions are SIGKILL and SIGSTOP, whose default dispositions cannot be changed. The interfaces for defining and changing signal disposition are the signal(3C) and sigset(3C) library routines, and the sigaction(2) system call. Signals can also be blocked, which means the process has temporarily prevented delivery of a signal. The generation of a signal that has been blocked will result in the signal remaining pending to the process until it is explicitly unblocked, or the disposition is changed to Ignore. The sigprocmask(2) system call will set or get a process's signal mask, the bit array that is inspected by the kernel to determine if a signal is blocked or not. thr_sigsetmask(3T) and pthread_sigmask(3T) are the equivalent interfaces for setting and retrieving the signal mask at the user-threads level.
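
The blocked and pending states can be observed from user code. The following is a minimal sketch, assuming a single-threaded process and only POSIX calls: it blocks SIGUSR2, generates it, confirms with sigpending(2) that it is pending and that the handler has not run, and then unblocks it. The names on_usr2 and block_pending_unblock are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_usr2 = 0;

/* Hypothetical handler: records that SIGUSR2 was delivered. */
static void on_usr2(int signo)
{
    (void)signo;
    got_usr2 = 1;
}

/*
 * Returns 1 if SIGUSR2 stayed pending while blocked and was delivered
 * once unblocked, 0 if any of those observations failed, -1 on setup error.
 */
int block_pending_unblock(void)
{
    struct sigaction sa;
    sigset_t usr2, pending;
    int was_pending, held_back, delivered;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    sigemptyset(&usr2);
    sigaddset(&usr2, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &usr2, NULL) != 0)
        return -1;

    got_usr2 = 0;
    raise(SIGUSR2);                     /* generated, delivery postponed */

    if (sigpending(&pending) != 0)
        return -1;
    was_pending = (sigismember(&pending, SIGUSR2) == 1);
    held_back = (got_usr2 == 0);

    /* Unblocking lets the pending signal be delivered before the call returns. */
    if (sigprocmask(SIG_UNBLOCK, &usr2, NULL) != 0)
        return -1;
    delivered = (got_usr2 == 1);

    return was_pending && held_back && delivered;
}
```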

I mentioned earlier that a signal may originate from several different places, for a variety of different reasons. The first three signals listed in the table above - SIGHUP, SIGINT, and SIGQUIT - are generated by a keyboard entry from the controlling terminal (SIGINT and SIGQUIT), or they are generated if the controlling terminal becomes disconnected (SIGHUP; the nohup(1) command makes processes "immune" to hangups by setting the disposition of SIGHUP to Ignore). Other terminal I/O-related signals include SIGSTOP, SIGTTIN, SIGTTOU, and SIGTSTP. For the signals that originate from a keyboard command, the actual key sequence that generates each signal is defined within the parameters of the terminal session, typically via stty(1). The interrupt sequence, for example, is usually Ctrl-C; typing it results in a SIGINT being sent to the process, and SIGINT has a default disposition of Exit.
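
The effect of setting SIGHUP to Ignore, as nohup(1) does, can be sketched with a child process that sends itself a hangup. This is only an illustration under POSIX assumptions, not the nohup(1) implementation, and hangup_child is a name invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, set its SIGHUP disposition, and have it hang up on itself.
 * Returns the number of the signal that terminated the child, 0 if the
 * child survived and exited normally, or -1 on error.
 */
int hangup_child(int ignore_hup)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct sigaction sa;
        sigset_t hup;

        /* Make sure SIGHUP is not blocked by an inherited signal mask. */
        sigemptyset(&hup);
        sigaddset(&hup, SIGHUP);
        sigprocmask(SIG_UNBLOCK, &hup, NULL);

        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_handler = ignore_hup ? SIG_IGN : SIG_DFL;
        if (sigaction(SIGHUP, &sa, NULL) != 0)
            _exit(2);

        raise(SIGHUP);      /* with the default disposition, the child dies here */
        _exit(0);           /* reached only when SIGHUP is ignored */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```

hangup_child(0) should report SIGHUP as the terminating signal, while hangup_child(1) should report a normal exit.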

Signals generated as a direct result of an error encountered during instruction execution start with a hardware trap on the system. Different processor architectures define various traps that result in an immediate vectored transfer of control to a kernel trap-handling function. The Solaris kernel builds a trap table and inserts trap-handling routines in the appropriate locations based on the architecture specification of the processors that Solaris supports: SPARC V7 (early Sun-4 architectures), SPARC V8 (SuperSPARC - Sun-4m and Sun-4d architectures), SPARC V9 (UltraSPARC), and x86 (in Intel parlance they're called interrupt descriptor tables or IDTs; on SPARC, they're called trap tables). The kernel-installed trap handler will ultimately generate a signal to the thread that caused the trap. The signals that result from hardware traps are SIGILL, SIGFPE, SIGSEGV, SIGTRAP, SIGBUS, and SIGEMT.
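
The trap-to-signal path can be observed from user level by letting a child process make an illegal address reference and then inspecting how it died. The sketch below assumes a POSIX system on which a store through a null pointer traps, which typically yields SIGSEGV (some platforms may report SIGBUS instead); signal_from_bad_store is a name invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that stores through a null pointer. Returns the number of
 * the signal that terminated the child, 0 if it exited normally, or -1 on
 * error.
 */
int signal_from_bad_store(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        /* The pointer itself is volatile so the compiler cannot fold it away. */
        volatile int *volatile bad = NULL;

        setrlimit(RLIMIT_CORE, &no_core);   /* avoid leaving a core file */
        signal(SIGSEGV, SIG_DFL);           /* default disposition: Core */
        signal(SIGBUS, SIG_DFL);

        *bad = 42;          /* hardware trap; the kernel posts the signal */
        _exit(0);           /* not reached if the store trapped */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```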

In addition to terminal I/O and error trap conditions, signals can originate from sources such as an explicit programmatic send via kill(2) or thr_kill(3T), or from a shell issuing a kill(1) command. Parent processes are notified of a status change in a child process via SIGCHLD. The alarm(2) system call sends a SIGALRM when the timer expires. Applications can create user-defined signals as a somewhat crude form of interprocess communication by defining handlers for SIGUSR1 or SIGUSR2 and then sending those signals between processes. The kernel sends SIGXCPU if a process exceeds its processor time resource limit or SIGXFSZ if a file write exceeds the file size resource limit. A SIGABRT is sent as a result of an invocation of the abort(3C) library routine. If a process is writing to a pipe and the reader has terminated, SIGPIPE is generated.
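
The SIGPIPE case is easy to reproduce within one process by closing the read end of a pipe before writing to it. The following sketch assumes POSIX pipe(2) semantics; on_sigpipe and write_to_readerless_pipe are names invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigpipe = 0;

/* Hypothetical handler: records that SIGPIPE was delivered. */
static void on_sigpipe(int signo)
{
    (void)signo;
    got_sigpipe = 1;
}

/*
 * Returns 1 if a write to a pipe with no reader generated SIGPIPE and
 * failed with EPIPE, 0 if it did not, -1 on setup error.
 */
int write_to_readerless_pipe(void)
{
    int fds[2];
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    if (pipe(fds) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;     /* catch it; the default would be Exit */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, &old) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[0]);                  /* the reader goes away */
    got_sigpipe = 0;
    n = write(fds[1], "x", 1);      /* kernel generates SIGPIPE */
    saved_errno = errno;

    close(fds[1]);
    sigaction(SIGPIPE, &old, NULL); /* restore the previous disposition */

    return n == -1 && saved_errno == EPIPE && got_sigpipe == 1;
}
```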

These examples of signals generated as a result of events beyond hard errors and terminal I/O do not represent the complete list, but rather provide you with a well-rounded set of examples of the process-induced and external events that can generate signals. You can find a complete list in any number of texts on UNIX programming.

In terms of actual implementation, a signal is represented as a bit in a data structure (several data structures, actually, as you'll see shortly). More precisely, the posting of a signal by the kernel results in a bit getting set in a structure member at either the process or thread level. Because each signal has a unique signal number, a structure member of sufficient width is used, which allows every signal to be represented by simply setting the bit that corresponds to the signal number of the signal you wish to post (for example, setting the 17th bit to post signal 17, SIGUSR2).

Because Solaris includes more than 32 possible signals, a long or int data type is not sufficiently wide to represent each possible signal as a unique bit, so a data structure is required. The k_sigset_t data structure defined in /usr/include/signal.h is used in several of the process data structures to store the posted signal bits. It's an array of two unsigned long data types (array members 0 and 1), providing a bit width of 64 bits.
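
To make the bit-per-signal idea concrete, here is a user-level model of such a two-word set. It is a sketch, not the kernel's definition: demo_sigset_t and the demo_* helpers are hypothetical names, the words are fixed at 32 bits to match the 64-bit total described above, and the mapping of signal n to the nth bit (bit index n - 1) is an assumption of this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for a two-word signal set (2 x 32 = 64 bits). */
typedef struct {
    uint32_t bits[2];
} demo_sigset_t;

/* Which array member holds the bit for signal 'sig' (signals start at 1). */
static unsigned demo_word(int sig)
{
    return (unsigned)(sig - 1) / 32;
}

/* Mask selecting the bit for signal 'sig' within its word. */
static uint32_t demo_mask(int sig)
{
    return (uint32_t)1 << ((unsigned)(sig - 1) % 32);
}

/* Post a signal: set its bit. */
void demo_post(demo_sigset_t *set, int sig)
{
    set->bits[demo_word(sig)] |= demo_mask(sig);
}

/* Clear a posted signal, as would happen once it has been delivered. */
void demo_clear(demo_sigset_t *set, int sig)
{
    set->bits[demo_word(sig)] &= ~demo_mask(sig);
}

/* Returns 1 if the signal's bit is set, 0 otherwise. */
int demo_is_posted(const demo_sigset_t *set, int sig)
{
    return (set->bits[demo_word(sig)] & demo_mask(sig)) != 0;
}
```

Under these assumptions, posting signal 17 sets the 17th bit of array member 0, and posting signal 38 (beyond the first 32) lands in array member 1.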

Figure 1. k_sigset_t data structure

Signals in Solaris

The multithreaded architecture of Solaris made for some interesting challenges in developing a means of supporting signals that comply with the UNIX signal semantics, as defined by industry standards such as POSIX. Signals traditionally go through two well-defined stages: generation and delivery. Signal generation is the point of origin of the signal, or the sending phase. A signal is said to be delivered when whatever disposition has been established for the signal is invoked, even if it is to be ignored. If a signal is being blocked, thus postponing delivery, it is considered pending.

User threads in Solaris, created via explicit calls to either thr_create(3T) or pthread_create(3T), all have their own signal masks. Threads can choose to block signals independent of other threads executing in the same process, which allows different threads to take delivery of different signals at various times during process execution. The threads libraries (POSIX and Solaris threads) provide the thr_sigsetmask(3T) and pthread_sigmask(3T) interfaces for establishing per-user-thread signal masks. The disposition and handlers for all signals are shared by all the threads in a process. So, for example, a SIGINT with the default disposition in place will cause the entire process to exit.
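
The per-thread nature of the signal mask can be checked with a small POSIX threads sketch: a worker thread blocks SIGUSR1 in its own mask, and the main thread's mask is left unaffected. The names worker and masks_are_per_thread are invented for this example, and the program is assumed to be built with POSIX threads support.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Worker: block SIGUSR1 in this thread only, then report its own mask. */
static void *worker(void *arg)
{
    int *blocked_in_worker = arg;
    sigset_t usr1, cur;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &usr1, NULL);

    pthread_sigmask(SIG_BLOCK, NULL, &cur);     /* query without changing */
    *blocked_in_worker = (sigismember(&cur, SIGUSR1) == 1);
    return NULL;
}

/*
 * Returns 1 if SIGUSR1 ended up blocked in the worker but not in the
 * calling thread, 0 otherwise, -1 on error.
 */
int masks_are_per_thread(void)
{
    pthread_t tid;
    sigset_t usr1, cur;
    int blocked_in_worker = 0;
    int blocked_in_main;

    /* Start from a known state: SIGUSR1 unblocked in the calling thread. */
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_UNBLOCK, &usr1, NULL);

    if (pthread_create(&tid, NULL, worker, &blocked_in_worker) != 0)
        return -1;
    pthread_join(tid, NULL);

    pthread_sigmask(SIG_BLOCK, NULL, &cur);
    blocked_in_main = (sigismember(&cur, SIGUSR1) == 1);

    return blocked_in_worker && !blocked_in_main;
}
```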

Signals generated as a result of a trap (SIGFPE, SIGILL, etc.) are sent to the thread that caused the trap. Asynchronous signals are delivered to the first thread found that is not blocking the signal.
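
The routing of an asynchronous signal to a thread that is not blocking it can be sketched as follows. The main thread blocks SIGUSR1, a worker unblocks it, and a process-directed kill(2) should then run the handler in the worker. This assumes POSIX threads; the names on_usr1, worker, and delivered_to_unblocked_thread are invented for this example, and it shows only the observable behavior, not how Solaris implements it internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t handled = 0;
static pthread_t handler_thread;    /* written by the handler, read after join */

/* Hypothetical handler: remember which thread took delivery. */
static void on_usr1(int signo)
{
    (void)signo;
    handler_thread = pthread_self();
    handled = 1;
}

/* Worker: the only thread willing to accept SIGUSR1. */
static void *worker(void *arg)
{
    sigset_t usr1;
    struct timespec nap = { 0, 1000000 };   /* 1 ms */

    (void)arg;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_UNBLOCK, &usr1, NULL);

    while (!handled)
        nanosleep(&nap, NULL);              /* wait for the handler to run */
    return NULL;
}

/* Returns 1 if the handler ran in the worker thread, 0 if not, -1 on error. */
int delivered_to_unblocked_thread(void)
{
    struct sigaction sa;
    sigset_t usr1, old;
    pthread_t tid;
    int on_worker;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Block in the calling thread; the worker inherits this mask at creation. */
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &usr1, &old);

    handled = 0;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    kill(getpid(), SIGUSR1);        /* asynchronous, directed at the process */
    pthread_join(tid, NULL);

    on_worker = (pthread_equal(handler_thread, tid) != 0);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return on_worker;
}
```

If the signal is generated before the worker has unblocked it, it simply remains pending on the process until the worker's mask allows delivery.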

The difficulty in implementing semantically correct signals in Solaris arises from the fact that user-level threads are not visible to the kernel; the low-level kernel signal code has no way of knowing which threads have which signals blocked and, thus, which thread a signal should be sent to. Some sort of intermediary phase needed to be implemented, something that had visibility to the user-thread signal masks as well as to the kernel. The solution comes in the form of a special LWP, called the aslwp, that is created by the threads library for programs that are linked to libthread (it's actually an LWP/kthread pair). The implementation of the aslwp extends the traditional signal generation and delivery phases by adding two additional steps: notification and redirection.

Generation -> Notification -> Redirection -> Delivery

When a signal is sent to a process (generation), the aslwp is notified, at which point the aslwp looks for a thread that can take delivery of the signal. Once such a thread is located, the signal is redirected and delivered to that thread.

Figure 2 shows the LWP/kthread and user-thread structures used to support signals in the process.

Figure 2. LWP/kthread and user-thread structures used to support signals in the process


About the author

Jim Mauro is a Senior Staff Engineer in the Performance and Availability Engineering group at Sun Microsystems, where he focuses on system availability and failure recovery. When not working or writing, Jim enjoys building Legos with his 2 sons, reading a wide variety of fiction and non-fiction, listening to music, and drooling over the next upgrade of his stereo system.


Reprinted with permission from the April 1999 edition of SunWorld magazine. Copyright Web Publishing Inc., an IDG Communications company.

