When I first started in the Solaris group, I was faced with two equally difficult tasks: learning the development model, and understanding the source code. For both these tasks, the recommended method is usually picking a small bug and working through the process. For the curious, the first bug I putback to ON was 4912227 (ptree call returns zero on failure), a simple bug with near zero risk. It was the first step down a very long road.
As a another first step, someone suggested adding a very simple system call to the kernel. This turned out to be a whole lot harder than one would expect, and has so many subtle aspects that experienced Solaris engineers (myself included) still miss some of the necessary changes. With that in mind, I thought a reasonable first OpenSolaris blog would be describing exactly how to add a new system call to the kernel.
For the purposes of this post, we will assume that it's a simple system call that lives in the generic kernel code, and we'll put the code into an existing file to avoid having to deal with Makefiles. The goal is to print an arbitrary message to the console whenever the system call is issued.
1. Picking a syscall number
Before writing any real code, we first have to pick a number that will represent our system call. The main source of documentation here is syscall.h, which describes all the available system call numbers, as well as which ones are reserved. The maximum number of syscalls is currently 256 (NSYSCALL), which doesn't leave much space for new ones. This could theoretically be extended - I believe the hard limit is in the size of sysset_t, whose 16 integers must be able to represent a complete bitmask of all system calls. This puts our actual limit at 16*32, or 512, system calls. But for the purposes of our tutorial, we'll pick system call number 56, which is currently unused. For my own amusement, we'll name our (my?) system call 'schrock'. So first we add the following line to syscall.h
#define SYS_uadmin 55
#define SYS_schrock 56
#define SYS_utssys 57
2. Writing the syscall handler
Next, we have to actually add the function that will get called when we invoke the system call. What we should really do is add a new file schrock.c to usr/src/uts/common/syscall, but I'm trying to avoid Makefiles. Instead, we'll just stick it in getpid.c:
#include <sys/cmn_err.h>
int
schrock(void *arg)
{
char buf[1024];
size_t len;
if (copyinstr(arg, buf, sizeof (buf), &len) != 0)
return (set_errno(EFAULT));
cmn_err(CE_WARN, "%s", buf);
return (0);
}
Note that declaring a buffer of 1024 bytes on the stack is a very bad thing to do in the kernel. We have limited stack space, and a stack overflow will result in a panic. We also don't check that the length of the string was less than our scratch space. But this will suffice for illustrative purposes. The cmn_err() function is the simplest way to display messages from the kernel.
3. Adding an entry to the syscall table
We need to place an entry in the system call table. This table lives in sysent.c, and makes heavy use of macros to simplify the source. Our system call takes a single argument and returns an integer, so we'll need to use the SYSENT_CI macro. We need to add a prototype for our syscall, and add an entry to the sysent and sysent32 tables:
int rename();
void rexit();
int schrock();
int semsys();
int setgid();
/* ... */
/* 54 */ SYSENT_CI("ioctl", ioctl, 3),
/* 55 */ SYSENT_CI("uadmin", uadmin, 3),
/* 56 */ SYSENT_CI("schrock", schrock, 1),
/* 57 */ IF_LP64(
SYSENT_2CI("utssys", utssys64, 4),
SYSENT_2CI("utssys", utssys32, 4)),
/* ... */
/* 54 */ SYSENT_CI("ioctl", ioctl, 3),
/* 55 */ SYSENT_CI("uadmin", uadmin, 3),
/* 56 */ SYSENT_CI("schrock", schrock, 1),
/* 57 */ SYSENT_2CI("utssys", utssys32, 4),
4. /etc/name_to_sysnum
At this point, we could write a program to invoke our system call, but the point here is to illustrate everything that needs to be done to integrate a system call, so we can't ignore the little things. One of these little things is /etc/name_to_sysnum, which provides a mapping between system call names and numbers, and is used by dtrace(1M), truss(1), and friends. Of course, there is one version for x86 and one for SPARC, so you will have to add the following lines to both the intel and SPARC versions:
ioctl 54
uadmin 55
schrock 56
utssys 57
fdsync 58
5. truss(1)
Truss does fancy decoding of system call arguments. In order to do this, we need to maintain a table in truss that describes the type of each argument for every syscall. This table is found in systable.c. Since our syscall takes a single string, we add the following entry:
{"ioctl", 3, DEC, NOV, DEC, IOC, IOA}, /* 54 */
{"uadmin", 3, DEC, NOV, DEC, DEC, DEC}, /* 55 */
{"schrock", 1, DEC, NOV, STG}, /* 56 */
{"utssys", 4, DEC, NOV, HEX, DEC, UTS, HEX}, /* 57 */
{"fdsync", 2, DEC, NOV, DEC, FFG}, /* 58 */
Don't worry too much about the different constants. But be sure to read up on the truss source code if you're adding a complicated system call.
6. proc_names.c
This is the file that gets missed the most often when adding a new syscall. Libproc uses the table in proc_names.c to translate between system call numbers and names. Why it doesn't make use of /etc/name_to_sysnum is anybody's guess, but for now you have to update the systable array in this file:
"ioctl", /* 54 */
"uadmin", /* 55 */
"schrock", /* 56 */
"utssys", /* 57 */
"fdsync", /* 58 */
7. Putting it all together
Finally, everything is in place. We can test our system call with a simple program:
#include <sys/syscall.h>
int
main(int argc, char **argv)
{
syscall(SYS_schrock, "OpenSolaris Rules!");
return (0);
}
If we run this on our system, we'll see the following output on the console:
June 14 13:42:21 halcyon genunix: WARNING: OpenSolaris Rules!
Because we did all the extra work, we can actually observe the behavior using truss(1), mdb(1), or dtrace(1M). As you can see, adding a system call is not as easy as it should be. One of the ideas that has been floating around for a while is the Grand Unified Syscall(tm) project, which would centralize all this information as well as provide type information for the DTrace syscall provider. But until that happens, we'll have to deal with this process.
-------------------------------------------------This article is originally posted at http://blogs.sun.com/eschrock/date/20050614#how_to_add_a_system