# Toybox infrastructure.

Rob Landley rob at landley.net
Wed Mar 10 02:43:52 UTC 2010
On Monday 08 March 2010 20:56:03 Rob Landley wrote:
> I've already written a toybox version, since it was easier for me to write
> that from scratch than try to wrestle with the busybox one.  The new toybox
> command is 98 lines long (1737 bytes of source) and the existing busybox
> one is 206 lines (4974 bytes of source) in current git.

By the way, I'm not sure how much of the "easier" in writing a new wc
for toybox was me, and how much was the toybox infrastructure.  I'm in the
weird position of not really wanting to continue toybox as a separate project
that's vastly out-resourced by the busybox development community, but also
finding working on busybox incredibly clumsy and tedious compared to working on
toybox.

What I'd really like to do is port the toybox infrastructure over to busybox,
if you guys are interested.  I'll describe the process of creating the new wc
command to give you guys a feel for it.  (A previous attempt of mine to
document all this is at http://landley.net/code/toybox/code.html by the way.)

Each toybox command is a single C file.  Adding a new command to toybox
involves adding a new file to the toys directory.  That's it.  I don't touch
any makefiles or headers or anything, the rest is entirely generated by the
build script, which scans the toys/*.c files and constructs the other files at
build time.  The generic infrastructure has no specific knowledge of the actual
commands.

To start a new command, I cd into the "toys" subdirectory of my toybox source
code and "cp hello.c wc.c".  The "hello" command is an example which has all
the basic plumbing a command needs (actually way more than a simple hello
world needs) so it can act as a convenient skeleton for new commands.  Note
that I call them "commands" rather than "applets" because this isn't java.
It's a command line, not an applet line.

The toybox hello.c looks like:

/* vi: set sw=4 ts=4:
*
* hello.c - A hello world program.
*
* Copyright 2006 Rob Landley <rob at landley.net>
*
* Not in SUSv4.
* http://www.opengroup.org/onlinepubs/9699919799/utilities/

USE_HELLO(NEWTOY(hello, "e@d*c#b:a", TOYFLAG_USR|TOYFLAG_BIN))

config HELLO
    bool "hello"
    default n
    help
      A hello world program.  You don't need this.

      Mostly used as an example/skeleton file for adding new commands,
      occasionally nice to test kernel booting via "init=/bin/hello".
*/

#include "toys.h"

// Hello doesn't use these, they're here for example/skeleton purposes.

DEFINE_GLOBALS(
    char *b_string;
    long c_number;
    struct arg_list *d_list;
    long e_count;

    int more_globals;
)

#define TT this.hello

void hello_main(void)
{
    printf("Hello world\n");
}

But most of that's example boilerplate for skeleton purposes.  All it _really_
needs is:

/* hello.c - A hello world program.

USE_HELLO(NEWTOY(hello, NULL, TOYFLAG_USR|TOYFLAG_BIN))

config HELLO
    bool "hello"
    default n
    help
      A hello world program.  You don't need this.
*/

#include "toys.h"

void hello_main(void)
{
    printf("Hello world\n");
}

Each toybox command starts with a specially formatted comment that contains
the command line options, usage info, and kconfig blob for menuconfig.  The
command's help text (spit out by the "help" command, as well as by the command
itself if run with unintelligible options) is also extracted from the kconfig
help text, so I don't have to describe the same thing twice.

The first few comment lines (the ones starting with an asterisk) are normal
comment lines that don't get parsed by anything.  The convention is to put a
short description, copyright notice, and standards reference there (as hello.c
does), but it's really just a comment.

The USE_XXX(NEWTOY(XXX)) line defines the command name, command line options,
and install location of each command.  At compile time a sed invocation
collects this line from every toys/*.c file into "generated/newtoys.h",
which is then #included to set up the command array toy_exec() searches (see
main.c at the top level).

The USE_XXX() macro chops its contents out if the relevant config option isn't
enabled (just like I added to busybox back in 2006).  There's a SKIP_XXX() too
but it's not used much.  So this line is always copied into
generated/newtoys.h, but only _used_ if the relevant config entry is enabled.

The NEWTOY() macro takes three arguments: command name, option string, and
install location.  If you'd like one command to have multiple names there's
also an OLDTOY() macro, which takes four arguments: the new name, the original
name, command options the new name understands (which can differ from the other
name, but they are washed through the same main() function), and install
location.
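To make the macro plumbing concrete, here's a minimal sketch of how USE_XXX(), NEWTOY() and OLDTOY() lines could turn into a command table.  This is an illustration, not the real main.c: the CFG_ switches, the toy_entry struct, find_toy(), the numeric TOYFLAG values, and the nc/wc option strings are all made up for the example, and the real build pulls these lines from generated/newtoys.h rather than listing them inline.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kconfig results; which commands are on is invented. */
#define CFG_HELLO 1
#define CFG_NC    1
#define CFG_WC    0

/* USE_XXX() keeps its contents if the option is enabled, else drops them. */
#if CFG_HELLO
#define USE_HELLO(...) __VA_ARGS__
#else
#define USE_HELLO(...)
#endif
#if CFG_NC
#define USE_NC(...) __VA_ARGS__
#else
#define USE_NC(...)
#endif
#if CFG_WC
#define USE_WC(...) __VA_ARGS__
#else
#define USE_WC(...)
#endif

/* Hypothetical flag values and table entry layout. */
#define TOYFLAG_BIN 1
#define TOYFLAG_USR 2

struct toy_entry {
    const char *name;       /* command name */
    const char *options;    /* option string, or NULL */
    int flags;              /* install location bits */
    void (*toy_main)(void); /* function that implements the command */
};

static void hello_main(void) {}
static void nc_main(void) {}
/* No wc_main() here: wc is disabled, so its line vanishes before use. */

/* NEWTOY makes an entry for name_main(); OLDTOY reuses oldname_main(). */
#define NEWTOY(name, opts, flags) { #name, opts, flags, name##_main },
#define OLDTOY(name, oldname, opts, flags) { #name, opts, flags, oldname##_main },

static const struct toy_entry toy_table[] = {
    USE_HELLO(NEWTOY(hello, NULL, TOYFLAG_USR|TOYFLAG_BIN))
    USE_NC(NEWTOY(nc, "p#", TOYFLAG_BIN))
    USE_NC(OLDTOY(netcat, nc, "p#", TOYFLAG_BIN))
    USE_WC(NEWTOY(wc, "lwc", TOYFLAG_USR|TOYFLAG_BIN))
    { NULL, NULL, 0, NULL }
};

/* Linear search by command name; returns NULL if not compiled in. */
static const struct toy_entry *find_toy(const char *name)
{
    const struct toy_entry *t;

    for (t = toy_table; t->name; t++)
        if (!strcmp(t->name, name)) return t;
    return NULL;
}
```

In this sketch find_toy("netcat") and find_toy("nc") end up pointing at the same function, and find_toy("wc") returns NULL because the disabled line never reaches the table.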

The install location is used if you give the "toybox" multiplexer any option
beginning with a dash.  Currently for defconfig, it outputs:

./toybox -?
bin/basename usr/bin/bzcat bin/cat usr/bin/catv usr/sbin/chroot
usr/sbin/chvt bin/cksum usr/bin/count bin/cp usr/sbin/df bin/dirname
bin/dmesg bin/echo bin/false bin/help usr/bin/mdev bin/mkfifo sbin/mkswap
bin/nc bin/netcat usr/bin/nice sbin/oneit usr/bin/patch bin/pwd bin/rmdir
usr/bin/seq usr/bin/setsid bin/sh usr/bin/sha1sum bin/sleep usr/bin/sort
bin/sync bin/tee bin/touch bin/toysh bin/true bin/tty bin/uname

A trivial script can go through that output and install the appropriate
symlinks to the "toybox" binary, something like:

for i in $(./toybox -); do ln -s /bin/toybox $i; done

You can run ./toybox without any arguments to get the list of commands without
the paths prepended, to install all the links in the same directory.  (Yes you
can do "toybox cat filename" too, none of the command names start with a dash.)

That leaves the middle argument to NEWTOY(), which is the command line option
string.  This is the biggest difference between toybox and busybox, the option
parsing logic is completely different, and so automated you can largely ignore
it.  However, I'm going to explain it here in more detail than you probably
really need to know. :)

I wrote my own option parser (lib/args.c, which does _not_ call getopt() so
was net smaller than busybox's last I checked).  It's automatically called
before the command's main() function is ever run, using the option string
supplied by NEWTOY() to parse the command line options and fill out global
variables with the appropriate values.  You can disable this automatic option
parsing (and call it manually if you like) by passing NULL in as the option
string in NEWTOY(), which is also how you specify you take no arguments so
that the option parsing can get compiled out if nobody's using it.  See main.c
for details.
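The "NULL means skip the parser" rule can be sketched roughly like this.  It is only a guess at the shape of the dispatch logic, not a copy of main.c: run_toy(), parse_options_stub(), the toy_entry struct and the two counters are hypothetical names invented for the example, and the stub counts calls instead of parsing anything.

```c
#include <stddef.h>

struct toy_entry {
    const char *name;
    const char *options;    /* NULL means "skip automatic option parsing" */
    void (*toy_main)(void);
};

static int parse_calls;     /* how many times the stand-in parser ran */
static int main_calls;      /* how many times a command's main ran */

/* Stand-in for the real parser in lib/args.c; it only counts calls. */
static void parse_options_stub(const char *optstr)
{
    (void)optstr;
    parse_calls++;
}

/* Stand-in for any command_main(): no arguments, no return value. */
static void demo_main(void)
{
    main_calls++;
}

/* Parse first (only if there is an option string), then run the command. */
static void run_toy(const struct toy_entry *toy)
{
    if (toy->options) parse_options_stub(toy->options);
    toy->toy_main();
}
```

If no enabled command ever supplies an option string, nothing reaches the parser, which is what lets it be compiled out.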

The command_main() functions return void and take no arguments, instead you
use global variables.  The main one is the global "toys", which looks like
this:

extern struct toy_context {
    struct toy_list *which;  // Which entry in toy_list is this one?
    int exitval;             // Value error_exit feeds to exit()
    char **argv;             // Original command line arguments
    unsigned optflags;       // Command line option flags from get_optflags()
    char **optargs;          // Arguments left over from get_optflags()
    int optc;                // Count of optargs
    int exithelp;            // Should error_exit print a usage message first?
} toys;

toys.optflags is filled out by the option parsing logic with the command line
flags seen this run.  exitval defaults to 0 but can be changed by other stuff
(such as any of the functions that exit with an error, or by setting it
manually before returning from main()).  optargs[] contains the arguments left
over after option parsing.  (So for "ls -l file1 file2 file3", optargs[0] would
be file1, optargs[2] would be file3, and optargs[3] would be NULL.)  optc is the
equivalent of argc for optargs.  argv[] is the unprocessed argument list, kept
around since we can't free it anyway and there are a couple of times you might
want to know.  (Such as if you passed NULL as the option string to NEWTOY().)
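As a small illustration of how a command might walk the leftover arguments, here's a sketch using a cut-down stand-in for the structure above.  toy_context_sketch and both helper functions are hypothetical and exist only for this example; the point is that optc and the NULL terminator give two equivalent ways to find the end of optargs[].

```c
#include <stddef.h>
#include <string.h>

/* Cut-down copy of the fields discussed above; not the real toys.h. */
struct toy_context_sketch {
    unsigned optflags;
    char **optargs;
    int optc;
    int exitval;
};
static struct toy_context_sketch toys;

/* Loop using optc, the way you'd use argc. */
static size_t total_optarg_length(void)
{
    size_t total = 0;
    int i;

    for (i = 0; i < toys.optc; i++) total += strlen(toys.optargs[i]);
    return total;
}

/* Loop using the NULL terminator instead of the count. */
static int count_until_null(void)
{
    int n = 0;

    while (toys.optargs[n]) n++;
    return n;
}
```

With optargs set to file1, file2, file3 and a trailing NULL, count_until_null() agrees with optc.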

The other interesting global is "this", which is a union of structures
containing all your global variables for each command.  That's initialized by
the DEFINE_GLOBALS() macro a bit further down in the file, which lists the
global variables for this file.  The contents becomes a structure in a union of
all such structures for each command, which can be accessed as
"this.commandname" (in this case, "this.wc").  The #define TT this.wc is a
shortcut so we can say TT.name instead of this.wc.name if we have any globals.
(I should make the
#define TT automatic as part of DEFINE_GLOBALS() or something, but haven't
figured out how yet.  Alas, you can't have a macro resolve to a preprocessor
directive.)
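Here's roughly what the union-plus-shortcut arrangement boils down to once the macros are expanded, written out by hand.  This is a sketch under assumptions: the struct names, the wc field names and count_text() are invented for illustration, and the real union is generated by the build rather than typed in.

```c
#include <ctype.h>

/* Per-command globals; the field names are invented for this sketch. */
struct hello_data {
    char *b_string;
    long c_number;
};

struct wc_data {
    long lines;
    long words;
    long chars;
};

/* One union holds every command's globals, since only one command runs. */
union toy_union {
    struct hello_data hello;
    struct wc_data wc;
};
static union toy_union this;    /* static storage, so it starts zeroed */

#define TT this.wc

/* Count lines, words and characters of a string into the wc globals. */
static void count_text(const char *text)
{
    int in_word = 0;

    for (; *text; text++) {
        TT.chars++;
        if (*text == '\n') TT.lines++;
        if (isspace((unsigned char)*text)) in_word = 0;
        else if (!in_word) {
            in_word = 1;
            TT.words++;
        }
    }
}
```

Because the structures share storage, the globals cost only as much memory as the largest single command needs.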

If there are no global variables used by this command, you can omit the
DEFINE_GLOBALS() block entirely.  But if the command line parsing saves
results to any variables, you need to list them at the start of the
DEFINE_GLOBALS() block:

1) In order (from right to left).
2) All of them are long/pointer size.  (4 bytes on 32-bit, 8 on 64-bit.)

The options are numbered from right to left because that way anybody familiar
with binary can work out the flag values in their head: "The option string has
abcdefg, command line is -adg, that's 1001001... that's 64+8+1".  Whereas if
you number them the other way, you have to reverse them in your head to work
out the values.  (This means you add new options to the beginning of the
string to avoid renumbering the others.)
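The right-to-left arithmetic can be checked with a few lines of code.  This is a simplified sketch, not the logic in lib/args.c: flag_for() and flags_for() are hypothetical helpers, and they assume every non-letter in the option string is a modifier to be skipped.

```c
#include <ctype.h>
#include <string.h>

/* Flag bit for one option letter: the rightmost letter is 1, the next one
 * to its left is 2, then 4, and so on.  Non-letters are skipped.
 * Returns 0 if the letter isn't in the option string. */
static unsigned flag_for(const char *optstr, char letter)
{
    unsigned bit = 1;
    size_t i = strlen(optstr);

    while (i--) {
        if (!isalpha((unsigned char)optstr[i])) continue;
        if (optstr[i] == letter) return bit;
        bit <<= 1;
    }
    return 0;
}

/* OR together the flags for every letter in seen (e.g. "adg" for -adg). */
static unsigned flags_for(const char *optstr, const char *seen)
{
    unsigned flags = 0;

    for (; *seen; seen++) flags |= flag_for(optstr, *seen);
    return flags;
}
```

For the option string "abcdefg" and a command line of -adg, flags_for("abcdefg", "adg") comes out as 64+8+1, matching the mental arithmetic above.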

So if I had an option string "ab:d#" the flag values are d=1, b=2, a=4 (ignore
the non-letters for that), and the associated globals block could look like:

DEFINE_GLOBALS(
    long value_for_d;
    char *value_for_b;

    int any_other_globals;
)

The appended : means "takes a string argument" (just like in getopt), the
appended # means "takes a number argument".  Said arguments are saved into the
global block, right to left becoming top to bottom.

By convention, I put a space between globals filled out by the option parsing
logic and globals that are just globals used by the code.  Note that all of
the globals are initialized to zero to start with, and then the option parsing
logic can set the first few to other values, but any that aren't initialized by
the option parsing logic (including ones that _could_ but that option wasn't
used this time) are still reliably zeroed.
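To show why the order and the long/pointer size both matter, here's a sketch of a parser writing into the start of a globals block by slot number.  It is not the real lib/args.c: demo_globals, slot_for() and store_option() are hypothetical, only the ':' and '#' suffixes are handled, and it assumes long and pointers are the same size (as stated above).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Globals for the option string "ab:d#", in right-to-left order. */
struct demo_globals {
    long value_for_d;       /* slot 0, filled by "d#" */
    char *value_for_b;      /* slot 1, filled by "b:" */

    int any_other_globals;  /* not touched by parsing, stays zero */
};
static struct demo_globals demo;    /* static storage, so it starts zeroed */

/* The slot arithmetic below only works if every slot is the same size. */
_Static_assert(sizeof(long) == sizeof(char *), "slots are long/pointer sized");

/* Slot index for an argument-taking option, counting such options from
 * right to left.  Sets *type to ':' or '#'.  Returns -1 if the letter is
 * missing or takes no argument. */
static int slot_for(const char *optstr, char letter, char *type)
{
    int slot = 0;
    size_t i = strlen(optstr);

    while (i--) {
        char suffix = optstr[i + 1];    /* '\0' for the last character */
        int takes_arg = (suffix == ':' || suffix == '#');

        if (!isalpha((unsigned char)optstr[i])) continue;
        if (optstr[i] == letter) {
            *type = suffix;
            return takes_arg ? slot : -1;
        }
        if (takes_arg) slot++;
    }
    return -1;
}

/* Save one option's argument into its slot.  Returns 0 on success. */
static int store_option(const char *optstr, char letter, char *arg)
{
    char type = 0;
    int slot = slot_for(optstr, letter, &type);
    char *dest;

    if (slot < 0) return -1;
    dest = (char *)&demo + (size_t)slot * sizeof(long);
    if (type == '#') {
        long value = strtol(arg, NULL, 10);
        memcpy(dest, &value, sizeof(value));    /* number argument */
    } else {
        memcpy(dest, &arg, sizeof(arg));        /* string argument */
    }
    return 0;
}
```

In this sketch -d 42 lands in value_for_d, the -b string lands in value_for_b, -a stores nothing because it takes no argument, and any_other_globals is left at zero.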

That pretty much gets us through all the boilerplate, and in fact is probably
way more info than you'd really need to know to implement the wc command.

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds