Inside OpenSolaris: Solaris Driver Programming

by Max Bruning

April 15, 2005

This article describes Solaris device driver programming in terms that a developer of Linux device drivers will understand. Basically, the article attempts to answer the question, "A Linux driver does xxx, how does one do this in Solaris?" A previous article, Introduction to Solaris Drivers, explained the system architecture and driver visibility on Solaris. This article explains basic driver programming. We assume some familiarity with device drivers and devices, and in general discuss Solaris 10 and Linux 2.6. Most of the article should hold for earlier releases of either system.

Overview of Solaris Device Drivers

This section gives an overview of drivers on Solaris. It is not a complete description of all driver routines, but gives some idea of the overall structure of a Solaris driver and examines many of the routines that a driver implementer must provide.

DDI/DKI

On Solaris, the Device Driver Interface (DDI) and Driver/Kernel Interface (DKI) define the entry points to a driver and the kernel routines and data structures that a driver may use. The DDI defines the routines that a driver needs to supply in order to work with Solaris. The DKI (Driver/Kernel Interface) defines the data structures and kernel routines that a driver may use. By using the documented DDI/DKI interfaces, device driver writers receive source and binary compatibility across different releases of the OS. On Linux, because driver writers have had full access to the kernel source code, a driver writer could make use of any kernel routine or data structure. Now that all of the Solaris kernel source will be available, a programmer could use any kernel routine or data structure. However, it's still a good idea to adhere to the DDI/DKI. If there is a routine or data structure missing from the DDI/DKI that you need, then you should present an argument for including it in the DDI/DKI. Having said that, the DDI/DKI does not preclude you from using undocumented routines. In other words, don't let the DDI/DKI stand in the way of making your device work. On the other hand, don't assume that your driver will continue to work in future releases, as undocumented routines and data structures may change.

Where is the DDI/DKI? In the manual! Section 9 of the manual documents all driver entry points, kernel routines, and kernel data structures in the DDI/DKI. Driver entry points are in Section 9e, kernel functions in Section 9f, data structures in Section 9s, and device properties in Section 9p. For instance, suppose you need to schedule a function to run at some time in the future, possibly to set a timer or to check the status of a device. One approach is to search the source code. A better approach is to use apropos time or man -k time (though you may need to use catman -w for these commands to work). These commands will probably give you more information than you want, as they show all manual pages with time in the NAME section of the manual page. I prefer a quicker method:

bash-2.05b# cd /usr/man/sman9f
bash-2.05b# ls *time*
cv_timedwait.9f qtimeout.9f
cv_timedwait_sig.9f quntimeout.9f
ddi_get_time.9f timeout.9f
devmap_set_ctx_timeout.9f untimeout.9f
gethrtime.9f
bash-2.05b#

A brief examination of the pages shows that timeout(9f) is probably the routine to use. Possibly the best method is to look it up in the Writing Device Drivers guide.

It's possible that you won't find the function you need, or that the function you find is not quite right. timeout(9f), for instance, gives you clock tick granularity. Clock ticks are usually ten milliseconds. If you need a function with finer timing, you might find it in the source code, but if it's not documented, it is not in the DDI/DKI. You can use the function, but it may not work the same way, take the same parameters, or even return the same value in a future release.

Solaris Driver Data Structures

Minimally, Solaris drivers must have the following data structures:

  • modlinkage(9s): Used to install, remove, and retrieve information about the driver.
  • modldrv(9s): Used for linking the driver into the kernel.
  • dev_ops(9s): Device operations common to nexus and leaf node drivers.
  • cb_ops(9s): Entry points for character and block drivers.

Note that some types of devices (including STREAMS, GLD, SCSA and USB) may need additional structures. The following shows the structures as defined in /usr/include/sys/devops.h or /usr/include/sys/modctl.h.

struct modlinkage {  /* from sys/modctl.h */
int ml_rev; /* rev of loadable modules system */
#ifdef _LP64
void *ml_linkage[7]; /* more space in 64-bit OS */
#else
void *ml_linkage[4]; /* NULL terminated list of */
/* linkage structures */
#endif
};

struct modldrv { /* also in sys/modctl.h */
struct mod_ops *drv_modops; /* must be &mod_driverops */
char *drv_linkinfo; /* typically, "name version #" */
struct dev_ops *drv_dev_ops;
};

struct dev_ops { /* in sys/devops.h */
int devo_rev; /* Driver build version */
int devo_refcnt; /* device reference count */

int (*devo_getinfo)(dev_info_t *dip,
ddi_info_cmd_t infocmd, void *arg, void **result);
int (*devo_identify)(dev_info_t *dip);
int (*devo_probe)(dev_info_t *dip);
int (*devo_attach)(dev_info_t *dip, ddi_attach_cmd_t cmd);
int (*devo_detach)(dev_info_t *dip, ddi_detach_cmd_t cmd);
int (*devo_reset)(dev_info_t *dip, ddi_reset_cmd_t cmd);

struct cb_ops *devo_cb_ops; /* cb_ops pointer for leaf drivers */
struct bus_ops *devo_bus_ops; /* bus_ops pointer for nexus drivers */
int (*devo_power)(dev_info_t *dip, int component,
int level);
};

struct cb_ops {
int (*cb_open)(dev_t *devp, int flag, int otyp, cred_t *credp);
int (*cb_close)(dev_t dev, int flag, int otyp, cred_t *credp);
int (*cb_strategy)(struct buf *bp);
int (*cb_print)(dev_t dev, char *str);
int (*cb_dump)(dev_t dev, caddr_t addr, daddr_t blkno, int nblk);
int (*cb_read)(dev_t dev, struct uio *uiop, cred_t *credp);
int (*cb_write)(dev_t dev, struct uio *uiop, cred_t *credp);
int (*cb_ioctl)(dev_t dev, int cmd, intptr_t arg, int mode,
cred_t *credp, int *rvalp);
int (*cb_devmap)(dev_t dev, devmap_cookie_t dhp, offset_t off,
size_t len, size_t *maplen, uint_t model);
int (*cb_mmap)(dev_t dev, off_t off, int prot);
int (*cb_segmap)(dev_t dev, off_t off, struct as *asp,
caddr_t *addrp, off_t len, unsigned int prot,
unsigned int maxprot, unsigned int flags, cred_t *credp);
int (*cb_chpoll)(dev_t dev, short events, int anyyet,
short *reventsp, struct pollhead **phpp);
int (*cb_prop_op)(dev_t dev, dev_info_t *dip,
ddi_prop_op_t prop_op, int mod_flags,
char *name, caddr_t valuep, int *length);

struct streamtab *cb_str; /* streams information */

/*
* The cb_flag fields are here to tell the system a
* bit about the device. The bit definitions are
* in <sys/conf.h>.
*/
int cb_flag; /* driver compatibility flag */
int cb_rev; /* cb_ops version number */
int (*cb_aread)(dev_t dev, struct aio_req *aio, cred_t *credp);
int (*cb_awrite)(dev_t dev, struct aio_req *aio, cred_t *credp);
};

Note: Many data structures in Linux do their initialization via a .tag field. Solaris currently does not do this (perhaps due to a compiler issue). For example, Linux may initialize the following structure:

struct foo {
int a;
int b;
};

with code such as:

struct foo foobar = {
.a = 10,
.b = 20
};

where the Solaris initialization is typically:

struct foo foobar = {
10, /* a */
20 /* b */
};

In other words, Solaris uses comments instead of the .tag idiom.
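As a small illustration, the two styles can be compared side by side in standard C99. The sketch below is ordinary user-space C, not driver code. The helper foo_equal() is hypothetical and exists only to show that both forms produce the same structure.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int a;
    int b;
};

/* Linux style: C99 designated initializers name each field. */
static const struct foo foobar_tagged = {
    .a = 10,
    .b = 20
};

/* Traditional Solaris style: positional values, fields named in comments. */
static const struct foo foobar_positional = {
    10,  /* a */
    20   /* b */
};

/* Hypothetical helper: returns 1 when two foo structures hold equal fields. */
static int
foo_equal(const struct foo *x, const struct foo *y)
{
    assert(x != NULL && y != NULL);
    return (x->a == y->a && x->b == y->b);
}
```

The positional form depends on the field order in the structure definition, which is why the Solaris examples in this article comment every entry.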

The following shows an example using the structures for a simple character device, the foo device.

static struct cb_ops foo_cb_ops = {
foo_open,
foo_close,
nodev, /* strategy only for block devices */
nodev, /* print only for block devs */
nodev, /* dump only for block devs */
foo_read,
foo_write,
foo_ioctl,
nodev, /* no devmap (no mmap support) */
nodev, /* no mmap */
nodev, /* no segmap (for mmap) */
nochpoll, /* no poll support */
ddi_prop_op, /* let nexus node above driver handle property requests */
NULL, /* no streamtab (not a STREAMS device) */
D_MP, /* cb_flag, (other flags for STREAMS driver) */
CB_REV, /* revision number of cb_ops */
nodev, /* no async read */
nodev /* no async write */
};

static struct dev_ops foo_dev_ops = {
DEVO_REV, /* revision number of this struct (compatibility) */
0, /* devo_refcnt, set by driver framework */
foo_getinfo,
nulldev, /* identify, obsolete since solaris 2.6 */
nulldev, /* probe, must return success for driver to attach */
foo_attach,
foo_detach,
nodev, /* no reset */

&foo_cb_ops,
NULL, /* no bus_ops (needed for nexus devices) */
nodev /* no power(9e) */
};

static struct modldrv foo_modldrv = {
&mod_driverops, /* must be this value, identifies module as a driver */
"foo driver version 0.1",
&foo_dev_ops
};

static struct modlinkage foo_ml = {
MODREV_1, /* revision number of structure (compatibility) */
&foo_modldrv,
0
};

Loading/Unloading

A Solaris driver has three routines involved in loading and unloading a driver. These are _init(9e), _info(9e), and _fini(9e). Not only does every driver in Solaris have these routines, but, in fact, almost every other type of kernel module includes them. The kernel runtime linker, krtld, expects to find these routines. If it doesn't, your driver won't be (dynamically) linked into the system.

Note: All drivers on Solaris are dynamically linked into the system, typically at boot or at the first access of the device. Your driver must be dynamically loadable.

Note: Most Solaris driver functions return 0 (DDI_SUCCESS) on success and a positive errno on failure, unlike Linux, which returns a negative errno on failure. Any returned values typically go into a passed-in argument.

The following shows an example of using these routines.

int
_init(void)
{
int error;
/*
* allocate and initialize any data needed for all
* instances of the device (per-device-instance data
* is allocated and initialized in attach(9e))
*/

error = mod_install(&foo_ml);
return (error);
}

int
_fini(void)
{
int error;

error = mod_remove(&foo_ml);
if (error == 0)
/*
* de-initialize and free data allocated in _init()
*/
;
return (error);
}

int
_info(struct modinfo *modinfop)
{
return (mod_info(&foo_ml, modinfop));
}

Autoconfiguration

Each instance of a device is automatically configured during boot or driver installation. This is done by the attach(9e) entry point, which the kernel calls for each instance of the device. Instances are either recognized by the hardware/boot system, or configured by entries in a driver.conf(4) file. An optional probe(9e) routine is called to determine if a configured instance actually exists on the system. The probe(9e) entry point need not exist for "self-identifying" devices. Any device plugged into a PCI-type bus is self-identifying. SCSI targets and pseudo devices must have a probe entry to see if the device actually exists. Drivers also have a detach(9e) entry to un-configure an instance or all instances.

The attach(9e) and detach(9e) routines are also called for dynamic reconfiguration support and for the hot-plugging of devices.

The driver.conf(4) file is optional. PCI devices don't need it, but it works with such devices to set different properties. The following shows an example driver.conf file for a pseudo device.

# foo.conf (typically in /usr/kernel/drv/foo.conf)
name="foo" parent="pseudo" instance=0;
name="foo" parent="pseudo" instance=100;

The instance number is passed to various driver routines. It is the driver's responsibility to maintain a mapping of instance numbers to minor device numbers. The simplest mapping is one to one, where the instance number and minor number are the same, but there may be other mappings. For example, a driver for an RS-232C controller with eight serial ports might store the port number in the low order three bits, and the instance number in the high order 29 bits of the minor number. The OS (or driver.conf) chooses the instance number for a given controller, passing it to the driver as a field in a dev_info_t structure. The dev_info_t is meant to be opaque, but DDI routines can retrieve various fields, including the instance number.
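The eight-port mapping described above can be sketched with a few bit operations. This is a minimal user-space illustration; the FOO_* macros and foo_* helper names are hypothetical, and plain unsigned int stands in for the kernel's minor number type.

```c
#include <assert.h>

/* Hypothetical layout for the eight-port example: port in the low 3 bits. */
#define FOO_PORT_BITS   3
#define FOO_PORT_MASK   ((1u << FOO_PORT_BITS) - 1)   /* 0x7 */

/* Pack an instance number and a port number into one minor number. */
static unsigned int
foo_make_minor(unsigned int instance, unsigned int port)
{
    assert(port <= FOO_PORT_MASK);
    return ((instance << FOO_PORT_BITS) | port);
}

/* Recover the instance number from the high-order bits. */
static unsigned int
foo_minor_to_instance(unsigned int minor)
{
    return (minor >> FOO_PORT_BITS);
}

/* Recover the port number from the low-order three bits. */
static unsigned int
foo_minor_to_port(unsigned int minor)
{
    return (minor & FOO_PORT_MASK);
}
```

For example, instance 5 and port 3 pack into minor number 43. In a real driver, the minor number would come from applying getminor(9f) to the dev_t passed to the entry point.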

The probe(9e) routine should clean up any device state info before returning, making it stateless. If you don't need a probe(9e) routine, you can just return DDI_PROBE_SUCCESS or use nulldev(9f) in the dev_ops(9s) structure for the probe entry. The following shows a pseudo-code example probe(9e) entry point.

static int
foo_probe(dev_info_t *dip)
{
int instance;
int rval;

instance = ddi_get_instance(dip);
ddi_regs_map_setup(...); /* get a handle for accessing device registers */
rval = ddi_peek8(dip, ...); /* try to read an 8bit register on the device */
ddi_regs_map_free(...); /* unmap the handle returned from ddi_regs_map_setup */

if (rval == DDI_FAILURE)
return DDI_PROBE_FAILURE; /* device does not currently exist */

else
return DDI_PROBE_SUCCESS;
}

The attach(9e) entry point is called following a successful probe(9e) to make the device "visible" on the system. It typically allocates state, acquires handle(s) for the device's registers, registers an interrupt handler, does any device-specific hardware initialization, and creates minor device nodes so as to make the device(s) accessible via user open(2) calls. Once attach(9e) has run for a given device instance, files will appear in the /devices tree for each new minor node.

The following shows a pseudo-code example of attach(9e).

static int
foo_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
int instance;

instance = ddi_get_instance(dip);
allocate and initialize any needed state for this instance;
ddi_regs_map_setup(...); /* get a handle for registers */
ddi_dma_alloc_handle(...); /* if device does DMA */
initialize hardware;
ddi_add_intr(...); /* register the interrupt handler */
ddi_create_minor_node(...); /* create minor device node */
return DDI_SUCCESS;
}

Note that the driver may call ddi_regs_map_setup(9f), ddi_add_intr(9f), and ddi_create_minor_node(9f) multiple times if there are multiple sets of registers, there is more than one interrupt handler, or you need multiple minor devices for the instance. Also note that the cmd argument is either DDI_ATTACH or DDI_RESUME. The example above is basic handling for DDI_ATTACH. DDI_RESUME handles dynamic reconfiguration and power management.

The detach(9e) entry undoes the setup done by attach(9e) when the device falls out of use. Here is a pseudo-code example:

static int
foo_detach(dev_info_t *dip, ddi_detach_cmd_t cmd) /* cmd = DDI_DETACH or DDI_SUSPEND */
{
int instance;
instance = ddi_get_instance(dip);
ddi_remove_minor_node(...);
de-initialize hardware; /* disable interrupts, etc. */
ddi_remove_intr(...);
ddi_regs_map_free(...);
return DDI_SUCCESS;
}

Open, Close, Read, Write

Unlike Linux, Solaris drivers do not access data structures such as inodes or files. In general, drivers should only access structures passed into entry points, structures that the driver itself declares, and structures specified by the DDI (manual section 9s). This allows kernel structures to change over time without needing to make changes to the driver. Fields within kernel data structures that a driver might need (for instance, the current location of read/write in a device) are passed as arguments to functions, rather than the structures themselves. If you think you need access to kernel structures outside of the DDI/DKI, you are probably not using the DDI/DKI correctly. Of course, source code access makes all structures and kernel routines "visible" to a programmer. Still, to avoid having to change your driver if a new release of Solaris changes the structure or routine, you should try to stick with the DDI/DKI.

The following shows pseudo-code examples of driver open(9e), close(9e), read(9e), and write(9e). Writing Device Drivers has examples of using ioctl(2), mmap(2)-related routines, poll(2), and block-driver-related routines.

static int
foo_open(dev_t *devp, int flag, int otyp, cred_t *cred_p) /* called for open(2) or OTYP_LYR */
{
instance = get instance from minor device number; /* driver specific */
if (no state for this instance)
return ENXIO;
initialize driver for this open;
handle exclusive open if needed;
wait for device to be "on-line", if needed;
return DDI_SUCCESS;
}

static int
foo_close(dev_t dev, int flag, int otyp, cred_t *cred_p) /* called only on last close of minor dev */
{
undo any initialization done by open;
return DDI_SUCCESS;
}

static int
foo_read(dev_t dev, struct uio *uio_p, cred_t *cred_p) /* called from read(2) */
{
copy/dma data from device to user address space via uio_p; /* see uio(9s) */
return DDI_SUCCESS; /* uio_p will contain # of bytes read */
}

static int
foo_write(dev_t dev, struct uio *uio_p, cred_t *cred_p) /* called from write(2) */
{
copy/dma data from user address space via uio_p to device;
return DDI_SUCCESS;
}

Kernel Support Utilities

State Handling

To help manage per-instance state, the DDI/DKI provides a set of routines. Per-instance state can include anything you need to keep track of the state for a given device instance. These routines maintain the space for state. You simply request space allocation, retrieve a pointer to the allocated space, initialize the state, update the state, and, when the instance no longer exists, free the state. The kernel manages state (linked list, array, tree, etc.) for you. All of the state routines take either a dev_info_t or instance number as an argument. The driver needs a way to fetch an instance number from the minor device number, but the implementation is up to the driver. The simplest is to make the instance number and minor device number equivalent. Note that the instance number is a field in the dev_info_t, retrievable through ddi_get_instance(9f).

The state management routines are:

  • ddi_soft_state_init(9f): Called by the driver's _init(9e) to allocate a handle to be used with the other DDI state routines.
  • ddi_soft_state_zalloc(9f): Called by attach(9e) to allocate and zero out state for a given instance.
  • ddi_get_soft_state(9f): Called by the driver to retrieve the state structure for a given instance.
  • ddi_soft_state_free(9f): Called by detach(9e) to free the state structure for a given instance of the device.
  • ddi_soft_state_fini(9f): Called by _fini(9e) to free the handle initialized by ddi_soft_state_init(9f).

Of course, the device might not have state (doubtful), or you might choose to manage your own state.
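To make the allocate, look up, and free lifecycle concrete, here is a toy user-space model of per-instance state. It is only a sketch of the pattern, not the DDI implementation: the model_* names, the fixed-size table, and the foo_state layout are all assumptions made for illustration, and calloc()/free() stand in for kernel allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_INSTANCES 16   /* arbitrary limit for this model */

struct foo_state {               /* hypothetical per-instance state */
    int open_count;
};

/* Table of per-instance state pointers, indexed by instance number. */
static struct foo_state *model_table[MODEL_MAX_INSTANCES];

/* Allocate zeroed state for an instance; 0 on success, -1 on failure. */
static int
model_state_zalloc(int instance)
{
    if (instance < 0 || instance >= MODEL_MAX_INSTANCES ||
        model_table[instance] != NULL)
        return (-1);
    model_table[instance] = calloc(1, sizeof (struct foo_state));
    return (model_table[instance] != NULL ? 0 : -1);
}

/* Look up state; NULL when the instance was never allocated. */
static struct foo_state *
model_get_state(int instance)
{
    if (instance < 0 || instance >= MODEL_MAX_INSTANCES)
        return (NULL);
    return (model_table[instance]);
}

/* Release the state for an instance and clear its table slot. */
static void
model_state_free(int instance)
{
    assert(instance >= 0 && instance < MODEL_MAX_INSTANCES);
    free(model_table[instance]);
    model_table[instance] = NULL;
}
```

A NULL result from the lookup corresponds to the "no state for this instance" check in the open(9e) pseudo-code, where the driver returns ENXIO.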

Synchronization Mechanisms

Solaris provides four synchronization mechanisms within the kernel to protect against simultaneous access and modification of data by multiple threads running in the kernel. These mechanisms are advisory, so you don't have to use them. However, failure to use them where needed will almost certainly result in corrupted or inconsistent data and panics. The synchronization mechanisms are:

  • Mutexes, or mutual exclusion locks. These can be spin or adaptive. Spin mutexes spin if someone already owns the mutex. Adaptive mutexes spin if the owner is running on a processor, and block (switch out) if the owner is not running. In addition, spin mutexes mask interrupts at the level associated with the mutex.
  • Condition variables, which provide a "sleep/wakeup" type mechanism.
  • Semaphores, either counting or binary, implemented via p and v operations.
  • Reader/writer locks, or shared/exclusive locks.

The mutex is the fundamental mechanism underlying the other mechanisms. Here is an example of using mutexes:

kmutex_t mp;  /* must be global, may be dynamically allocated, usually in per-instance state */

<-- in _init() or xxx_attach() -->

/* initialize mutex(es) */
ddi_get_iblock_cookie(devinfop, 0, &iblock_cookie); /* needed if device interrupts */
mutex_init(&mp, NULL, MUTEX_DRIVER, iblock_cookie); /* otherwise, iblock_cookie = NULL */


<-- to use the mutex -->

mutex_enter(&mp); /* acquire the mutex */
critical code goes here;
mutex_exit(&mp); /* release the mutex */

<-- in _fini() or xxx_detach() -->

mutex_destroy(&mp);

Recursively entering a mutex (re-acquiring a mutex the thread already owns) results in a panic (unlike Linux). Hierarchical deadlock also results in a panic. Fix your bugs!

The following shows an example use of condition variables, though it leaves off the initialization step.

kcondvar_t cv;  /* the condition variable itself is opaque */
kmutex_t mp; /* must use mutex */
int flag = 0; /* represents the condition, event, resource, etc. you are waiting on */

<--thread that needs to wait-->

mutex_enter(&mp);
while (flag == 0) /* while condition not met, resource not available, event has not occurred, etc. */
cv_wait(&cv, &mp); /* mutex is released and thread is switched out */
/* before returning, mutex is re-acquired */
handle condition, resource, event, etc; /* here we own the mutex */
mutex_exit(&mp);

<--wakeup thread (typically interrupt handler) -->

mutex_enter(&mp);
++flag;
cv_signal(&cv); /* or cv_broadcast(&cv), i.e, wakeup one or all */
mutex_exit(&mp);

Dynamic Memory Allocation

Dynamic memory allocation typically uses kmem_alloc(9f) and kmem_free(9f). Also, drivers can use kmem_cache_create(9f), kmem_cache_destroy(9f), kmem_cache_alloc(9f), and kmem_cache_free(9f). For allocation of DMA-able memory, use ddi_dma_mem_alloc(9f).

The kmem_alloc(9f) routine takes a size in bytes and a flag as arguments. The flag specifies what to do if the space is not immediately available: KM_SLEEP blocks until the allocation can be satisfied, while KM_NOSLEEP returns a NULL pointer at once, leaving recovery to the caller.

Device Register Access

To access device registers (and device memory), use the ddi_regs_map_setup(9f) and ddi_get/ddi_put routines. When dealing with a memory mapped device (a frame buffer, for example), map in the buffer using ddi_regs_map_setup(9f), and then use the address returned to you to access the device. The following example works for a simple, made-up, device.

ddi_device_acc_attr_t foo_acc = {  /* used for endianness, ordering constraints */
DDI_DEVICE_ATTR_V0, /*revision number of structure */
DDI_STRUCTURE_LE_ACC, /* device is little endian */
DDI_STRICTORDER_ACC /* all references are issued by cpu in program order, (no re-ordering) */
};

char *dev_regs; /* will point to device registers, or contain port number */
ddi_acc_handle_t foo_acc_handle; /* opaque handle, filled by ddi_regs_map_setup */

<-- in foo_attach -->

ddi_regs_map_setup(devinfop, 0, &dev_regs, 0, 0, &foo_acc, &foo_acc_handle);

<-- to access device registers -->

x = ddi_get8(foo_acc_handle, (uint8_t *)dev_regs); /* retrieve 8-bit register */

Interrupt Handling

Solaris interrupt handling is similar to Linux. You may implement an interrupt handler that services the interrupt and returns, or you may implement two interrupt handlers. The first handles the hardware interrupt and a second runs at a lower priority as a software interrupt. Linux tasklets can be implemented using a routine that runs from timeout(9f). Solaris does have task queues, but at present there is no programming interface visible in the DDI/DKI.

Solaris maps interrupts to one of 15 priority levels, where the higher the level, the higher the priority. Interrupts at or below level 10 run in their own threads. This means they may block (due to trying to obtain an already locked mutex, for instance). Interrupts above level 10 are called "high priority interrupts." They run in the context of the thread that was running when the interrupt occurred, so they may not block. Mutexes associated with high-level interrupts are spin mutexes, and the system masks out interrupts at the level associated with the mutex. The system clock() routine, responsible for time slicing, runs at interrupt level 10. Bus drivers choose the interrupt priorities for devices on the nexus. PCI bus drivers use the class-code configuration space register to assign interrupt levels. For ISA and EISA nexus drivers, the level of all devices is IPL5. The level for a given device instance may be overridden by setting interrupt-priorities=level in the driver.conf(4) file. For more information, see sysbus(4).

The steps taken for handling an interrupt are:

  1. Register interrupt handler(s) in attach(9e) via ddi_add_intr(9f) and, optionally, ddi_add_softintr(9f).
  2. The handler should:
    • Check to see if the device is interrupting. If not, return DDI_INTR_UNCLAIMED. If the driver returns DDI_INTR_UNCLAIMED, code above the driver will try calling the next interrupt handler for the next device (if any) that interrupts at the same priority level. In other words, Solaris polls driver interrupt handlers based on interrupt priority level.
    • Perform any hardware-specific tasks to handle the interrupt.
    • Tell the device the interrupt has been handled. This step is necessary for devices that use level-triggered interrupts.
    • Return DDI_INTR_CLAIMED.

For high-level interrupts, it is preferable to do the minimum possible in the high-level handler and trigger a software interrupt to do any additional processing.
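The claim/unclaim protocol in the steps above can be modeled in a few lines of user-space C. This is a simplified sketch based only on the description in this article: the model_* names and the model_dev structure are hypothetical, the constants are stand-ins for DDI_INTR_UNCLAIMED and DDI_INTR_CLAIMED, and the dispatcher stops at the first handler that claims the interrupt, which the real framework code may not do.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DDI_INTR_UNCLAIMED / DDI_INTR_CLAIMED in this model. */
#define MODEL_INTR_UNCLAIMED 0
#define MODEL_INTR_CLAIMED   1

struct model_dev {
    int pending;    /* nonzero while the "device" asserts its interrupt */
    int handled;    /* number of interrupts this handler has serviced */
};

/* Handler following the steps above: check, service, acknowledge, claim. */
static int
model_intr(struct model_dev *dev)
{
    assert(dev != NULL);
    if (!dev->pending)
        return (MODEL_INTR_UNCLAIMED);  /* not our device */
    dev->handled++;                     /* service the interrupt */
    dev->pending = 0;                   /* acknowledge it to the device */
    return (MODEL_INTR_CLAIMED);
}

/*
 * Model of the code above the driver: poll each device sharing the level
 * until one claims the interrupt. Returns the claimer's index, or -1.
 */
static int
model_dispatch(struct model_dev *devs, size_t ndevs)
{
    size_t i;

    for (i = 0; i < ndevs; i++) {
        if (model_intr(&devs[i]) == MODEL_INTR_CLAIMED)
            return ((int)i);
    }
    return (-1);    /* spurious: nobody claimed it */
}
```

With three devices sharing a level and only the second one pending, the dispatcher skips the first handler and the second one claims the interrupt. A handler that claimed interrupts it did not own would hide the real source from the handlers after it.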

The following example shows an attach(9e) registering an interrupt handler and the handler doing its work.

static uint_t foo_intr(caddr_t arg);

static int
foo_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
int instance;
struct foo_state *fsp;

instance = ddi_get_instance(dip);
ddi_soft_state_zalloc(statep, instance);

fsp = ddi_get_soft_state(statep, instance);

initialize fsp;

initialize mutexes, etc.;

ddi_add_intr(dip, 0, NULL, &idevice_cookie, foo_intr, (caddr_t)fsp);

other initialization (ddi_create_minor_node, etc.);
return DDI_SUCCESS;
}

static uint_t
foo_intr(caddr_t arg)
{
struct foo_state *fsp = (struct foo_state *)arg;

read device register to determine if device is generating interrupt;
if (not my device interrupting)
return DDI_INTR_UNCLAIMED;

handle device interrupt;

write to device to tell it the interrupt has been handled;

return DDI_INTR_CLAIMED;
}

DMA

Solaris and Linux both provide a layer of data abstraction to hide DMA platform differences. The generic DMA layer is new for Linux 2.6. Here, we show the steps taken for DMA in a Solaris driver. Note that the code should be identical regardless of whether the device is on SPARC or x86.

The steps taken for DMA are:

  1. Allocate and initialize a ddi_dma_attr(9s), which describes device and DMA-engine-specific attributes. This is typically a global, static initialization within the driver.
  2. Allocate a DMA "handle" via ddi_dma_alloc_handle(9f), either in attach(9e) or just prior to programming the device to do a specific DMA transfer.
  3. Bind an address to the handle via ddi_dma_addr_bind_handle(9f) or ddi_dma_buf_bind_handle(9f). These routines return a DMA "cookie" (or cookies) that contain the address(es) and size(s) to use to program the device to do DMA.
  4. Program the device to do DMA. This is device-specific.
  5. On completion, you may need to sync caches via a call to ddi_dma_sync(9f).
  6. When finished with the I/O, call ddi_dma_unbind_handle(9f) to free up DMA resources.
  7. When finished with the handle (or in detach(9e)), use ddi_dma_free_handle(9f).

Here is an example for a PCI device.

static ddi_dma_attr_t foo_dma_attr = {
DMA_ATTR_V0, /* version of this structure */
0, /* lowest usable address */
0xffffffffU, /* highest usable address */
0x0ffffff, /* maximum DMAable byte count */
1, /* alignment in bytes */
0x7f, /* bitmap of burst sizes */
1, /* minimum transfer */
0x0ffffffU, /* maximum transfer */
0x0ffffffU, /* maximum segment length */
1, /* maximum number of segments */
1, /* granularity */
0, /* flags (reserved) */
};

ddi_dma_handle_t foo_dma_handle; /* typically in per-instance state */

static int
foo_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
...
if (ddi_dma_alloc_handle(dip, &foo_dma_attr,
DDI_DMA_SLEEP, 0, &foo_dma_handle) != DDI_SUCCESS)
return (DDI_FAILURE);
...
}

foo_startio(...) /* the routine to do the DMA */
{
ddi_dma_cookie_t dma_cookie;
uint_t ncookies; /* number of cookies returned by the bind */
caddr_t addr; /* do dma to/from this address, */
/* possibly from ddi_dma_mem_alloc(9F) */
size_t len; /* number of bytes to do dma to/from */
uint_t flags;


...
flags = DDI_DMA_RDWR | DDI_DMA_STREAMING; /* or DDI_DMA_CONSISTENT, etc. */
ddi_dma_addr_bind_handle(foo_dma_handle, 0, addr, len, flags,
DDI_DMA_DONTWAIT, NULL, &dma_cookie, &ncookies);

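/* program the device with the address and size from the cookie */
ddi_put32(foo_acc_handle, (uint32_t *)(dev_regs + DMA_ADDR), dma_cookie.dmac_address);
ddi_put32(foo_acc_handle, (uint32_t *)(dev_regs + DMA_SIZE), dma_cookie.dmac_size);

ddi_put32(foo_acc_handle, (uint32_t *)(dev_regs + CMD), GO); /* start the transfer */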
}

On completion, the driver interrupt handler calls ddi_dma_unbind_handle(9f) and possibly ddi_dma_sync(9f). The detach(9e) routine calls ddi_dma_free_handle(9f).

Timing and Timers

Solaris timing uses a real-time clock that can generate interrupts at a resolution bound by the processor speed. For scheduling purposes, it fires every 10 milliseconds. As in Linux, this is a clock "tick." Note that 2.6 Linux uses a 1000-tick/second clock, as opposed to the 100-tick/second clock used by Solaris and by previous versions of Linux. User-level programs on Solaris can program the real time clock to fire at nanosecond granularity, rounded up by processor time--much finer than the clock tick granularity of ten or one milliseconds. However, the program interface to use the high-resolution timers is not visible in the DDI/DKI. See clock_settime(3rt) for user-level details and usr/src/uts/common/os/cyclic.c for details on high-resolution timing in Solaris.

Also note that in Solaris, you can change the value of hz (clock ticks per second) by setting hires_tick to 1 and hires_hz to the desired rate in the /etc/system file. When hires_tick is set, hires_hz defaults to 1000 ticks per second. Here's an example:

set hires_tick=1
set hires_hz=10000 <--- 10000 ticks per second.

High values for hires_hz are not recommended. With the above value, the clock() routine runs 10000 times per second, probably taking more overhead than you want or need. The time-related routines visible in the DDI/DKI are:

  • timeout_id_t timeout(void(* func)(void *), void *arg, clock_t ticks), which schedules a function to run after a specified time measured in clock ticks. This fires a "one-shot" timer. If you want it to go off again (periodically, perhaps), call timeout() again, probably in your func().
  • clock_t untimeout(timeout_id_t id), to cancel a previous call to timeout(9f). Take care with locks. See the manual pages for details.
  • void drv_usecwait(clock_t microsecs), providing a busy-wait for microsecs microseconds. The real time clock (not the 10ms clock) may cause the rounding up of microseconds. You can be preempted or interrupted during this spin. If you don't want that, use ddi_enter_critical(9f)/ddi_exit_critical(9f) instead.
  • clock_t drv_usectohz(clock_t microsecs), which converts microsecs to clock ticks. Use this to find the number of clock ticks for a given number of microseconds before calling timeout(9f), which requires clock ticks.
  • void delay(clock_t ticks), to delay for a specified number of clock ticks. This function switches out the calling thread; it does not busy wait.
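Because timeout(9f) and delay(9f) take clock ticks, it helps to see how a microsecond value maps onto ticks at different clock rates. The function below is a user-space model of that arithmetic, not the kernel's drv_usectohz() implementation. The name model_usectohz is hypothetical, and rounding up (so the wait is never shorter than requested) is an assumption of this sketch.

```c
#include <assert.h>

#define MODEL_USEC_PER_SEC 1000000L

/*
 * Convert microseconds to clock ticks for a clock running at hz ticks
 * per second, rounding up to a whole number of ticks. Assumes hz divides
 * one million evenly, as 100, 1000, and 10000 do.
 */
static long
model_usectohz(long microsecs, long hz)
{
    long usec_per_tick;

    assert(hz > 0 && hz <= MODEL_USEC_PER_SEC && microsecs >= 0);
    usec_per_tick = MODEL_USEC_PER_SEC / hz;
    return ((microsecs + usec_per_tick - 1) / usec_per_tick);
}
```

Under these assumptions, 25000 microseconds is 3 ticks at 100 ticks per second and 25 ticks at 1000 ticks per second. In a driver, the equivalent step is to pass the result of drv_usectohz(9f) as the ticks argument of timeout(9f).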

Stacks

The kernel allocates stack space on a per-thread basis. Each kernel thread receives 8192 bytes on x86 and 24576 bytes on SPARC for stack space. In addition, a REDZONE page is virtually allocated with no permissions to act as a guard against stack overflow. All of this means that you should be careful about declaring large arrays as local variables. Of course, deep levels of recursion can cause stack overflow problems. If you need a large array for a local variable, it is better to kmem_alloc() the array on the heap, use it, then kmem_free() it before returning.

Error Handling

In general, driver entry points return DDI_SUCCESS (0) on success, and a positive value (actually an errno) on failure. Linux uses negative errno values for failure indications. The errno to use should be consistent with the errnos listed on the man page for the entry point. For instance, if the driver open routine needs to return an error, refer to open(9e) for a list of possible error returns.
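The difference in sign convention is easy to trip over when porting. The sketch below contrasts the two styles in plain user-space C. All three function names are hypothetical, and the helper assumes both sides use the same errno numbering, which may not hold between real Linux and Solaris headers.

```c
#include <assert.h>
#include <errno.h>

/* Linux-style entry point result: 0 on success, negative errno on failure. */
static int
linux_style_open(int device_present)
{
    return (device_present ? 0 : -ENXIO);
}

/* Solaris-style result: 0 on success, positive errno on failure. */
static int
solaris_style_open(int device_present)
{
    return (device_present ? 0 : ENXIO);
}

/* Hypothetical porting helper: map a Linux-style result to Solaris style. */
static int
linux_to_solaris_err(int linux_rv)
{
    assert(linux_rv <= 0);
    return (-linux_rv);
}
```

When porting by hand, it is usually clearer to change each return statement to the positive errno than to wrap results in a helper like this one.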

As for reporting errors, Solaris drivers can use cmn_err(9f), which is equivalent in most ways to Linux's printk(). Handling some types of errors is beyond the scope of this article. For instance, handling failure of kmem_alloc(9f) can be problematic.

Table 1. Linux/Solaris Driver Comparison

Conclusion

The most important and valuable advice for beginning Solaris driver programming is to follow the documentation of the DDI and DKI. You may occasionally need to step outside of the bounds, but consider carefully if you can achieve your goals while sticking to defined interfaces.

 