Measuring kernel latencies to ensure real-time constraints

Device drivers in the kernel often need to perform tasks in response to events. There is not one but many ways to do this: deferred execution mechanisms such as the Linux workqueue, the tasklet, the kernel thread, and so on. Different methods have different scheduling priorities and thus different response latencies. They also differ in their execution context (e.g., process vs. interrupt context), which affects which method is more suitable for a specific purpose.

In this article, I describe what I found when experimenting with the different methods, in terms of their response latencies, and how system load and user-space task priorities affect them:

  1. workqueue
  2. tasklet
  3. kernel thread

The latency here means the time between a task being invoked and it being executed. It depends on the Linux scheduler latency, the deferred execution method (workqueue vs. tasklet vs. kthread), and the priorities of competing tasks. The first item, the scheduler latency, is the time between a service being requested and the scheduler being executed. This was a significant issue for early Linux kernels because the kernel was not preemptive, so the scheduler might not run for a fairly long time after an event was raised. In recent kernels, the scheduler latency has been greatly reduced thanks to kernel preemption. The caveat, however, is that some synchronization techniques, such as spinlocks, can still prevent preemption from happening, and thus can still slow down kernel response in some conditions.

The latency measured here is not the scheduler latency, but the time for the scheduled task to be executed. This also depends on the scheduling algorithm and on the priority of the task and of competing tasks. A detailed discussion of the Linux scheduler can be found here (http://oreilly.com/catalog/linuxkernel/chapter/ch10.html). For now, it suffices to say that the scheduler does three things in order:

  1. execute deferred tasks in the task queue
  2. execute the bottom half (deferred tasklets and soft irqs)
  3. find a process to run based on its scheduling policy (SCHED_FIFO, SCHED_RR, or SCHED_OTHER) and its priority.

From this, we can see that the tasklet has the highest priority: it runs even before the scheduler looks at the priorities of any kernel task. Here, a kernel task means an execution unit with a kernel struct task_struct data structure. This includes any userspace process, any POSIX thread (implemented by the Native POSIX Thread Library, NPTL), and any kernel workqueue worker (whether serving the kernel global queue or a queue created by a module). Naturally, any of the latter group will have a higher latency than a tasklet. We will see below that the tasklet is indeed the one with the lowest latencies in all conditions, especially when the system load is high. That is not to say we should always use tasklets, however: because a tasklet runs in interrupt context, it cannot be used for operations that may sleep (e.g., some memory allocations and I/O), and it may itself prevent kernel preemption and increase kernel latency.

To choose a method wisely, we can measure runtime performance to understand how quick each method is, and how scheduling policies and priorities change the latencies. Here, I describe data I got from some tests. The tests are done with a kernel module that implements three execution methods: a workqueue (without delay), a tasklet (without delay), and waking up an existing kthread. Each method is tested N times (N = 10,000), and the average and maximum latencies are recorded. The background system load is 10 real-time user-space threads (policy SCHED_RR, priority 1). The test thread runs with policy SCHED_RR, priority 20; its priority is set higher than the background threads to avoid starvation.

System without high-priority load

Latency   Workqueue (global)   Workqueue (private)   Tasklet     Kthread     Userspace
Avg       6 us                 6 us                  5 us        5 us        8 us
Stdev     1.414 us             1.000 us              2.646 us    2.646 us    3.606 us
Max       21 us                19 us                 135 us      30 us       12 us

System with high-priority load

Latency   Workqueue (global)   Workqueue (private)   Tasklet     Kthread     Userspace
Avg       101 us               101 us                6 us        195 us      49693 us
Stdev     222.948 us           242.535 us            99.895 us   333.742 us  43963.480 us
Max       950022 us            950070 us             9992 us     950160 us   49887 us

It can be seen that when userspace carries a high-priority load, kernel performance is affected as well as userspace performance. The latencies of kernel tasks (as opposed to tasklets) increase from the microsecond level to almost a second. The good news, however, is that kernel tasks are never blocked outright by userspace load, no matter how high its priority. This is not the case for userspace threads: if the real-time test thread has lower priority than the real-time background threads, the test thread never gets enough time slices to execute.

Conclusion
The conclusion from these results is that kernel tasks (threads) have best-case latencies at the microsecond level and worst-case latencies of around 1 second. The worst case for tasklets is lower, at the millisecond level. Kernel tasks and threads are not starved for longer than a few seconds by userspace workload, while userspace threads may be.

Download
Source code for the kernel module and the test tool can be downloaded from GitHub: http://github.com/dankex/tools/tree/master/linux-kernel/wake_latency/
