Linux IRQ Management (8): Interrupt Bottom Halves

Hacker_Albert

Article link: https://blog.csdn.net/weixin_41028621/article/details/102839820

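This article looks at Linux interrupt handling and explains the concept of the interrupt bottom half, covering how softirq, tasklet and workqueue work and how they trade off performance against ease of use. It describes the static allocation, triggering and processing flow of softirqs, the dynamic registration, scheduling and execution of tasklets, and the advantages and use cases of workqueues, which run in process context.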

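Understanding interrupt bottom halves

1.Introduction

Linux splits interrupt handling into two parts:

The interrupt handler (top half), which runs with interrupts disabled from start to finish;
The deferrable task (bottom half), which covers work that is less urgent. Interrupts are enabled while a bottom half runs. The bottom-half mechanisms are softirq, tasklet and workqueue.

Ordinary drivers rarely use softirqs directly, but the tasklets that drivers use so often are built on top of softirqs, so understanding the softirq mechanism helps in writing cleaner drivers. Softirqs cannot be allocated dynamically; they are all defined statically. The kernel already defines a number of softirq numbers, for example for network transmit and receive, for block-device data access (large data volumes, high bandwidth), and for the timer's deferrable tasks (strict timing requirements).

Overall framework of interrupt handling: [Figure]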

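1.1.top half and bottom half

Interrupt handling directly affects system performance. Suppose you are copying a large amount of data through a USB flash drive and a key press takes half a second to show up on screen: that would be maddening. So for hardware interrupts that are complex and involve heavy data processing, the handler must not do everything before restoring the interrupted context (the handler runs with interrupts disabled throughout). It should do only part of the work, specifically:

Work with real-time requirements;
Work tied to the hardware, such as acking the interrupt or reading the HW FIFO into RAM;
For a shared interrupt, reading the hardware interrupt status to decide whether this device raised the interrupt.

Everything else goes into the bottom half. Once interrupt handling is split into a top half and a bottom half, the interrupts-off top half is slimmed down and completes very quickly, which greatly reduces the time the system spends with interrupts disabled and improves performance.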

1.2.Background information

Deferred work is a class of kernel facilities that allows one to schedule code to be executed at a later time. This scheduled code can run either in process context or in interrupt context, depending on the type of deferred work. Deferred work is used to complement the interrupt handler functionality since interrupts have important requirements and limitations:

The execution time of the interrupt handler must be as small as possible
In interrupt context we can not use blocking calls

Using deferred work we can perform the minimum required work in the interrupt handler and schedule an asynchronous action from the interrupt handler to run at a later time and execute the rest of the operations.

Deferred work that runs in interrupt context is also known as bottom-half, since its purpose is to execute the rest of the actions from an interrupt handler (top-half).

Timers are another type of deferred work that are used to schedule the execution of future actions after a certain amount of time has passed.

Kernel threads are not themselves deferred work, but can be used to complement the deferred work mechanisms. In general, kernel threads are used as “workers” to process events whose execution contains blocking calls.

There are three typical operations that are used with all types of deferred work:

Initialization. Each type is described by a structure whose fields will have to be initialized. The handler to be scheduled is also set at this time.
Scheduling. Schedules the execution of the handler as soon as possible (or after expiry of a timeout).
Masking or Canceling. Disables the execution of the handler. This action can be either synchronous (which guarantees that the handler will not run after the completion of canceling) or asynchronous.

Attention:
When doing deferred work cleanup, like freeing the structures associated with the deferred work or removing the module and thus the handler code from the kernel, always use the synchronous type of canceling the deferred work.

2.Bottom-half

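The Linux kernel provides three bottom-half mechanisms to meet different needs.

Softirq: the most basic, highest-priority form of deferred interrupt processing. To avoid confusion with "soft interrupt" as a general term, this article calls this specific mechanism softirq.
tasklet: implemented on top of the softirq mechanism. It gives driver writers an easier interface and makes the softirq mechanism more extensible.
work queue: the two mechanisms above run with preemption disabled (except when softirqs are processed in the ksoftirqd thread), which is unfriendly to user processes. When softirq processing runs too long, the remaining work is handed to the ksoftirqd kernel thread. A work queue runs in process context, so it can be preempted and scheduled; if the deferred work needs to sleep or block, choose a work queue directly.

Differences between the three:

workqueue runs in process context;
softirq and tasklet run in interrupt context.
- the same type of softirq can run concurrently on different CPUs;
- a tasklet does not need to worry about reentrancy, so tasklets are easier to use, while softirqs favor performance.

When the deferred task needs to sleep, it must be deferred to a kernel thread, which means the workqueue mechanism must be used.

softirq and tasklet: in essence, a bottom-half mechanism has to meet two requirements, performance and ease of use. Designing one general-purpose bottom-half mechanism that satisfies both is very hard, so the kernel provides two: softirq leans toward performance and tasklet leans toward ease of use.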

2.softirq basics

2.1.preempt_count

/*
 * low level task data that entry.S needs immediate access to.
 * __switch_to() assumes cpu_context follows immediately after cpu_domain.
 */
struct thread_info {
	unsigned long		flags;		/* low level flags */
	mm_segment_t		addr_limit;	/* address limit */
	struct task_struct	*task;		/* main task structure */
	struct exec_domain	*exec_domain;	/* execution domain */
	struct restart_block	restart_block;
	int			preempt_count;	/* 0 => preemptable, <0 => bug */
	int			cpu;		/* cpu */
};

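preempt_count indicates whether the current task can be preempted.

preempt_count = 0: the current task can be preempted;
preempt_count < 0: there is a bug;
preempt_count > 0: the current task cannot be preempted.

There are many reasons why a task may not be preemptible, for example it is running in interrupt context or holds a lock (spin_lock disables preemption). To find out which reason applies, look at the meaning of each field of preempt_count:
[Figure: preempt_count bit layout]

bit0-7: the preemption-disable count; the maximum nesting depth is 255;
bit8-15: the softirq count, also 8 bits wide;
bit16-19: the hardirq count. The kernel comment says, roughly, that interrupt nesting should be avoided but that some drivers still nest interrupts; with 4 bits the maximum nesting depth is 15.
bit20: the NMI flag;
bit21: whether preemption is currently active (PREEMPT_ACTIVE).

To make it easy to extract each field, Linux provides the following macros: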

include/linux/preempt.h:
#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)                        //0+8=8
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)                        //8+8=16
#define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)                        //16+4=20

#define __IRQ_MASK(x)	((1UL << (x))-1)

#define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
#define NMI_MASK	(__IRQ_MASK(NMI_BITS)     << NMI_SHIFT)

#define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)                //1<<0
#define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)                //1<<8
#define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)                //1<<16
#define NMI_OFFSET	(1UL << NMI_SHIFT)                    //1<<20

#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)           //2<<8 = 512

#define PREEMPT_ACTIVE_BITS	1
#define PREEMPT_ACTIVE_SHIFT	(NMI_SHIFT + NMI_BITS)
#define PREEMPT_ACTIVE	(__IRQ_MASK(PREEMPT_ACTIVE_BITS) << PREEMPT_ACTIVE_SHIFT)

#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)                                     //hardirq field
#define softirq_count()	(preempt_count() & SOFTIRQ_MASK)                                     //softirq field
#define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
				| NMI_MASK))                                                  //hardirq + softirq + NMI fields

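From these definitions: hardirq_count() returns the hardirq field of preempt_count, softirq_count() returns the softirq field, and irq_count() returns the combined hardirq, softirq and NMI fields. Each result is still in its field position (not shifted down), so it is non-zero exactly when the corresponding count is non-zero.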
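The field layout can be tried out in userspace. The following is a minimal sketch, not kernel code: it copies the bit widths and shifts from the macros above, and the helpers make_preempt_count(), preempt_depth(), softirq_field(), hardirq_depth() and nmi_flag() are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Field widths and shifts, mirroring the macros quoted above. */
#define PREEMPT_BITS    8
#define SOFTIRQ_BITS    8
#define HARDIRQ_BITS    4
#define NMI_BITS        1

#define PREEMPT_SHIFT   0
#define SOFTIRQ_SHIFT   (PREEMPT_SHIFT + PREEMPT_BITS)   /* 8  */
#define HARDIRQ_SHIFT   (SOFTIRQ_SHIFT + SOFTIRQ_BITS)   /* 16 */
#define NMI_SHIFT       (HARDIRQ_SHIFT + HARDIRQ_BITS)   /* 20 */

#define FIELD_MASK(bits, shift) (((1UL << (bits)) - 1) << (shift))

#define PREEMPT_MASK    FIELD_MASK(PREEMPT_BITS, PREEMPT_SHIFT)
#define SOFTIRQ_MASK    FIELD_MASK(SOFTIRQ_BITS, SOFTIRQ_SHIFT)
#define HARDIRQ_MASK    FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK        FIELD_MASK(NMI_BITS, NMI_SHIFT)

/* Hypothetical helper: build a preempt_count value from per-field counts. */
static unsigned long make_preempt_count(unsigned long preempt,
                                        unsigned long softirq,
                                        unsigned long hardirq)
{
    assert(preempt <= 255 && softirq <= 255 && hardirq <= 15);
    return (preempt << PREEMPT_SHIFT) |
           (softirq << SOFTIRQ_SHIFT) |
           (hardirq << HARDIRQ_SHIFT);
}

/* Hypothetical helpers: shift each field down to a plain number. */
static unsigned long preempt_depth(unsigned long pc)
{
    return (pc & PREEMPT_MASK) >> PREEMPT_SHIFT;
}

static unsigned long softirq_field(unsigned long pc)
{
    return (pc & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT;
}

static unsigned long hardirq_depth(unsigned long pc)
{
    return (pc & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
}

static int nmi_flag(unsigned long pc)
{
    return (pc & NMI_MASK) != 0;
}
```

Unlike the kernel's hardirq_count(), which leaves the value in place, these helpers shift the field down, which is convenient when printing a nesting depth.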

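In more detail:

BIT<0:7>: the preemption count records how many times preemption has been explicitly disabled.

Calling preempt_disable increments the preemption count;
Calling preempt_enable decrements the preemption count.

preempt_disable and preempt_enable must be paired and may be nested, up to a maximum depth of 255.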

#define preempt_enable() \
do { \
    barrier(); \
    if (unlikely(preempt_count_dec_and_test())) \
        __preempt_schedule(); \
} while (0)

#define preempt_disable() \
do { \
    preempt_count_inc(); \
    barrier(); \
} while (0)

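Bit<8:15>: the softirq count is modified in two situations:

Before entering a softirq handler the softirq count is incremented, and after leaving the handler it is decremented. Softirq handlers never run concurrently on one CPU; they always run serially, so a single bit is enough for this case: bit 8 in the figure above. That bit tells whether the current task is in softirq context.
For kernel synchronization, process context sometimes needs to disable softirqs. The kernel provides local_bh_enable and local_bh_disable for this. The concept is similar to preempt disable/enable; it uses bits 9-15 and supports up to 127 levels of nesting.

Bit<16:19>: the hardirq count records the current nesting depth of interrupt handlers, at most 15 levels. On ARM, the Linux kernel's interrupt entry code looks like this: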

kernel/irq/irqdesc.c：
void handle_IRQ(unsigned int irq, struct pt_regs *regs) 
{ 
    struct pt_regs *old_regs = set_irq_regs(regs);

    irq_enter();  
    generic_handle_irq(irq);

    irq_exit(); 
    set_irq_regs(old_regs); 
}

The generic IRQ handler is bracketed by irq_enter and irq_exit. irq_enter marks entry into IRQ context and irq_exit marks the exit from it.

irq_enter() calls preempt_count_add(HARDIRQ_OFFSET), adding 1 to the hardirq count bit field.
irq_exit() calls preempt_count_sub(HARDIRQ_OFFSET), subtracting 1 from the hardirq count bit field.

2.2.Execution contexts

/*
 * in_irq()       - We're in (hard) IRQ context
 * in_softirq()   - We have BH disabled, or are processing softirqs
 * in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled
 * in_serving_softirq() - We're in softirq context
 * in_nmi()       - We're in NMI context
 * in_task()      - We're in task context
 *
 * Note: due to the BH disabled confusion: in_softirq(),in_interrupt() really
 *       should not be used in new code.
 */
#define in_irq()        (hardirq_count())
#define in_softirq()        (softirq_count())
#define in_interrupt()      (irq_count())
#define in_serving_softirq()    (softirq_count() & SOFTIRQ_OFFSET)
#define in_nmi()        (preempt_count() & NMI_MASK)
#define in_task()       (!(preempt_count() & \
                   (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))

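in_irq: whether the current code is running in hardirq context;
in_softirq: whether the current code is processing a softirq or has bottom halves disabled (for example via local_bh_disable);
in_interrupt: whether the current code is in NMI, hardirq or softirq context, or has bottom halves disabled;
in_serving_softirq: whether the current code is actually processing a softirq, determined from bit 8.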
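The difference between in_softirq() and in_serving_softirq() comes from the two offsets: entering a softirq handler adds SOFTIRQ_OFFSET (bit 8), while local_bh_disable adds SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET, so bit 8 is untouched). The userspace sketch below models this with a plain variable; all sim_* names are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

#define SOFTIRQ_SHIFT           8
#define SOFTIRQ_BITS            8
#define SOFTIRQ_MASK            (((1UL << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
#define SOFTIRQ_OFFSET          (1UL << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET  (2 * SOFTIRQ_OFFSET)

/* Stands in for the current task's preempt_count. */
static unsigned long sim_preempt_count;

/* Model of local_bh_disable(): bump the nesting count held in bits 9-15. */
static void sim_local_bh_disable(void)
{
    sim_preempt_count += SOFTIRQ_DISABLE_OFFSET;
}

/* Model of local_bh_enable(), without the "run pending softirqs" part. */
static void sim_local_bh_enable(void)
{
    assert((sim_preempt_count & SOFTIRQ_MASK) >= SOFTIRQ_DISABLE_OFFSET);
    sim_preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* Model of entering and leaving a softirq handler: toggles bit 8. */
static void sim_softirq_enter(void)
{
    sim_preempt_count += SOFTIRQ_OFFSET;
}

static void sim_softirq_exit(void)
{
    sim_preempt_count -= SOFTIRQ_OFFSET;
}

static bool sim_in_softirq(void)
{
    return (sim_preempt_count & SOFTIRQ_MASK) != 0;
}

static bool sim_in_serving_softirq(void)
{
    return (sim_preempt_count & SOFTIRQ_MASK & SOFTIRQ_OFFSET) != 0;
}
```

With bottom halves disabled twice, sim_in_softirq() is true while sim_in_serving_softirq() stays false, which is the "BH disabled confusion" the kernel comment warns about.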

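3.softirq mechanism

Softirqs are the software counterpart of hardirqs: a softirq is raised purely in software, with no hardware involved.
[Figure]
3.1.softirq number

Just like IRQ numbers, the kernel identifies each softirq by a unique softirq number, defined as follows:

include/linux/interrupt.h: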
enum
{
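    HI_SOFTIRQ=0,                     /* high-priority tasklets; highest priority */
    TIMER_SOFTIRQ,                    /* timer-related softirq */
    NET_TX_SOFTIRQ,                   /* transmit packets to the network card */
    NET_RX_SOFTIRQ,                   /* receive packets from the network card */
    BLOCK_SOFTIRQ,                    /* block-device softirq */
    BLOCK_IOPOLL_SOFTIRQ,             /* block-device softirq with IO polling support */
    TASKLET_SOFTIRQ,                  /* regular tasklets */
    SCHED_SOFTIRQ,                    /* scheduler softirq */
    HRTIMER_SOFTIRQ,                  /* high-resolution timer softirq */
    RCU_SOFTIRQ,                      /* RCU softirq; always the last softirq */
    NR_SOFTIRQS                       /* number of softirqs, 10 here */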
};

3.2.Softirq vector table (softirq_vec)

Softirqs are defined statically: the system has an array of softirq descriptors, and the softirq number is the index into that array. The definition is:

//kernel/softirq.c:
struct softirq_action 
{ 
    void    (*action)(struct softirq_action *); 
};

//__cacheline_aligned_in_smp aligns softirq_vec to a cache line on SMP builds.
static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;

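The statically defined array has one entry for each softirq the system supports. The softirq descriptor is very simple: it has a single member, action, the callback invoked to handle the softirq once it has been raised. A softirq has no hardware register; its only state is a software "interrupt status register", the soft interrupt state (irq_stat), defined as follows: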

typedef struct { 
    unsigned int __softirq_pending; 
#ifdef CONFIG_SMP 
    unsigned int ipi_irqs[NR_IPI]; 
#endif 
} ____cacheline_aligned irq_cpustat_t;

irq_cpustat_t irq_stat[NR_CPUS] ____cacheline_aligned;

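The kernel uses irq_cpustat_t to record that a softirq has been raised. __softirq_pending is 32 bits wide and each bit corresponds to one softirq vector; only the low NR_SOFTIRQS bits (10 with the enum above) are actually used. Bit n set to 1 means softirq_vec[n] is pending. When a driver's hardware interrupt is delivered to a particular CPU and the handler raises a softirq, that same CPU is responsible for calling the action callback of that softirq number. This is why there is one irq_stat object per processor. For performance, each irq_stat entry is aligned to a cache line.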

3.3.Registering a softirq

open_softirq registers the action callback of a softirq:

void open_softirq(int nr, void (*action)(struct softirq_action *)) 
{ 
    softirq_vec[nr].action = action; 
}

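Softirq numbers are allocated statically at compile time; tasklets are a further layer on top that allows dynamic registration and removal. softirq_vec is shared among all CPUs, but all registration happens during system initialization, while the system is still executing serially. In addition, softirqs are statically defined and each entry (that is, each softirq number) has a fixed owner, so no locking is needed.

3.3.1.Where the kernel registers softirqs:
[Figure]
3.4.Raising a softirq

raise_softirq raises a softirq on the local CPU: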

void raise_softirq(unsigned int nr) 
{ 
    unsigned long flags;

    local_irq_save(flags); 
    raise_softirq_irqoff(nr); 
    local_irq_restore(flags); 
}

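raise_softirq_irqoff sets bit nr in the local CPU's __softirq_pending. If bit n is set, a softirq of type n is waiting to be processed; setting that bit is precisely what "raising" a softirq means.

3.5.disable/enable softirq

In the Linux kernel, local_irq_disable and local_irq_enable disable and enable interrupts on the current CPU. Like hardware interrupts, softirqs can also be disabled; the interfaces are local_bh_disable and local_bh_enable. They disable and enable bottom halves, which here covers both softirqs and tasklets.

3.6.Softirq processing

__do_softirq walks the vector table and processes all pending softirqs in one pass, from highest priority (softirq number 0) to lowest; softirqs do not nest. __do_softirq() is called at the following points: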

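irq_exit(): called on the way back after a hardware interrupt has been handled.

asm_do_IRQ
->handle_IRQ
->irq_exit
->invoke_softirq
->__do_softirq()

The ksoftirqd kernel thread: one ksoftirqd runs on every CPU and processes softirqs.

run_ksoftirqd
->__do_softirq

local_bh_enable(): when it finds pending softirqs and the caller is not in hardirq or softirq context.

local_bh_enable()
->do_softirq
->do_softirq_own_stack
->__do_softirq()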
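The register, raise and process cycle from sections 3.3 to 3.6 can be sketched in userspace. This is a simplified single-CPU model under stated assumptions, not the kernel implementation: there is no per-CPU state, no interrupt masking and no ksoftirqd, and every sim_* name is hypothetical.

```c
#include <assert.h>

enum { SIM_HI, SIM_TIMER, SIM_NET_TX, SIM_NET_RX, SIM_NR_SOFTIRQS };

struct sim_softirq_action {
    void (*action)(struct sim_softirq_action *);
};

/* Vector table indexed by softirq number, as with softirq_vec. */
static struct sim_softirq_action sim_softirq_vec[SIM_NR_SOFTIRQS];

/* Stands in for this CPU's __softirq_pending bitmask. */
static unsigned int sim_pending;

/* Log of which vectors ran, in order, so the behavior can be inspected. */
static int sim_run_order[16];
static int sim_run_count;

static void sim_open_softirq(int nr,
                             void (*action)(struct sim_softirq_action *))
{
    assert(nr >= 0 && nr < SIM_NR_SOFTIRQS);
    sim_softirq_vec[nr].action = action;
}

/* Raising a softirq only sets its pending bit; raising twice is idempotent. */
static void sim_raise_softirq(unsigned int nr)
{
    assert(nr < SIM_NR_SOFTIRQS);
    sim_pending |= 1U << nr;
}

/* Snapshot and clear the pending mask, then walk it from bit 0 upward. */
static void sim_do_softirq(void)
{
    unsigned int pending = sim_pending;
    struct sim_softirq_action *h = sim_softirq_vec;

    sim_pending = 0;
    for (; pending; pending >>= 1, h++) {
        if ((pending & 1) && h->action)
            h->action(h);
    }
}

/* Example handler: records the index of the vector that was invoked. */
static void sim_record(struct sim_softirq_action *h)
{
    if (sim_run_count < 16)
        sim_run_order[sim_run_count++] = (int)(h - sim_softirq_vec);
}
```

Because the walk starts at bit 0, a lower softirq number is always served first, regardless of the order in which the softirqs were raised.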

3.7.The softirq daemon kernel thread ksoftirqd

This is the Softirq-daemon shown in the figure above.

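4.tasklet mechanism

Linux already has softirqs, so why are tasklets needed as well? The main reason is that a softirq can run on several CPUs at once and therefore has to deal with reentrancy, whereas a given tasklet runs on only one CPU at any moment and needs no protection against itself. In addition, Linux discourages adding new softirqs.

A tasklet is also a kind of softirq. To provide two priority levels, tasklets occupy two entries of the vector table (softirq_vec): HI_SOFTIRQ and TASKLET_SOFTIRQ.
[Figure]
During system initialization in start_kernel(), softirq_init() sets up the HI_SOFTIRQ and TASKLET_SOFTIRQ softirqs.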

void __init softirq_init(void)
{
    int cpu;

    for_each_possible_cpu(cpu) {
        per_cpu(tasklet_vec, cpu).tail =
            &per_cpu(tasklet_vec, cpu).head;
        per_cpu(tasklet_hi_vec, cpu).tail =
            &per_cpu(tasklet_hi_vec, cpu).head;
    }

    /* register the regular tasklet softirq */
    open_softirq(TASKLET_SOFTIRQ, tasklet_action);
    /* register the high-priority tasklet softirq */
    open_softirq(HI_SOFTIRQ, tasklet_hi_action);
}

4.1.struct tasklet_struct

struct tasklet_struct
{
    struct tasklet_struct *next; 
    unsigned long state;             // tasklet state
    atomic_t count;                  // disable count; 0 means enabled
    void (*func)(unsigned long);     // tasklet handler
    unsigned long data;              // argument passed to the handler
};

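next: pointer to the next tasklet; instances of this structure are linked into a list.
state: the tasklet state, an unsigned long of which only bit[1] and bit[0] are currently used. bit[1]=1 means the tasklet is currently executing on some CPU; it is meaningful only on SMP systems and prevents several CPUs from executing the same tasklet at once. bit[0]=1 means the tasklet has been scheduled and is waiting to run but has not started yet; it prevents the same tasklet from being queued again before it runs, for example when a tasklet has been triggered once but has not yet had a chance to execute. The two state bits are defined as follows: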

enum
{
    TASKLET_STATE_SCHED,
    TASKLET_STATE_RUN
};

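Each tasklet can be seen as a simple state machine: 0 -> TASKLET_STATE_SCHED -> TASKLET_STATE_RUN -> 0.

count: the disable count. If it is non-zero the tasklet is disabled; only when it is 0 is the tasklet enabled, meaning its handler func may run. Only an enabled tasklet is executed when its softirq is raised.
func: a function pointer, the handler of this tasklet.
data: the argument to func. It is an unsigned long (32 bits on 32-bit systems, 64 bits on 64-bit systems) whose meaning is up to func, for example the address of a user-defined data structure.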

4.2.Functions that operate on tasklet_struct

4.2.1.Creating a tasklet object

Static definition:

#define DECLARE_TASKLET(name, func, data) \
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }

#define DECLARE_TASKLET_DISABLED(name, func, data) \
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }

The two macros differ only in the initial value of count. DECLARE_TASKLET sets count to 0, so the tasklet starts out enabled; DECLARE_TASKLET_DISABLED sets count to 1, so the tasklet starts out disabled.

Dynamic initialization:

void tasklet_init(struct tasklet_struct *t,
          void (*func)(unsigned long), unsigned long data)
{
    t->next = NULL;
    t->state = 0;
    atomic_set(&t->count, 0);
    t->func = func;
    t->data = data;
}

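Whether created statically or dynamically, a tasklet needs a function address: the handler func that each tasklet must implement.

4.2.2.Enabling/disabling a tasklet

Enable and disable calls are normally used in pairs: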

static inline void tasklet_disable(struct tasklet_struct *t)
{
    tasklet_disable_nosync(t);
    tasklet_unlock_wait(t);
    smp_mb();
}

static inline void tasklet_enable(struct tasklet_struct *t)
{
    smp_mb__before_atomic_dec();
    atomic_dec(&t->count);
}
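The effect of count can be illustrated with a small userspace model. It is only a sketch of the counting rule (non-zero means disabled, and disables nest); it leaves out the atomics, memory barriers and the wait for a running handler that the real functions perform, and the sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_counted_tasklet {
    int count;   /* disable depth: 0 means enabled */
    int runs;    /* how many times the handler body has executed */
};

/* Model of tasklet_disable(): each call adds one level of disabling. */
static void sim_tasklet_disable(struct sim_counted_tasklet *t)
{
    t->count++;
}

/* Model of tasklet_enable(): each call removes one level. */
static void sim_tasklet_enable(struct sim_counted_tasklet *t)
{
    assert(t->count > 0);   /* an enable must match an earlier disable */
    t->count--;
}

/* Returns true if the handler ran, false if it was skipped as disabled. */
static bool sim_tasklet_try_run(struct sim_counted_tasklet *t)
{
    if (t->count != 0)
        return false;
    t->runs++;
    return true;
}
```

After two disables, a single enable is not enough: the handler is skipped until count is back to 0. As far as I know, the kernel keeps a scheduled but disabled tasklet queued rather than dropping it, which this model does not show.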

4.2.3.The tasklet list

The next member of tasklet_struct links tasklets into a list. The list head is defined in kernel/softirq.c:

struct tasklet_head
{
    struct tasklet_struct *head;
    struct tasklet_struct **tail;
};

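Although tasklets are a specific use of the HI_SOFTIRQ and TASKLET_SOFTIRQ vectors, they still live within the overall softirq framework, so their design follows the same rule: whichever CPU raises it, runs it. For this reason Linux defines a tasklet queue head for every CPU, representing the tasklets that CPU is responsible for running.

Every CPU maintains its own tasklet lists, defined as follows: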

static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec); 
static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec);

tasklet_vec holds tasklets of the TASKLET_SOFTIRQ class, and tasklet_hi_vec holds tasklets of the HI_SOFTIRQ class.

4.2.4.The trigger function tasklet_schedule

static inline void tasklet_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
        __tasklet_schedule(t);
}

static inline void tasklet_hi_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
        __tasklet_hi_schedule(t);
}

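In the top half, for example in an interrupt handler, a driver calls tasklet_schedule to trigger the TASKLET_SOFTIRQ softirq. tasklet_schedule first tests and sets TASKLET_STATE_SCHED in a single atomic operation. If the bit was already 1, the tasklet has already been scheduled and is waiting on some CPU without having run yet; since a tasklet can be pending on only one CPU at a time, tasklet_schedule() returns without doing anything, which avoids queuing it twice. If the bit was 0, it is now set and __tasklet_schedule is called. That function is defined in kernel/softirq.c and performs the actual triggering: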

void __tasklet_schedule(struct tasklet_struct *t)
{
    unsigned long flags;

    local_irq_save(flags);                  /* (1) */
    t->next = NULL;                         /* (2) */
    *__get_cpu_var(tasklet_vec).tail = t;
    __get_cpu_var(tasklet_vec).tail = &(t->next);
    raise_softirq_irqoff(TASKLET_SOFTIRQ);  /* (3) */
    local_irq_restore(flags);               /* (4) */
}

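In this function:

First, local_irq_save() disables interrupts on the current CPU so that the following steps run atomically on this CPU.
Then, the tasklet is appended to the tail of the current CPU's tasklet queue.
Next, raise_softirq_irqoff() raises TASKLET_SOFTIRQ on the current CPU, that is, it sets the TASKLET_SOFTIRQ bit in the softirq state irq_stat to pending. The kernel will process the raised TASKLET_SOFTIRQ at a suitable later point.
Finally, local_irq_restore() restores the current CPU's interrupt state.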
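The head/tail pointer trick used by __tasklet_schedule is easier to follow in a standalone model. The sketch below is a userspace simplification under stated assumptions: one queue instead of per-CPU queues, a plain bit test instead of the atomic test_and_set_bit, and no interrupt masking. All sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_STATE_SCHED (1UL << 0)

struct sim_tasklet {
    struct sim_tasklet *next;
    unsigned long state;
    void (*func)(unsigned long);
    unsigned long data;
};

struct sim_tasklet_head {
    struct sim_tasklet *head;
    struct sim_tasklet **tail;   /* address of the last 'next' slot */
};

/* Empty queue: tail points at head, as set up in softirq_init(). */
static struct sim_tasklet_head sim_vec = { NULL, &sim_vec.head };

/* Log of handler arguments, so the run order can be inspected. */
static unsigned long sim_log[8];
static int sim_log_len;

static void sim_log_handler(unsigned long data)
{
    if (sim_log_len < 8)
        sim_log[sim_log_len++] = data;
}

/* Returns true if the tasklet was newly queued, false if already pending. */
static bool sim_tasklet_schedule(struct sim_tasklet *t)
{
    assert(t->func != NULL);
    if (t->state & SIM_STATE_SCHED)
        return false;            /* non-atomic stand-in for test_and_set_bit() */
    t->state |= SIM_STATE_SCHED;

    t->next = NULL;
    *sim_vec.tail = t;           /* link after the current last element */
    sim_vec.tail = &t->next;     /* tail now points at the new last slot */
    return true;
}

/* Detach the whole list, then run each tasklet once in FIFO order. */
static void sim_tasklet_action(void)
{
    struct sim_tasklet *list = sim_vec.head;

    sim_vec.head = NULL;
    sim_vec.tail = &sim_vec.head;

    while (list) {
        struct sim_tasklet *t = list;

        list = list->next;
        t->state &= ~SIM_STATE_SCHED;   /* may be scheduled again from now on */
        t->func(t->data);
    }
}
```

Keeping a pointer to the last next slot makes the append O(1) with no special case for an empty list, and tasklets run in the order they were scheduled.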

4.2.5.Removing a tasklet

extern void tasklet_kill(struct tasklet_struct *t);
extern void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int cpu);

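4.2.6.tasklet_action, the handler of the TASKLET_SOFTIRQ vector

tasklet_action() is the link between the tasklet mechanism and the TASKLET_SOFTIRQ vector. It is the function that takes each tasklet on the current CPU's tasklet queue and runs it on that CPU. The flow is shown below:
[Figure: tasklet_action flowchart]
Summary:

To use a tasklet, a driver first defines it through the kernel-provided interfaces, recording the action as a callback in the tasklet structure. After that, the driver can trigger the TASKLET_SOFTIRQ softirq from the top half of its interrupt handler; triggering also adds the driver's tasklet (including the callback address) to a tasklet list maintained by the kernel. Later, at a suitable deferred point, the kernel runs tasklet_action, the TASKLET_SOFTIRQ handler, as the bottom half. It walks the tasklet list and calls the callback func of each queued tasklet.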

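4.3.Summary of softirqs and tasklets

Softirq:

Softirqs are allocated statically at compile time.
There can be at most 32 softirqs.
A softirq never preempts another softirq; the only thing that can preempt a softirq is an interrupt handler.
Softirqs can run concurrently on multiple CPUs, even softirqs of the same type. Softirq handlers must therefore be reentrant (several CPUs may execute them at once) and need spinlocks to protect their data structures.
Only a few subsystems use softirqs directly, such as networking and block devices.
They run on return from hardware interrupt code, in the ksoftirqd kernel thread, and in code that explicitly checks for and runs pending softirqs.

tasklet:

Tasklets are implemented with two softirqs: HI_SOFTIRQ and TASKLET_SOFTIRQ. There is no fundamental difference between them, except that HI_SOFTIRQ has higher priority, so tasklets built on HI_SOFTIRQ run before those on TASKLET_SOFTIRQ.
They can be added and removed dynamically, with no limit on their number.
The same tasklet never runs concurrently on more than one CPU.
Different tasklets can run concurrently on different CPUs.
In most cases tasklets are the recommended choice.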

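5.Implementing the bottom half of an interrupt handler with a tasklet

Steps:

1).Define the tasklet handler

The callback must have the form void tasklet_handler(unsigned long data); the parameter data is the argument passed to the callback, and its value is given when the tasklet is created.

Keep the following in mind when writing a tasklet handler:

The function must not sleep, so it cannot use semaphores or any other function that may block.
Interrupts are enabled while the handler runs, so an interrupt may arrive and interrupt it at any time.
However many CPUs there are, the handler of a given tasklet runs on only one CPU at a time, so it does not race with itself over shared data.
Two different tasklets can still run on two processors at the same time. If data is shared between different tasklets, or between a tasklet and other softirqs, protect it with appropriate locking.

2).Create the tasklet

Statically or dynamically; creating the tasklet also records the address of the handler in it.

3).Trigger the softirq

The driver calls tasklet_schedule() from the top half of its interrupt handler to trigger the softirq. The kernel later calls the registered handler during bottom-half processing to perform the BH work.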

6.Workqueues

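A work queue is another way of deferring work, and it differs from a tasklet. A work queue defers the work to a kernel thread, so this bottom half runs in process context. Code run through a work queue therefore has all the advantages of process context; most importantly, it can be rescheduled and can even sleep.

6.1.When to use a work queue and when to use a tasklet?

If the deferred task needs to sleep, choose a work queue; if it does not, choose a tasklet.
Also, if you need a schedulable entity to run your bottom half, use a work queue. It is the only bottom-half mechanism that runs in process context, and the only one that can sleep. That makes it useful when you need to allocate a lot of memory, take a semaphore, or perform blocking I/O. If you do not need a kernel thread to run the deferred work, consider a tasklet.
In general, do not create work queues casually: in older kernels, before concurrency-managed workqueues, every newly created work queue got its own kernel thread. Work queues run in process context and may sleep or use sleep-based delays, whereas softirqs and tasklets run in interrupt context, where sleeping is not allowed.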

6.2.Data structures

A unit of work is represented by struct work_struct:

struct work_struct {
    atomic_long_t data;
    /* list linkage */
    struct list_head entry;
    /* work handler */
    work_func_t func;
#ifdef CONFIG_LOCKDEP
    struct lockdep_map lockdep_map;
#endif
};

struct delayed_work {
    struct work_struct work;
    struct timer_list timer;

    /* target workqueue and CPU ->timer uses to queue ->work */
    struct workqueue_struct *wq;
    int cpu;
};

Work items are organized into a work queue, whose data structure is workqueue_struct:

struct workqueue_struct {
    struct list_head    pwqs;       /* WR: all pwqs of this wq */
    struct list_head    list;       /* PR: list of all workqueues */

    struct mutex        mutex;      /* protects this wq */
    int         work_color; /* WQ: current work color */
    int         flush_color;    /* WQ: current flush color */
    atomic_t        nr_pwqs_to_flush; /* flush in progress */
    struct wq_flusher   *first_flusher; /* WQ: first flusher */
    struct list_head    flusher_queue;  /* WQ: flush waiters */
    struct list_head    flusher_overflow; /* WQ: flush overflow list */

    struct list_head    maydays;    /* MD: pwqs requesting rescue */
    struct worker       *rescuer;   /* I: rescue worker */
    ...
};

6.3.Using work queues

6.3.1.Include the header

#include <linux/workqueue.h>

6.3.2.There are two types of work:

struct work_struct - it schedules a task to run at a later time
struct delayed_work - it schedules a task to run after at least a given time interval

6.3.3.Work queue API

1.Two ways to initialize:

Before using them, a work item must be initialized. There are two types of macros that can be used, one that declares and initializes the work item at the same time and one that only initializes the work item (and the declaration must be done separately):

#include <linux/workqueue.h>

DECLARE_WORK(name , void (*function)(struct work_struct *));
DECLARE_DELAYED_WORK(name, void(*function)(struct work_struct *));

INIT_WORK(struct work_struct *work, void(*function)(struct work_struct *));
INIT_DELAYED_WORK(struct delayed_work *work, void(*function)(struct work_struct *));

DECLARE_WORK() and DECLARE_DELAYED_WORK() declare and initialize a work item;
INIT_WORK() and INIT_DELAYED_WORK() initialize an already declared work item.

2.The following sequence declares and initializes a work item:

#include <linux/workqueue.h>

void my_work_handler(struct work_struct *work);
DECLARE_WORK(my_work, my_work_handler);

Or, if we want to initialize the work item separately:

void my_work_handler(struct work_struct * work);
struct work_struct my_work;
INIT_WORK(&my_work, my_work_handler);

3.schedule the task

schedule_work(struct work_struct *work);
schedule_delayed_work(struct delayed_work *work, unsigned long delay);

4.Cancel work items

bool cancel_work_sync(struct work_struct *work);
bool cancel_delayed_work_sync(struct delayed_work *work);

Note:
These calls prevent any further execution of the work item. If the work item is already running at the time of the call, it is allowed to finish, and the call waits for it. In any case, when these calls return, it is guaranteed that the work item is no longer running.
While there are versions of these functions that are not synchronous (e.g. cancel_work()), do not use them when you are performing cleanup work; otherwise race conditions could occur.

5.wait for a workqueue to complete running all of its work items

void flush_scheduled_work(void);

This function is blocking and, therefore, can not be used in interrupt context. The function will wait for all work items to be completed. For delayed work items, cancel_delayed_work must be called before flush_scheduled_work().

Finally, the following functions can be used to schedule work items on a particular processor (schedule_delayed_work_on()), or on all processors (schedule_on_each_cpu()):

int schedule_delayed_work_on(int cpu, struct delayed_work *work, unsigned long delay);
int schedule_on_each_cpu(void(*function)(struct work_struct *));

6.struct workqueue_struct

Create a new workqueue:

struct workqueue_struct *create_workqueue(const char *name);
struct workqueue_struct *create_singlethread_workqueue(const char *name);

create_workqueue() uses one thread for each processor in the system;
create_singlethread_workqueue() uses a single thread.

Add a task to the new queue:

int queue_work(struct workqueue_struct *queue, struct work_struct *work);
int queue_delayed_work(struct workqueue_struct *queue,
                       struct delayed_work *work, unsigned long delay);

Wait for all work items to finish:

void flush_workqueue(struct workqueue_struct *queue);

Destroy the workqueue:

void destroy_workqueue(struct workqueue_struct *queue);

The next sequence declares and initializes an additional workqueue, declares and initializes a work item and adds it to the queue:

void my_work_handler(struct work_struct *work);

struct work_struct my_work;
struct workqueue_struct * my_workqueue;

my_workqueue = create_singlethread_workqueue("my_workqueue");
INIT_WORK(&my_work, my_work_handler);
queue_work(my_workqueue, &my_work);

And the next code sample shows how to remove the workqueue:

flush_workqueue(my_workqueue);
destroy_workqueue(my_workqueue);

6.3.4.Example:

drivers/mfd/da903x.c:
INIT_WORK(&chip->irq_work, da903x_irq_work);

ret = devm_request_irq(&client->dev, client->irq, da903x_irq_handler, IRQF_TRIGGER_FALLING,"da903x", chip);

static irqreturn_t da903x_irq_handler(int irq, void *data)
{
    struct da903x_chip *chip = data;

    disable_irq_nosync(irq);
    (void)schedule_work(&chip->irq_work);

    return IRQ_HANDLED;
}

static void da903x_irq_work(struct work_struct *work)
{
    struct da903x_chip *chip =
        container_of(work, struct da903x_chip, irq_work);
    unsigned int events = 0;

    while (1) {
        if (chip->ops->read_events(chip, &events))
            break;

        events &= ~chip->events_mask;
        if (events == 0)
            break;

        blocking_notifier_call_chain(
                &chip->notifier_list, events, NULL);
    }
    enable_irq(chip->client->irq);
}
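The work handler in this driver receives only a pointer to the embedded work item and recovers the enclosing chip structure with container_of. A userspace sketch of that pattern follows; sim_work, sim_chip and sim_container_of are hypothetical stand-ins for work_struct, da903x_chip and the kernel macro, and sim_run_work plays the role of the worker thread.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct work_struct: just the handler pointer. */
struct sim_work {
    void (*func)(struct sim_work *work);
};

/* Same idea as the kernel's container_of(): step back from a member
 * pointer to the start of the structure that contains it. */
#define sim_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sim_chip {
    int irq;
    unsigned int events_seen;
    struct sim_work irq_work;   /* embedded, like chip->irq_work above */
};

/* Work handler: gets only the work pointer, recovers the owning chip. */
static void sim_irq_work(struct sim_work *work)
{
    struct sim_chip *chip = sim_container_of(work, struct sim_chip, irq_work);

    chip->events_seen++;
}

/* Stand-in for the worker thread that eventually runs queued work. */
static void sim_run_work(struct sim_work *work)
{
    assert(work->func != NULL);
    work->func(work);
}
```

Embedding the work item in the device structure means no separate allocation or lookup table is needed to find the device state from inside the handler.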