Driver Basics
Driver Entry and Exit points
Atomic and pointer manipulation
int atomic_read(const atomic_t *v)
    read atomic variable

Parameters
    const atomic_t *v
        pointer of type atomic_t

Description
    Atomically reads the value of v.

void atomic_set(atomic_t *v, int i)
    set atomic variable

Parameters
    atomic_t *v
        pointer of type atomic_t
    int i
        required value

Description
    Atomically sets the value of v to i.

void atomic_add(int i, atomic_t *v)
    add integer to atomic variable

Parameters
    int i
        integer value to add
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically adds i to v.

void atomic_sub(int i, atomic_t *v)
    subtract integer from atomic variable

Parameters
    int i
        integer value to subtract
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically subtracts i from v.

bool atomic_sub_and_test(int i, atomic_t *v)
    subtract value from variable and test result

Parameters
    int i
        integer value to subtract
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically subtracts i from v and returns true if the result is zero,
    or false for all other cases.

void atomic_inc(atomic_t *v)
    increment atomic variable

Parameters
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically increments v by 1.

void atomic_dec(atomic_t *v)
    decrement atomic variable

Parameters
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically decrements v by 1.
bool atomic_dec_and_test(atomic_t *v)
    decrement and test

Parameters
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically decrements v by 1 and returns true if the result is 0, or
    false for all other cases.

bool atomic_inc_and_test(atomic_t *v)
    increment and test

Parameters
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically increments v by 1 and returns true if the result is zero,
    or false for all other cases.

bool atomic_add_negative(int i, atomic_t *v)
    add and test if negative

Parameters
    int i
        integer value to add
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically adds i to v and returns true if the result is negative, or
    false when the result is greater than or equal to zero.

int atomic_add_return(int i, atomic_t *v)
    add integer and return

Parameters
    int i
        integer value to add
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically adds i to v and returns i + v.

int atomic_sub_return(int i, atomic_t *v)
    subtract integer and return

Parameters
    int i
        integer value to subtract
    atomic_t *v
        pointer of type atomic_t

Description
    Atomically subtracts i from v and returns v - i.

int __atomic_add_unless(atomic_t *v, int a, int u)
    add unless the number is already a given value

Parameters
    atomic_t *v
        pointer of type atomic_t
    int a
        the amount to add to v...
    int u
        ...unless v is equal to u.

Description
    Atomically adds a to v, so long as v was not already u. Returns the
    old value of v.

short int atomic_inc_short(short int *v)
    increment of a short integer

Parameters
    short int *v
        pointer of type short int

Description
    Atomically adds 1 to v. Returns the new value of v.
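
Taken together, these primitives support simple lock-free counters. The
sketch below is illustrative rather than taken from this document; it uses
atomic_inc() and atomic_dec_and_test() for the classic reference-count
pattern, with the hypothetical my_object/my_object_get/my_object_put names
standing in for real driver types.

    #include <linux/atomic.h>
    #include <linux/slab.h>

    struct my_object {
            atomic_t refcount;              /* number of active users */
            /* ... driver data ... */
    };

    static void my_object_get(struct my_object *obj)
    {
            atomic_inc(&obj->refcount);     /* take a reference */
    }

    static void my_object_put(struct my_object *obj)
    {
            /*
             * atomic_dec_and_test() returns true only for the caller
             * that drops the count to zero, so exactly one caller frees.
             */
            if (atomic_dec_and_test(&obj->refcount))
                    kfree(obj);
    }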
Delaying, scheduling, and timer routines
struct prev_cputime
    snapshot of system and user cputime

Definition

struct prev_cputime {
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
    cputime_t utime;
    cputime_t stime;
    raw_spinlock_t lock;
#endif
};

Members
    utime
        time spent in user mode
    stime
        time spent in system mode
    lock
        protects the above two fields

Description
    Stores previous user/system time values such that we can guarantee
    monotonicity.
struct task_cputime
    collected CPU time counts

Definition

struct task_cputime {
    cputime_t utime;
    cputime_t stime;
    unsigned long long sum_exec_runtime;
};

Members
    utime
        time spent in user mode, in cputime_t units
    stime
        time spent in kernel mode, in cputime_t units
    sum_exec_runtime
        total time spent on the CPU, in nanoseconds

Description
    This structure groups together three kinds of CPU time that are
    tracked for threads and thread groups. Most things considering CPU
    time want to group these counts together and treat all three of them
    in parallel.

struct thread_group_cputimer
    thread group interval timer counts

Definition

struct thread_group_cputimer {
    struct task_cputime_atomic cputime_atomic;
    bool running;
    bool checking_timer;
};

Members
    cputime_atomic
        atomic thread group interval timers.
    running
        true when there are timers running and cputime_atomic receives
        updates.
    checking_timer
        true when a thread in the group is in the process of checking for
        thread group timers.

Description
    This structure contains the version of task_cputime, above, that is
    used for thread group CPU timer calculations.
int pid_alive(const struct task_struct *p)
    check that a task structure is not stale

Parameters
    const struct task_struct *p
        Task structure to be checked.

Description
    Test if a process is not yet dead (at most zombie state). If pid_alive
    fails, then pointers within the task structure can be stale and must
    not be dereferenced.

Return
    1 if the process is alive. 0 otherwise.

int is_global_init(struct task_struct *tsk)
    check if a task structure is init. Since init is free to have
    sub-threads we need to check tgid.

Parameters
    struct task_struct *tsk
        Task structure to be checked.

Description
    Check if a task structure is the first user space task the kernel
    created.

Return
    1 if the task structure is init. 0 otherwise.

int task_nice(const struct task_struct *p)
    return the nice value of a given task.

Parameters
    const struct task_struct *p
        the task in question.

Return
    The nice value [ -20 ... 0 ... 19 ].

bool is_idle_task(const struct task_struct *p)
    is the specified task an idle task?

Parameters
    const struct task_struct *p
        the task in question.

Return
    1 if p is an idle task. 0 otherwise.

void set_restore_sigmask(void)
    make sure saved_sigmask processing gets done

Parameters
    void
        no arguments

Description
    This sets TIF_RESTORE_SIGMASK and ensures that the arch signal code
    will run before returning to user mode, to process the flag. For all
    callers, TIF_SIGPENDING is already set or it's no harm to set it.
    TIF_RESTORE_SIGMASK need not be in the set of bits that the arch code
    will notice on return to user mode, in case those bits are scarce. We
    set TIF_SIGPENDING here to ensure that the arch signal code always
    gets run when TIF_RESTORE_SIGMASK is set.

void threadgroup_change_begin(struct task_struct *tsk)
    mark the beginning of changes to a threadgroup

Parameters
    struct task_struct *tsk
        task causing the changes

Description
    All operations which modify a threadgroup - a new thread joining the
    group, death of a member thread (the assertion of PF_EXITING) and
    exec(2) dethreading the process and replacing the leader - are
    wrapped by threadgroup_change_{begin|end}(). This is to provide a
    place which subsystems needing threadgroup stability can hook into
    for synchronization.

void threadgroup_change_end(struct task_struct *tsk)
    mark the end of changes to a threadgroup

Parameters
    struct task_struct *tsk
        task causing the changes

Description
    See threadgroup_change_begin().

int wake_up_process(struct task_struct *p)
    Wake up a specific process

Parameters
    struct task_struct *p
        The process to be woken up.

Description
    Attempt to wake up the nominated process and move it to the set of
    runnable processes.

Return
    1 if the process was woken up, 0 if it was already running.

    It may be assumed that this function implies a write memory barrier
    before changing the task state if and only if any tasks are woken up.
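
The usual counterpart of wake_up_process() is a thread that parks itself
with set_current_state() and schedule(). A minimal sketch, with the
worker/worker_fn/kick_worker names being hypothetical:

    #include <linux/kthread.h>
    #include <linux/sched.h>

    static struct task_struct *worker;      /* hypothetical worker thread */

    static int worker_fn(void *unused)
    {
            while (!kthread_should_stop()) {
                    /* publish "sleeping" before the final condition check */
                    set_current_state(TASK_INTERRUPTIBLE);
                    schedule();             /* sleep until woken */
                    /* ... handle whatever woke us ... */
            }
            return 0;
    }

    /* e.g. from an interrupt handler: make the worker runnable again */
    static void kick_worker(void)
    {
            wake_up_process(worker);
    }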
void preempt_notifier_register(struct preempt_notifier *notifier)
    tell me when current is being preempted & rescheduled

Parameters
    struct preempt_notifier *notifier
        notifier struct to register

void preempt_notifier_unregister(struct preempt_notifier *notifier)
    no longer interested in preemption notifications

Parameters
    struct preempt_notifier *notifier
        notifier struct to unregister

Description
    This is not safe to call from within a preemption notifier.

__visible void __sched notrace preempt_schedule_notrace(void)
    preempt_schedule called by tracing

Parameters
    void
        no arguments

Description
    The tracing infrastructure uses preempt_enable_notrace to prevent
    recursion and tracing preempt enabling caused by the tracing
    infrastructure itself. But as tracing can happen in areas coming from
    userspace or just about to enter userspace, a preempt enable can
    occur before user_exit() is called. This will cause the scheduler to
    be called when the system is still in usermode.

    To prevent this, the preempt_enable_notrace will use this function
    instead of preempt_schedule() to exit user context if needed before
    calling the scheduler.
int sched_setscheduler(struct task_struct *p, int policy, const struct sched_param *param)
    change the scheduling policy and/or RT priority of a thread.

Parameters
    struct task_struct *p
        the task in question.
    int policy
        new policy.
    const struct sched_param *param
        structure containing the new RT priority.

Return
    0 on success. An error code otherwise.

    NOTE that the task may be already dead.

int sched_setscheduler_nocheck(struct task_struct *p, int policy, const struct sched_param *param)
    change the scheduling policy and/or RT priority of a thread from
    kernelspace.

Parameters
    struct task_struct *p
        the task in question.
    int policy
        new policy.
    const struct sched_param *param
        structure containing the new RT priority.

Description
    Just like sched_setscheduler, only don't bother checking if the
    current context has permission. For example, this is needed in
    stop_machine(): we create temporary high priority worker threads, but
    our caller might not have that capability.

Return
    0 on success. An error code otherwise.
void __sched yield(void)
    yield the current processor to other threads.

Parameters
    void
        no arguments

Description
    Do not ever use this function, there's a 99% chance you're doing it
    wrong.

    The scheduler is at all times free to pick the calling task as the
    most eligible task to run, so if removing the yield() call from your
    code breaks it, it's already broken.

    Typical broken usage is:

        while (!event)
            yield();

    where one assumes that yield() will let 'the other' process run that
    will make event true. If the current task is a SCHED_FIFO task that
    will never happen. Never use yield() as a progress guarantee!!

    If you want to use yield() to wait for something, use wait_event().
    If you want to use yield() to be 'nice' for others, use
    cond_resched(). If you still want to use yield(), do not!

int __sched yield_to(struct task_struct *p, bool preempt)
    yield the current processor to another thread in your thread group,
    or accelerate that thread toward the processor it's on.

Parameters
    struct task_struct *p
        target task
    bool preempt
        whether task preemption is allowed or not

Description
    It's the caller's job to ensure that the target task struct can't go
    away on us before we can do any checks.

Return
    true (>0) if we indeed boosted the target task. false (0) if we
    failed to boost the target. -ESRCH if there's no task to yield to.
int cpupri_find(struct cpupri *cp, struct task_struct *p, struct cpumask *lowest_mask)
    find the best (lowest-pri) CPU in the system

Parameters
    struct cpupri *cp
        The cpupri context
    struct task_struct *p
        The task
    struct cpumask *lowest_mask
        A mask to fill in with selected CPUs (or NULL)

Note
    This function returns the recommended CPUs as calculated during the
    current invocation. By the time the call returns, the CPUs may have
    in fact changed priorities any number of times. While not ideal, it
    is not an issue of correctness since the normal rebalancer logic will
    correct any discrepancies created by racing against the uncertainty
    of the current priority configuration.

Return
    (int)bool - CPUs were found

void cpupri_set(struct cpupri *cp, int cpu, int newpri)
    update the CPU priority setting

Parameters
    struct cpupri *cp
        The cpupri context
    int cpu
        The target CPU
    int newpri
        The priority (INVALID-RT99) to assign to this CPU

Note
    Assumes cpu_rq(cpu)->lock is locked

Return
    (void)

int cpupri_init(struct cpupri *cp)
    initialize the cpupri structure

Parameters
    struct cpupri *cp
        The cpupri context

Return
    -ENOMEM on memory allocation failure.

void cpupri_cleanup(struct cpupri *cp)
    clean up the cpupri structure

Parameters
    struct cpupri *cp
        The cpupri context
void update_tg_load_avg(struct cfs_rq *cfs_rq, int force)
    update the tg's load avg

Parameters
    struct cfs_rq *cfs_rq
        the cfs_rq whose avg changed
    int force
        update regardless of how small the difference

Description
    This function 'ensures': tg->load_avg := Sum tg->cfs_rq[]->avg.load.
    However, because tg->load_avg is a global value there are performance
    considerations.

    In order to avoid having to look at the other cfs_rq's, we use a
    differential update where we store the last value we propagated. This
    in turn allows skipping updates if the differential is 'small'.

    Updating tg's load_avg is necessary before update_cfs_share() (which
    is done) and effective_load() (which is not done because it is too
    costly).

int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
    update the cfs_rq's load/util averages

Parameters
    u64 now
        current time, as per cfs_rq_clock_task()
    struct cfs_rq *cfs_rq
        cfs_rq to update
    bool update_freq
        should we call cfs_rq_util_change() or will the call do so

Description
    The cfs_rq avg is the direct sum of all its entities (blocked and
    runnable) avg. The immediate corollary is that all (fair) tasks must
    be attached, see post_init_entity_util_avg().

    cfs_rq->avg is used for task_h_load() and update_cfs_share() for
    example.

    Returns true if the load decayed or we removed load.

    Since both these conditions indicate a changed cfs_rq->avg.load we
    should call update_tg_load_avg() when this function returns true.

void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
    attach this entity to its cfs_rq load avg

Parameters
    struct cfs_rq *cfs_rq
        cfs_rq to attach to
    struct sched_entity *se
        sched_entity to attach

Description
    Must call update_cfs_rq_load_avg() before this, since we rely on
    cfs_rq->avg.last_update_time being current.

void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
    detach this entity from its cfs_rq load avg

Parameters
    struct cfs_rq *cfs_rq
        cfs_rq to detach from
    struct sched_entity *se
        sched_entity to detach

Description
    Must call update_cfs_rq_load_avg() before this, since we rely on
    cfs_rq->avg.last_update_time being current.

void cpu_load_update(struct rq *this_rq, unsigned long this_load, unsigned long pending_updates)
    update the rq->cpu_load[] statistics

Parameters
    struct rq *this_rq
        The rq to update statistics for
    unsigned long this_load
        The current load
    unsigned long pending_updates
        The number of missed updates

Description
    Update rq->cpu_load[] statistics. This function is usually called
    every scheduler tick (TICK_NSEC).

    This function computes a decaying average:

        load[i]' = (1 - 1/2^i) * load[i] + (1/2^i) * load

    Because of NOHZ it might not get called on every tick, which creates
    the need for the pending_updates argument.

        load[i]_n = (1 - 1/2^i) * load[i]_n-1 + (1/2^i) * load_n-1
                  = A * load[i]_n-1 + B ; A := (1 - 1/2^i), B := (1/2^i) * load
                  = A * (A * load[i]_n-2 + B) + B
                  = A * (A * (A * load[i]_n-3 + B) + B) + B
                  = A^3 * load[i]_n-3 + (A^2 + A + 1) * B
                  = A^n * load[i]_0 + (A^(n-1) + A^(n-2) + ... + 1) * B
                  = A^n * load[i]_0 + ((1 - A^n) / (1 - A)) * B
                  = (1 - 1/2^i)^n * (load[i]_0 - load) + load

    In the above we've assumed load_n := load, which is true for
    NOHZ_FULL as any change in load would have resulted in the tick being
    turned back on.

    For regular NOHZ, this reduces to:

        load[i]_n = (1 - 1/2^i)^n * load[i]_0

    see decay_load_missed(). For NOHZ_FULL we get to subtract and add the
    extra term.
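
To make the recursion concrete, here is a small standalone sketch (plain
userspace C, not kernel code) that applies the per-tick decay step and the
closed-form catch-up for n missed ticks, so you can check that n single
steps and one closed-form step agree:

    #include <stdio.h>
    #include <math.h>

    /* one tick at index i: load' = (1 - 1/2^i)*old + (1/2^i)*load */
    static double decay_step(double old, double load, int i)
    {
            double b = 1.0 / (1 << i);
            return (1.0 - b) * old + b * load;
    }

    /* closed form for n missed ticks with constant input load */
    static double decay_n(double old, double load, int i, int n)
    {
            double a = 1.0 - 1.0 / (1 << i);
            return pow(a, n) * (old - load) + load;
    }

    int main(void)
    {
            double step = 100.0;            /* load[i]_0 */
            int i = 2, n = 5;

            for (int k = 0; k < n; k++)
                    step = decay_step(step, 20.0, i);

            /* both print 38.984375: closed form matches n single steps */
            printf("stepwise=%f closed=%f\n",
                   step, decay_n(100.0, 20.0, i, n));
            return 0;
    }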
int get_sd_load_idx(struct sched_domain *sd, enum cpu_idle_type idle)
    Obtain the load index for a given sched domain.

Parameters
    struct sched_domain *sd
        The sched_domain whose load_idx is to be obtained.
    enum cpu_idle_type idle
        The idle status of the CPU for whose sd load_idx is obtained.

Return
    The load index.

void update_sg_lb_stats(struct lb_env *env, struct sched_group *group, int load_idx, int local_group, struct sg_lb_stats *sgs, bool *overload)
    Update sched_group's statistics for load balancing.

Parameters
    struct lb_env *env
        The load balancing environment.
    struct sched_group *group
        sched_group whose statistics are to be updated.
    int load_idx
        Load index of sched_domain of this_cpu for load calc.
    int local_group
        Does group contain this_cpu.
    struct sg_lb_stats *sgs
        variable to hold the statistics for this group.
    bool *overload
        Indicate more than one runnable task for any CPU.

bool update_sd_pick_busiest(struct lb_env *env, struct sd_lb_stats *sds, struct sched_group *sg, struct sg_lb_stats *sgs)
    return 1 on busiest group

Parameters
    struct lb_env *env
        The load balancing environment.
    struct sd_lb_stats *sds
        sched_domain statistics
    struct sched_group *sg
        sched_group candidate to be checked for being the busiest
    struct sg_lb_stats *sgs
        sched_group statistics

Description
    Determine if sg is a busier group than the previously selected
    busiest group.

Return
    true if sg is a busier group than the previously selected busiest
    group. false otherwise.

void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds)
    Update sched_domain's statistics for load balancing.

Parameters
    struct lb_env *env
        The load balancing environment.
    struct sd_lb_stats *sds
        variable to hold the statistics for this sched_domain.

int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
    Check to see if the group is packed into the sched domain.

Parameters
    struct lb_env *env
        The load balancing environment.
    struct sd_lb_stats *sds
        Statistics of the sched_domain which is to be packed

Description
    This is primarily intended to be used at the sibling level. Some
    cores like POWER7 prefer to use lower numbered SMT threads. In the
    case of POWER7, it can move to lower SMT modes only when higher
    threads are idle. When in lower SMT modes, the threads will perform
    better since they share less core resources. Hence when we have idle
    threads, we want them to be the higher ones.

    This packing function is run on idle threads. It checks to see if the
    busiest CPU in this domain (core in the P7 case) has a higher CPU
    number than the packing function is being run on. Here we are
    assuming a lower CPU number will be equivalent to a lower SMT thread
    number.

Return
    1 when packing is required and a task should be moved to this CPU.
    The amount of the imbalance is returned in *imbalance.

void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
    Calculate the minor imbalance that exists amongst the groups of a
    sched_domain, during load balancing.

Parameters
    struct lb_env *env
        The load balancing environment.
    struct sd_lb_stats *sds
        Statistics of the sched_domain whose imbalance is to be
        calculated.

void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
    Calculate the amount of imbalance present within the groups of a
    given sched_domain during load balance.

Parameters
    struct lb_env *env
        load balance environment
    struct sd_lb_stats *sds
        statistics of the sched_domain whose imbalance is to be
        calculated.

struct sched_group * find_busiest_group(struct lb_env *env)
    Returns the busiest group within the sched_domain if there is an
    imbalance.

Parameters
    struct lb_env *env
        The load balancing environment.

Description
    Also calculates the amount of weighted load which should be moved to
    restore balance.

Return
    The busiest group if imbalance exists.
DECLARE_COMPLETION(work)
    declare and initialize a completion structure

Parameters
    work
        identifier for the completion structure

Description
    This macro declares and initializes a completion structure. Generally
    used for static declarations. You should use the _ONSTACK variant for
    automatic variables.

DECLARE_COMPLETION_ONSTACK(work)
    declare and initialize a completion structure

Parameters
    work
        identifier for the completion structure

Description
    This macro declares and initializes a completion structure on the
    kernel stack.

void init_completion(struct completion *x)
    Initialize a dynamically allocated completion

Parameters
    struct completion *x
        pointer to completion structure that is to be initialized

Description
    This inline function will initialize a dynamically created completion
    structure.

void reinit_completion(struct completion *x)
    reinitialize a completion structure

Parameters
    struct completion *x
        pointer to completion structure that is to be reinitialized

Description
    This inline function should be used to reinitialize a completion
    structure so it can be reused. This is especially important after
    complete_all() is used.
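
A minimal sketch of the intended usage, assuming the standard
wait_for_completion() and complete() calls from the same completion API
(they are not documented in this excerpt); the setup_done/init_thread_fn
names are hypothetical:

    #include <linux/completion.h>
    #include <linux/kthread.h>

    static DECLARE_COMPLETION(setup_done);  /* static declaration variant */

    static int init_thread_fn(void *unused)
    {
            /* ... perform one-time hardware setup ... */
            complete(&setup_done);          /* let waiters proceed */
            return 0;
    }

    static void wait_for_setup(void)
    {
            /*
             * wait_for_completion() sleeps (TASK_UNINTERRUPTIBLE) until
             * complete() or complete_all() is called on setup_done.
             */
            wait_for_completion(&setup_done);
    }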
unsigned long __round_jiffies(unsigned long j, int cpu)
    function to round jiffies to a full second

Parameters
    unsigned long j
        the time in (absolute) jiffies that should be rounded
    int cpu
        the processor number on which the timeout will happen

Description
    __round_jiffies() rounds an absolute time in the future (in jiffies)
    up or down to (approximately) full seconds. This is useful for timers
    for which the exact time they fire does not matter too much, as long
    as they fire approximately every X seconds.

    By rounding these timers to whole seconds, all such timers will fire
    at the same time, rather than at various times spread out. The goal
    of this is to have the CPU wake up less, which saves power.

    The exact rounding is skewed for each processor to avoid all
    processors firing at the exact same time, which could lead to lock
    contention or spurious cache line bouncing.

    The return value is the rounded version of the j parameter.

unsigned long __round_jiffies_relative(unsigned long j, int cpu)
    function to round jiffies to a full second

Parameters
    unsigned long j
        the time in (relative) jiffies that should be rounded
    int cpu
        the processor number on which the timeout will happen

Description
    __round_jiffies_relative() rounds a time delta in the future (in
    jiffies) up or down to (approximately) full seconds. This is useful
    for timers for which the exact time they fire does not matter too
    much, as long as they fire approximately every X seconds.

    By rounding these timers to whole seconds, all such timers will fire
    at the same time, rather than at various times spread out. The goal
    of this is to have the CPU wake up less, which saves power.

    The exact rounding is skewed for each processor to avoid all
    processors firing at the exact same time, which could lead to lock
    contention or spurious cache line bouncing.

    The return value is the rounded version of the j parameter.

unsigned long round_jiffies(unsigned long j)
    function to round jiffies to a full second

Parameters
    unsigned long j
        the time in (absolute) jiffies that should be rounded

Description
    round_jiffies() rounds an absolute time in the future (in jiffies) up
    or down to (approximately) full seconds. This is useful for timers
    for which the exact time they fire does not matter too much, as long
    as they fire approximately every X seconds.

    By rounding these timers to whole seconds, all such timers will fire
    at the same time, rather than at various times spread out. The goal
    of this is to have the CPU wake up less, which saves power.

    The return value is the rounded version of the j parameter.

unsigned long round_jiffies_relative(unsigned long j)
    function to round jiffies to a full second

Parameters
    unsigned long j
        the time in (relative) jiffies that should be rounded

Description
    round_jiffies_relative() rounds a time delta in the future (in
    jiffies) up or down to (approximately) full seconds. This is useful
    for timers for which the exact time they fire does not matter too
    much, as long as they fire approximately every X seconds.

    By rounding these timers to whole seconds, all such timers will fire
    at the same time, rather than at various times spread out. The goal
    of this is to have the CPU wake up less, which saves power.

    The return value is the rounded version of the j parameter.

unsigned long __round_jiffies_up(unsigned long j, int cpu)
    function to round jiffies up to a full second

Parameters
    unsigned long j
        the time in (absolute) jiffies that should be rounded
    int cpu
        the processor number on which the timeout will happen

Description
    This is the same as __round_jiffies() except that it will never round
    down. This is useful for timeouts for which the exact time of firing
    does not matter too much, as long as they don't fire too early.

unsigned long __round_jiffies_up_relative(unsigned long j, int cpu)
    function to round jiffies up to a full second

Parameters
    unsigned long j
        the time in (relative) jiffies that should be rounded
    int cpu
        the processor number on which the timeout will happen

Description
    This is the same as __round_jiffies_relative() except that it will
    never round down. This is useful for timeouts for which the exact
    time of firing does not matter too much, as long as they don't fire
    too early.

unsigned long round_jiffies_up(unsigned long j)
    function to round jiffies up to a full second

Parameters
    unsigned long j
        the time in (absolute) jiffies that should be rounded

Description
    This is the same as round_jiffies() except that it will never round
    down. This is useful for timeouts for which the exact time of firing
    does not matter too much, as long as they don't fire too early.

unsigned long round_jiffies_up_relative(unsigned long j)
    function to round jiffies up to a full second

Parameters
    unsigned long j
        the time in (relative) jiffies that should be rounded

Description
    This is the same as round_jiffies_relative() except that it will
    never round down. This is useful for timeouts for which the exact
    time of firing does not matter too much, as long as they don't fire
    too early.
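
A short illustrative sketch of the rounding in practice, using mod_timer()
(documented below); the housekeeping_timer name is hypothetical:

    #include <linux/jiffies.h>
    #include <linux/timer.h>

    /* hypothetical housekeeping timer that only needs ~5 s granularity */
    static struct timer_list housekeeping_timer;

    static void rearm_housekeeping(void)
    {
            /*
             * Rounding the absolute expiry to a full second lets this
             * timer batch its wakeup with other rounded timers, which
             * saves power.
             */
            mod_timer(&housekeeping_timer, round_jiffies(jiffies + 5 * HZ));
    }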
void init_timer_key(struct timer_list *timer, unsigned int flags, const char *name, struct lock_class_key *key)
    initialize a timer

Parameters
    struct timer_list *timer
        the timer to be initialized
    unsigned int flags
        timer flags
    const char *name
        name of the timer
    struct lock_class_key *key
        lockdep class key of the fake lock used for tracking timer sync
        lock dependencies

Description
    init_timer_key() must be done to a timer prior calling any of the
    other timer functions.

int mod_timer_pending(struct timer_list *timer, unsigned long expires)
    modify a pending timer's timeout

Parameters
    struct timer_list *timer
        the pending timer to be modified
    unsigned long expires
        new timeout in jiffies

Description
    mod_timer_pending() is the same for pending timers as mod_timer(),
    but will not re-activate and modify already deleted timers.

    It is useful for unserialized use of timers.

int mod_timer(struct timer_list *timer, unsigned long expires)
    modify a timer's timeout

Parameters
    struct timer_list *timer
        the timer to be modified
    unsigned long expires
        new timeout in jiffies

Description
    mod_timer() is a more efficient way to update the expire field of an
    active timer (if the timer is inactive it will be activated).

    mod_timer(timer, expires) is equivalent to:

        del_timer(timer);
        timer->expires = expires;
        add_timer(timer);

    Note that if there are multiple unserialized concurrent users of the
    same timer, then mod_timer() is the only safe way to modify the
    timeout, since add_timer() cannot modify an already running timer.

    The function returns whether it has modified a pending timer or not.
    (ie. mod_timer() of an inactive timer returns 0, mod_timer() of an
    active timer returns 1.)

void add_timer(struct timer_list *timer)
    start a timer

Parameters
    struct timer_list *timer
        the timer to be added

Description
    The kernel will do a ->function(->data) callback from the timer
    interrupt at the ->expires point in the future. The current time is
    'jiffies'.

    The timer's ->expires, ->function (and if the handler uses it,
    ->data) fields must be set prior calling this function.

    Timers with an ->expires field in the past will be executed in the
    next timer tick.
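
A sketch of a complete timer lifecycle under the ->function(->data)
interface described above; it assumes the setup_timer() helper that was
contemporary with this API (not documented in this excerpt), and the
poll_timer names are hypothetical:

    #include <linux/timer.h>
    #include <linux/jiffies.h>

    static struct timer_list poll_timer;    /* hypothetical polling timer */

    static void poll_timer_fn(unsigned long data)
    {
            /* runs in timer (softirq) context at ->expires */
            /* ... poll the device ... */

            /* self-rearm one second from now */
            mod_timer(&poll_timer, jiffies + HZ);
    }

    static void poll_start(void)
    {
            /* bind callback and data, then arm the first expiry */
            setup_timer(&poll_timer, poll_timer_fn, 0);
            poll_timer.expires = jiffies + HZ;
            add_timer(&poll_timer);
    }

    static void poll_stop(void)
    {
            del_timer_sync(&poll_timer);    /* wait out a running handler */
    }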
void add_timer_on(struct timer_list *timer, int cpu)
    start a timer on a particular CPU

Parameters
    struct timer_list *timer
        the timer to be added
    int cpu
        the CPU to start it on

Description
    This is not very scalable on SMP. Double adds are not possible.

int del_timer(struct timer_list *timer)
    deactivate a timer.

Parameters
    struct timer_list *timer
        the timer to be deactivated

Description
    del_timer() deactivates a timer - this works on both active and
    inactive timers.

    The function returns whether it has deactivated a pending timer or
    not. (ie. del_timer() of an inactive timer returns 0, del_timer() of
    an active timer returns 1.)

int try_to_del_timer_sync(struct timer_list *timer)
    Try to deactivate a timer

Parameters
    struct timer_list *timer
        timer to del

Description
    This function tries to deactivate a timer. Upon successful (ret >= 0)
    exit the timer is not queued and the handler is not running on any
    CPU.

int del_timer_sync(struct timer_list *timer)
    deactivate a timer and wait for the handler to finish.

Parameters
    struct timer_list *timer
        the timer to be deactivated

Description
    This function only differs from del_timer() on SMP: besides
    deactivating the timer it also makes sure the handler has finished
    executing on other CPUs.

    Synchronization rules: Callers must prevent restarting of the timer,
    otherwise this function is meaningless. It must not be called from
    interrupt contexts unless the timer is an irqsafe one. The caller
    must not hold locks which would prevent completion of the timer's
    handler. The timer's handler must not call add_timer_on(). Upon exit
    the timer is not queued and the handler is not running on any CPU.

Note
    For !irqsafe timers, you must not hold locks that are held in
    interrupt context while calling this function. Even if the lock has
    nothing to do with the timer in question. Here's why:

        CPU0                            CPU1
        ----                            ----
                                        <SOFTIRQ>
                                          call_timer_fn();
                                          base->running_timer = mytimer;
        spin_lock_irq(somelock);
                                        <IRQ>
                                          spin_lock(somelock);
        del_timer_sync(mytimer);
          while (base->running_timer == mytimer);

    Now del_timer_sync() will never return and never release somelock.
    The interrupt on the other CPU is waiting to grab somelock but it has
    interrupted the softirq that CPU0 is waiting to finish.

    The function returns whether it has deactivated a pending timer or
    not.
signed long __sched schedule_timeout(signed long timeout)
    sleep until timeout

Parameters
    signed long timeout
        timeout value in jiffies

Description
    Make the current task sleep until timeout jiffies have elapsed. The
    routine will return immediately unless the current task state has
    been set (see set_current_state()).

    You can set the task state as follows -

    TASK_UNINTERRUPTIBLE - at least timeout jiffies are guaranteed to
    pass before the routine returns unless the current task is explicitly
    woken up, (e.g. by wake_up_process()).

    TASK_INTERRUPTIBLE - the routine may return early if a signal is
    delivered to the current task or the current task is explicitly woken
    up.

    The current task state is guaranteed to be TASK_RUNNING when this
    routine returns.

    Specifying a timeout value of MAX_SCHEDULE_TIMEOUT will schedule the
    CPU away without a bound on the timeout. In this case the return
    value will be MAX_SCHEDULE_TIMEOUT.

    Returns 0 when the timer has expired otherwise the remaining time in
    jiffies will be returned. In all cases the return value is guaranteed
    to be non-negative.
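
A minimal sketch of the set-state-then-sleep pattern the description
requires (the wait_a_bit name is hypothetical):

    #include <linux/sched.h>
    #include <linux/jiffies.h>
    #include <linux/printk.h>

    /* sleep for about two seconds, but wake early on a signal */
    static void wait_a_bit(void)
    {
            long remaining;

            set_current_state(TASK_INTERRUPTIBLE); /* must precede the call */
            remaining = schedule_timeout(2 * HZ);

            if (remaining)
                    pr_info("woken early, %ld jiffies left\n", remaining);
    }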
void msleep(unsigned int msecs)
    sleep safely even with waitqueue interruptions

Parameters
    unsigned int msecs
        Time in milliseconds to sleep for

unsigned long msleep_interruptible(unsigned int msecs)
    sleep waiting for signals

Parameters
    unsigned int msecs
        Time in milliseconds to sleep for

void __sched usleep_range(unsigned long min, unsigned long max)
    Sleep for an approximate time

Parameters
    unsigned long min
        Minimum time in usecs to sleep
    unsigned long max
        Maximum time in usecs to sleep

Description
    In non-atomic context where the exact wakeup time is flexible, use
    usleep_range() instead of udelay(). The sleep improves responsiveness
    by avoiding the CPU-hogging busy-wait of udelay(), and the range
    reduces power usage by allowing hrtimers to take advantage of an
    already-scheduled interrupt instead of scheduling a new one just for
    this sleep.
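
For example, a device that needs a short, non-critical delay between two
register writes might look like this sketch (the my_device_* helpers are
hypothetical placeholders):

    #include <linux/delay.h>

    /*
     * Hypothetical device reset: the datasheet asks for >= 100 us between
     * asserting and deasserting reset. We are in process context, so a
     * flexible sleep is preferred over a busy-wait.
     */
    static void my_device_reset(void)
    {
            my_device_assert_reset();       /* hypothetical helper */
            usleep_range(100, 200);         /* 100-200 us, coalesced */
            my_device_deassert_reset();     /* hypothetical helper */
    }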
Wait queues and Wake events
int waitqueue_active(wait_queue_head_t *q)
    locklessly test for waiters on the queue

Parameters
    wait_queue_head_t *q
        the waitqueue to test for waiters

Description
    returns true if the wait list is not empty

NOTE
    this function is lockless and requires care, incorrect usage _will_
    lead to sporadic and non-obvious failure.

    Use either while holding wait_queue_head_t::lock or when used for
    wakeups with an extra smp_mb() like:

        CPU0 - waker                    CPU1 - waiter

                                        for (;;) {
          cond = true;                    prepare_to_wait(&wq, &wait, state);
          smp_mb();                       // smp_mb() from set_current_state()
          if (waitqueue_active(wq))       if (cond)
            wake_up(wq);                    break;
                                          schedule();
                                        }
                                        finish_wait(&wq, &wait);

    Because without the explicit smp_mb() it's possible for the
    waitqueue_active() load to get hoisted over the cond store such that
    we'll observe an empty wait list while the waiter might not observe
    cond.

    Also note that this 'optimization' trades a spin_lock() for an
    smp_mb(), which (when the lock is uncontended) are of roughly equal
    cost.
bool wq_has_sleeper(wait_queue_head_t *wq)
    check if there are any waiting processes

Parameters
    wait_queue_head_t *wq
        wait queue head

Description
    Returns true if wq has waiting processes

    Please refer to the comment for waitqueue_active.
wait_event(wq, condition)
    sleep until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for

Description
    The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
    condition evaluates to true. The condition is checked each time the
    waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

wait_event_freezable(wq, condition)
    sleep (or freeze) until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for

Description
    The process is put to sleep (TASK_INTERRUPTIBLE - so as not to
    contribute to system load) until the condition evaluates to true. The
    condition is checked each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

wait_event_timeout(wq, condition, timeout)
    sleep until a condition gets true or a timeout elapses

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    timeout
        timeout, in jiffies

Description
    The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
    condition evaluates to true. The condition is checked each time the
    waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

Return
    0 if the condition evaluated to false after the timeout elapsed, 1 if
    the condition evaluated to true after the timeout elapsed, or the
    remaining jiffies (at least 1) if the condition evaluated to true
    before the timeout elapsed.

wait_event_cmd(wq, condition, cmd1, cmd2)
    sleep until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    cmd1
        the command will be executed before sleep
    cmd2
        the command will be executed after sleep

Description
    The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
    condition evaluates to true. The condition is checked each time the
    waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

wait_event_interruptible(wq, condition)
    sleep until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    The function will return -ERESTARTSYS if it was interrupted by a
    signal and 0 if condition evaluated to true.
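
A minimal producer/consumer sketch of the wait_event_interruptible() and
wake_up() pairing described above (the data_wq/data_ready names are
hypothetical):

    #include <linux/wait.h>
    #include <linux/sched.h>

    static DECLARE_WAIT_QUEUE_HEAD(data_wq);  /* hypothetical queue */
    static bool data_ready;

    /* consumer, in process context: sleeps until data_ready is set */
    static int consume(void)
    {
            if (wait_event_interruptible(data_wq, data_ready))
                    return -ERESTARTSYS;    /* interrupted by a signal */

            /* ... consume the data ... */
            data_ready = false;
            return 0;
    }

    /* producer, e.g. from an interrupt handler */
    static void produce(void)
    {
            data_ready = true;      /* update the condition first ... */
            wake_up(&data_wq);      /* ... then wake the waiters */
    }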
wait_event_interruptible_timeout(wq, condition, timeout)
    sleep until a condition gets true or a timeout elapses

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    timeout
        timeout, in jiffies

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

Return
    0 if the condition evaluated to false after the timeout elapsed, 1 if
    the condition evaluated to true after the timeout elapsed, the
    remaining jiffies (at least 1) if the condition evaluated to true
    before the timeout elapsed, or -ERESTARTSYS if it was interrupted by
    a signal.

wait_event_hrtimeout(wq, condition, timeout)
    sleep until a condition gets true or a timeout elapses

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    timeout
        timeout, as a ktime_t

Description
    The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
    condition evaluates to true or a signal is received. The condition is
    checked each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    The function returns 0 if condition became true, or -ETIME if the
    timeout elapsed.

wait_event_interruptible_hrtimeout(wq, condition, timeout)
    sleep until a condition gets true or a timeout elapses

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    timeout
        timeout, as a ktime_t

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    The function returns 0 if condition became true, -ERESTARTSYS if it
    was interrupted by a signal, or -ETIME if the timeout elapsed.
wait_event_interruptible_locked(wq, condition)
    sleep until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    It must be called with wq.lock being held. This spinlock is unlocked
    while sleeping but condition testing is done while lock is held and
    when this macro exits the lock is held.

    The lock is locked/unlocked using spin_lock()/spin_unlock() functions
    which must match the way they are locked/unlocked outside of this
    macro.

    wake_up_locked() has to be called after changing any variable that
    could change the result of the wait condition.

    The function will return -ERESTARTSYS if it was interrupted by a
    signal and 0 if condition evaluated to true.

wait_event_interruptible_locked_irq(wq, condition)
    sleep until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    It must be called with wq.lock being held. This spinlock is unlocked
    while sleeping but condition testing is done while lock is held and
    when this macro exits the lock is held.

    The lock is locked/unlocked using spin_lock_irq()/spin_unlock_irq()
    functions which must match the way they are locked/unlocked outside
    of this macro.

    wake_up_locked() has to be called after changing any variable that
    could change the result of the wait condition.

    The function will return -ERESTARTSYS if it was interrupted by a
    signal and 0 if condition evaluated to true.

wait_event_interruptible_exclusive_locked(wq, condition)
    sleep exclusively until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    It must be called with wq.lock being held. This spinlock is unlocked
    while sleeping but condition testing is done while lock is held and
    when this macro exits the lock is held.

    The lock is locked/unlocked using spin_lock()/spin_unlock() functions
    which must match the way they are locked/unlocked outside of this
    macro.

    The process is put on the wait queue with the WQ_FLAG_EXCLUSIVE flag
    set, so if this process is awoken, further processes waiting on the
    list are not considered.

    wake_up_locked() has to be called after changing any variable that
    could change the result of the wait condition.

    The function will return -ERESTARTSYS if it was interrupted by a
    signal and 0 if condition evaluated to true.

wait_event_interruptible_exclusive_locked_irq(wq, condition)
    sleep until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    It must be called with wq.lock being held. This spinlock is unlocked
    while sleeping but condition testing is done while lock is held and
    when this macro exits the lock is held.

    The lock is locked/unlocked using spin_lock_irq()/spin_unlock_irq()
    functions which must match the way they are locked/unlocked outside
    of this macro.

    The process is put on the wait queue with the WQ_FLAG_EXCLUSIVE flag
    set, so if this process is awoken, further processes waiting on the
    list are not considered.

    wake_up_locked() has to be called after changing any variable that
    could change the result of the wait condition.

    The function will return -ERESTARTSYS if it was interrupted by a
    signal and 0 if condition evaluated to true.

wait_event_killable(wq, condition)
    sleep until a condition gets true

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for

Description
    The process is put to sleep (TASK_KILLABLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    The function will return -ERESTARTSYS if it was interrupted by a
    signal and 0 if condition evaluated to true.
wait_event_lock_irq_cmd(wq, condition, lock, cmd)
    sleep until a condition gets true. The condition is checked under the
    lock. This is expected to be called with the lock taken.

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    lock
        a locked spinlock_t, which will be released before cmd and
        schedule() and reacquired afterwards.
    cmd
        a command which is invoked outside the critical section before
        sleep

Description
    The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
    condition evaluates to true. The condition is checked each time the
    waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    This is supposed to be called while holding the lock. The lock is
    dropped before invoking the cmd and going to sleep and is reacquired
    afterwards.

wait_event_lock_irq(wq, condition, lock)
    sleep until a condition gets true. The condition is checked under the
    lock. This is expected to be called with the lock taken.

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    lock
        a locked spinlock_t, which will be released before schedule() and
        reacquired afterwards.

Description
    The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
    condition evaluates to true. The condition is checked each time the
    waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    This is supposed to be called while holding the lock. The lock is
    dropped before going to sleep and is reacquired afterwards.

wait_event_interruptible_lock_irq_cmd(wq, condition, lock, cmd)
    sleep until a condition gets true. The condition is checked under the
    lock. This is expected to be called with the lock taken.

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    lock
        a locked spinlock_t, which will be released before cmd and
        schedule() and reacquired afterwards.
    cmd
        a command which is invoked outside the critical section before
        sleep

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    This is supposed to be called while holding the lock. The lock is
    dropped before invoking the cmd and going to sleep and is reacquired
    afterwards.

    The macro will return -ERESTARTSYS if it was interrupted by a signal
    and 0 if condition evaluated to true.

wait_event_interruptible_lock_irq(wq, condition, lock)
    sleep until a condition gets true. The condition is checked under the
    lock. This is expected to be called with the lock taken.

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    lock
        a locked spinlock_t, which will be released before schedule() and
        reacquired afterwards.

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    This is supposed to be called while holding the lock. The lock is
    dropped before going to sleep and is reacquired afterwards.

    The macro will return -ERESTARTSYS if it was interrupted by a signal
    and 0 if condition evaluated to true.

wait_event_interruptible_lock_irq_timeout(wq, condition, lock, timeout)
    sleep until a condition gets true or a timeout elapses. The condition
    is checked under the lock. This is expected to be called with the
    lock taken.

Parameters
    wq
        the waitqueue to wait on
    condition
        a C expression for the event to wait for
    lock
        a locked spinlock_t, which will be released before schedule() and
        reacquired afterwards.
    timeout
        timeout, in jiffies

Description
    The process is put to sleep (TASK_INTERRUPTIBLE) until the condition
    evaluates to true or a signal is received. The condition is checked
    each time the waitqueue wq is woken up.

    wake_up() has to be called after changing any variable that could
    change the result of the wait condition.

    This is supposed to be called while holding the lock. The lock is
    dropped before going to sleep and is reacquired afterwards.

    The function returns 0 if the timeout elapsed, -ERESTARTSYS if it was
    interrupted by a signal, and the remaining jiffies otherwise if the
    condition evaluated to true before the timeout elapsed.
int wait_on_bit(unsigned long *word, int bit, unsigned mode)
    wait for a bit to be cleared

Parameters
    unsigned long *word
        the word being waited on, a kernel virtual address
    int bit
        the bit of the word being waited on
    unsigned mode
        the task state to sleep in

Description
    There is a standard hashed waitqueue table for generic use. This is
    the part of the hashtable's accessor API that waits on a bit. For
    instance, if one were to have waiters on a bitflag, one would call
    wait_on_bit() in threads waiting for the bit to clear. One uses
    wait_on_bit() where one is waiting for the bit to clear, but has no
    intention of setting it. Returned value will be zero if the bit was
    cleared, or non-zero if the process received a signal and the mode
    permitted wakeup on that signal.

int wait_on_bit_io(unsigned long *word, int bit, unsigned mode)
    wait for a bit to be cleared

Parameters
    unsigned long *word
        the word being waited on, a kernel virtual address
    int bit
        the bit of the word being waited on
    unsigned mode
        the task state to sleep in

Description
    Use the standard hashed waitqueue table to wait for a bit to be
    cleared. This is similar to wait_on_bit(), but calls io_schedule()
    instead of schedule() for the actual waiting.

    Returned value will be zero if the bit was cleared, or non-zero if
    the process received a signal and the mode permitted wakeup on that
    signal.

int wait_on_bit_timeout(unsigned long *word, int bit, unsigned mode, unsigned long timeout)
    wait for a bit to be cleared or a timeout elapses

Parameters
    unsigned long *word
        the word being waited on, a kernel virtual address
    int bit
        the bit of the word being waited on
    unsigned mode
        the task state to sleep in
    unsigned long timeout
        timeout, in jiffies

Description
    Use the standard hashed waitqueue table to wait for a bit to be
    cleared. This is similar to wait_on_bit(), except it also takes a
    timeout parameter.

    Returned value will be zero if the bit was cleared before the timeout
    elapsed, or non-zero if the timeout elapsed or the process received a
    signal and the mode permitted wakeup on that signal.
int wait_on_bit_action(unsigned long *word, int bit, wait_bit_action_f *action, unsigned mode)
    wait for a bit to be cleared

Parameters
    unsigned long *word
        the word being waited on, a kernel virtual address
    int bit
        the bit of the word being waited on
    wait_bit_action_f *action
        the function used to sleep, which may take special actions
    unsigned mode
        the task state to sleep in

Description
    Use the standard hashed waitqueue table to wait for a bit to be
    cleared, and allow the waiting action to be specified. This is like
    wait_on_bit() but allows fine control of how the waiting is done.

    Returned value will be zero if the bit was cleared, or non-zero if
    the process received a signal and the mode permitted wakeup on that
    signal.

int wait_on_bit_lock(unsigned long *word, int bit, unsigned mode)
    wait for a bit to be cleared, when wanting to set it

Parameters
    unsigned long *word
        the word being waited on, a kernel virtual address
    int bit
        the bit of the word being waited on
    unsigned mode
        the task state to sleep in

Description
    There is a standard hashed waitqueue table for generic use. This is
    the part of the hashtable's accessor API that waits on a bit when one
    intends to set it, for instance, trying to lock bitflags. For
    instance, if one were to have waiters trying to set bitflag and
    waiting for it to clear before setting it, one would call
    wait_on_bit() in threads waiting to be able to set the bit. One uses
    wait_on_bit_lock() where one is waiting for the bit to clear with the
    intention of setting it, and when done, clearing it.

    Returns zero if the bit was (eventually) found to be clear and was
    set. Returns non-zero if a signal was delivered to the process and
    the mode allows that signal to wake the process.

int wait_on_bit_lock_io(unsigned long *word, int bit, unsigned mode)
    wait for a bit to be cleared, when wanting to set it

Parameters
    unsigned long *word
        the word being waited on, a kernel virtual address
    int bit
        the bit of the word being waited on
    unsigned mode
        the task state to sleep in

Description
    Use the standard hashed waitqueue table to wait for a bit to be
    cleared and then to atomically set it. This is similar to
    wait_on_bit(), but calls io_schedule() instead of schedule() for the
    actual waiting.

    Returns zero if the bit was (eventually) found to be clear and was
    set. Returns non-zero if a signal was delivered to the process and
    the mode allows that signal to wake the process.
int wait_on_bit_lock_action(unsigned long *word, int bit, wait_bit_action_f *action, unsigned mode)
    wait for a bit to be cleared, when wanting to set it

Parameters
    unsigned long *word
        the word being waited on, a kernel virtual address
    int bit
        the bit of the word being waited on
    wait_bit_action_f *action
        the function used to sleep, which may take special actions
    unsigned mode
        the task state to sleep in

Description
    Use the standard hashed waitqueue table to wait for a bit to be
    cleared and then to set it, and allow the waiting action to be
    specified. This is like wait_on_bit() but allows fine control of how
    the waiting is done.

    Returns zero if the bit was (eventually) found to be clear and was
    set. Returns non-zero if a signal was delivered to the process and
    the mode allows that signal to wake the process.

int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
    Wait for an atomic_t to become 0

Parameters
    atomic_t *val
        The atomic value being waited on, a kernel virtual address
    int (*action)(atomic_t *)
        the function used to sleep, which may take special actions
    unsigned mode
        the task state to sleep in

Description
    Wait for an atomic_t to become 0. We abuse the bit-wait waitqueue
    table for the purpose of getting a waitqueue, but we set the key to a
    bit number outside of the target 'word'.
void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, void *key)
    wake up threads blocked on a waitqueue.

Parameters
    wait_queue_head_t *q
        the waitqueue
    unsigned int mode
        which threads
    int nr_exclusive
        how many wake-one or wake-many threads to wake up
    void *key
        is directly passed to the wakeup function

Description
    It may be assumed that this function implies a write memory barrier
    before changing the task state if and only if any tasks are woken up.

void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, void *key)
    wake up threads blocked on a waitqueue.

Parameters
    wait_queue_head_t *q
        the waitqueue
    unsigned int mode
        which threads
    int nr_exclusive
        how many wake-one or wake-many threads to wake up
    void *key
        opaque value to be passed to wakeup targets

Description
    The sync wakeup differs in that the waker knows that it will schedule
    away soon, so while the target thread will be woken up, it will not
    be migrated to another CPU - ie. the two threads are 'synchronized'
    with each other. This can prevent needless bouncing between CPUs.

    On UP it can prevent extra preemption.

    It may be assumed that this function implies a write memory barrier
    before changing the task state if and only if any tasks are woken up.

void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
    clean up after waiting in a queue

Parameters
    wait_queue_head_t *q
        waitqueue waited on
    wait_queue_t *wait
        wait descriptor

Description
    Sets current thread back to running state and removes the wait
    descriptor from the given waitqueue if still queued.

void wake_up_bit(void *word, int bit)
    wake up a waiter on a bit

Parameters
    void *word
        the word being waited on, a kernel virtual address
    int bit
        the bit of the word being waited on

Description
    There is a standard hashed waitqueue table for generic use. This is
    the part of the hashtable's accessor API that wakes up waiters on a
    bit. For instance, if one were to have waiters on a bitflag, one
    would call wake_up_bit() after clearing the bit.

    In order for this to function properly, as it uses
    waitqueue_active() internally, some kind of memory barrier must be
    done prior to calling this. Typically, this will be
    smp_mb__after_atomic(), but in some cases where bitflags are
    manipulated non-atomically under a lock, one may need to use a less
    regular barrier, such as fs/inode.c's smp_mb(), because spin_unlock()
    does not guarantee a memory barrier.
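
A sketch of the wait_on_bit()/wake_up_bit() pairing, including the barrier
the description calls for (the MY_BUSY_BIT/my_flags names are
hypothetical):

    #include <linux/wait.h>
    #include <linux/bitops.h>
    #include <linux/sched.h>

    /* hypothetical flag bit guarding a shared resource */
    #define MY_BUSY_BIT     0
    static unsigned long my_flags;

    /* waiter: sleep until MY_BUSY_BIT is cleared */
    static int wait_until_idle(void)
    {
            return wait_on_bit(&my_flags, MY_BUSY_BIT, TASK_INTERRUPTIBLE);
    }

    /* owner: clear the bit, publish it, then wake the waiters */
    static void mark_idle(void)
    {
            clear_bit(MY_BUSY_BIT, &my_flags);
            smp_mb__after_atomic();  /* order the clear before the wake */
            wake_up_bit(&my_flags, MY_BUSY_BIT);
    }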
void wake_up_atomic_t(atomic_t *p)
    Wake up a waiter on an atomic_t

Parameters
    atomic_t *p
        The atomic_t being waited on, a kernel virtual address

Description
    Wake up anyone waiting for the atomic_t to go to zero.

    Abuse the bit-waker function and its waitqueue hash table set (the
    atomic_t check is done by the waiter's wake function, not by the
    waker itself).
High-resolution timers
-
ktime_t
-
Set a ktime_t variable from a seconds/nanoseconds value
ktime_set
(const s64
secs, const unsigned long
nsecs
)
Parameters
- seconds to set
- nanoseconds to set
const s64 secs
const unsigned long nsecs
Return
The ktime_t representation of the value.
-
int
-
Compares two ktime_t variables for less, greater or equal
ktime_compare
(const ktime_t
cmp1, const ktime_t
cmp2
)
Parameters
- comparable1
- comparable2
const ktime_t cmp1
const ktime_t cmp2
Return
cmp1 < cmp2: return <0
cmp1 == cmp2: return 0
cmp1 > cmp2: return >0
-
bool
-
Compare if a ktime_t value is bigger than another one.
ktime_after
(const ktime_t
cmp1, const ktime_t
cmp2
)
Parameters
- comparable1
- comparable2
const ktime_t cmp1
const ktime_t cmp2
Return
true if cmp1 happened after cmp2.
-
bool
-
Compare if a ktime_t value is smaller than another one.
ktime_before
(const ktime_t
cmp1, const ktime_t
cmp2
)
Parameters
- comparable1
- comparable2
const ktime_t cmp1
const ktime_t cmp2
Return
true if cmp1 happened before cmp2.
-
bool
-
convert a ktime_t variable to timespec format only if the variable contains data
ktime_to_timespec_cond
(const ktime_t
kt, struct timespec *
ts
)
Parameters
- the ktime_t variable to convert
- the timespec variable to store the result in
const ktime_t kt
struct timespec * ts
Return
true
if there was a successful conversion, false
if kt was 0.
-
bool
-
convert a ktime_t variable to timespec64 format only if the variable contains data
ktime_to_timespec64_cond
(const ktime_t
kt, struct timespec64 *
ts
)
Parameters
- the ktime_t variable to convert
- the timespec variable to store the result in
const ktime_t kt
struct timespec64 * ts
Return
true
if there was a successful conversion, false
if kt was 0.
-
struct
-
the basic hrtimer structure
hrtimer
Definition
struct hrtimer {
struct timerqueue_node node;
ktime_t _softexpires;
enum hrtimer_restart (* function) (struct hrtimer *);
struct hrtimer_clock_base * base;
u8 state;
u8 is_rel;
#ifdef CONFIG_TIMER_STATS
int start_pid;
void * start_site;
char start_comm[16];
#endif
};
Members
- timerqueue node, which also manages node.expires, the absolute expiry time in the hrtimers internal representation. The time is related to the clock on which the timer is based. It is set up by adding slack to the _softexpires value. For non-range timers it is identical to _softexpires.
- the absolute earliest expiry time of the hrtimer. The time which was given as expiry time when the timer was armed.
- timer expiry callback function
- pointer to the timer base (per cpu and per clock)
- state information (See bit values above)
- Set if the timer was armed relative
- timer statistics field to store the pid of the task which started the timer
- timer statistics field to store the site where the timer was started
- timer statistics field to store the name of the process which started the timer
node
_softexpires
function
base
state
is_rel
start_pid
start_site
start_comm[16]
Description
The hrtimer structure must be initialized by hrtimer_init()
-
struct
-
simple sleeper structure
hrtimer_sleeper
Definition
struct hrtimer_sleeper {
struct hrtimer timer;
struct task_struct * task;
};
Members
- embedded timer structure
- task to wake up
timer
task
Description
task is set to NULL when the timer expires.
-
struct
-
the timer base for a specific clock
hrtimer_clock_base
Definition
struct hrtimer_clock_base {
struct hrtimer_cpu_base * cpu_base;
int index;
clockid_t clockid;
struct timerqueue_head active;
ktime_t (* get_time) (void);
ktime_t offset;
};
Members
- per cpu clock base
- clock type index for per_cpu support when moving a timer to a base on another cpu.
- clock id for per_cpu support
- red black tree root node for the active timers
- function to retrieve the current time of the clock
- offset of this clock to the monotonic base
cpu_base
index
clockid
active
get_time
offset
-
void
-
(re)start an hrtimer on the current CPU
hrtimer_start
(struct
hrtimer
*
timer, ktime_t
tim, const enum hrtimer_mode
mode
)
Parameters
- the timer to be added
- expiry time
- expiry mode: absolute (HRTIMER_MODE_ABS) or relative (HRTIMER_MODE_REL)
struct hrtimer * timer
ktime_t tim
const enum hrtimer_mode mode
-
u64
-
forward the timer expiry so it expires after now
hrtimer_forward_now
(struct
hrtimer
*
timer, ktime_t
interval
)
Parameters
- hrtimer to forward
- the interval to forward
struct hrtimer * timer
ktime_t interval
Description
Forward the timer expiry so it will expire after the current time of the hrtimer clock base. Returns the number of overruns.
Can be safely called from the callback function of timer. If called from other contexts timer must neither be enqueued nor running the callback and the caller needs to take care of serialization.
Note
This only updates the timer expiry value and does not requeue the timer.
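A common use, sketched here with a hypothetical callback, is a periodic timer that forwards itself from its own handler:

static enum hrtimer_restart my_timer_fn(struct hrtimer *timer)
{
        /* push the expiry 100 ms past the current base time */
        hrtimer_forward_now(timer, ms_to_ktime(100));
        return HRTIMER_RESTART;         /* requeue with the new expiry */
}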
-
u64
-
forward the timer expiry
hrtimer_forward
(struct
hrtimer
*
timer, ktime_t
now, ktime_t
interval
)
Parameters
- hrtimer to forward
- forward past this time
- the interval to forward
struct hrtimer * timer
ktime_t now
ktime_t interval
Description
Forward the timer expiry so it will expire in the future. Returns the number of overruns.
Can be safely called from the callback function of timer. If called from other contexts timer must neither be enqueued nor running the callback and the caller needs to take care of serialization.
Note
This only updates the timer expiry value and does not requeue the timer.
-
void
-
(re)start an hrtimer on the current CPU
hrtimer_start_range_ns
(struct
hrtimer
*
timer, ktime_t
tim, u64
delta_ns, const enum hrtimer_mode
mode
)
Parameters
- the timer to be added
- expiry time
- “slack” range for the timer
- expiry mode: absolute (HRTIMER_MODE_ABS) or relative (HRTIMER_MODE_REL)
struct hrtimer * timer
ktime_t tim
u64 delta_ns
const enum hrtimer_mode mode
-
int
-
try to deactivate a timer
hrtimer_try_to_cancel
(struct
hrtimer
*
timer
)
Parameters
- hrtimer to stop
struct hrtimer * timer
Return
0 when the timer was not active
1 when the timer was active
-1 when the timer is currently executing the callback function and cannot be stopped
-
int
-
cancel a timer and wait for the handler to finish.
hrtimer_cancel
(struct
hrtimer
*
timer
)
Parameters
- the timer to be cancelled
struct hrtimer * timer
Return
0 when the timer was not active
1 when the timer was active
-
ktime_t
-
get remaining time for the timer
__hrtimer_get_remaining
(const struct
hrtimer
*
timer, bool
adjust
)
Parameters
- the timer to read
- adjust relative timers when CONFIG_TIME_LOW_RES=y
const struct hrtimer * timer
bool adjust
-
void
-
initialize a timer to the given clock
hrtimer_init
(struct
hrtimer
*
timer, clockid_t
clock_id, enum hrtimer_mode
mode
)
Parameters
- the timer to be initialized
- the clock to be used
- timer mode abs/rel
struct hrtimer * timer
clockid_t clock_id
enum hrtimer_mode mode
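Putting the pieces together, a minimal sketch (names hypothetical; my_timer_fn is the callback sketched earlier) initializes, arms, and eventually cancels a timer:

static struct hrtimer my_timer;

hrtimer_init(&my_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
my_timer.function = my_timer_fn;
hrtimer_start(&my_timer, ms_to_ktime(100), HRTIMER_MODE_REL);

/* on teardown */
hrtimer_cancel(&my_timer);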
-
int __sched
-
sleep until timeout
schedule_hrtimeout_range
(ktime_t *
expires, u64
delta, const enum hrtimer_mode
mode
)
Parameters
- timeout value (ktime_t)
- slack in expires timeout (ktime_t)
- timer mode, HRTIMER_MODE_ABS or HRTIMER_MODE_REL
ktime_t * expires
u64 delta
const enum hrtimer_mode mode
Description
Make the current task sleep until the given expiry time has elapsed. The routine will return immediately unless the current task state has been set (see set_current_state()
).
The delta argument gives the kernel the freedom to schedule the actual wakeup to a time that is both power and performance friendly. The kernel gives the normal best effort behavior for “expires + delta”, but may decide to fire the timer earlier, though never earlier than expires.
You can set the task state as follows -
TASK_UNINTERRUPTIBLE
- at least timeout time is guaranteed to pass before the routine returns unless the current task is explicitly woken up, (e.g. by wake_up_process()
).
TASK_INTERRUPTIBLE
- the routine may return early if a signal is delivered to the current task or the current task is explicitly woken up.
The current task state is guaranteed to be TASK_RUNNING when this routine returns.
Returns 0 when the timer has expired. If the task was woken before the timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or by an explicit wakeup, it returns -EINTR.
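A sketch of a relative sleep of 500 ms with 1 ms of slack:

ktime_t timeout = ms_to_ktime(500);

set_current_state(TASK_INTERRUPTIBLE);
if (schedule_hrtimeout_range(&timeout, NSEC_PER_MSEC,
                             HRTIMER_MODE_REL) == -EINTR)
        return -EINTR;  /* woken early by a signal or explicit wakeup */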
-
int __sched
-
sleep until timeout
schedule_hrtimeout
(ktime_t *
expires, const enum hrtimer_mode
mode
)
Parameters
- timeout value (ktime_t)
- timer mode, HRTIMER_MODE_ABS or HRTIMER_MODE_REL
ktime_t * expires
const enum hrtimer_mode mode
Description
Make the current task sleep until the given expiry time has elapsed. The routine will return immediately unless the current task state has been set (see set_current_state()
).
You can set the task state as follows -
TASK_UNINTERRUPTIBLE
- at least timeout time is guaranteed to pass before the routine returns unless the current task is explicitly woken up, (e.g. by wake_up_process()
).
TASK_INTERRUPTIBLE
- the routine may return early if a signal is delivered to the current task or the current task is explicitly woken up.
The current task state is guaranteed to be TASK_RUNNING when this routine returns.
Returns 0 when the timer has expired. If the task was woken before the timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or by an explicit wakeup, it returns -EINTR.
Workqueues and Kevents
-
struct
-
A struct for workqueue attributes.
workqueue_attrs
Definition
struct workqueue_attrs {
int nice;
cpumask_var_t cpumask;
bool no_numa;
};
Members
- nice level
- allowed CPUs
-
disable NUMA affinity
Unlike other fields,
no_numa
isn’t a property of a worker_pool. It only modifies how apply_workqueue_attrs()
selects pools and thus doesn’t participate in pool hash calculations or equality comparisons.
nice
cpumask
no_numa
Description
This can be used to change attributes of an unbound workqueue.
-
Find out whether a work item is currently pending
work_pending
(
work
)
Parameters
- The work item in question
work
-
Find out whether a delayable work item is currently pending
delayed_work_pending
(
w
)
Parameters
- The work item in question
w
-
allocate a workqueue
alloc_workqueue
(
fmt,
flags,
max_active,
args...
)
Parameters
- printf format for the name of the workqueue
- WQ_* flags
- max in-flight work items, 0 for default
- args for fmt
fmt
flags
max_active
args...
Description
Allocate a workqueue with the specified parameters. For detailed information on WQ_* flags, please refer to Documentation/core-api/workqueue.rst.
The __lock_name macro dance is to guarantee that a single lock_class_key doesn’t end up with different names, which isn’t allowed by lockdep.
Return
Pointer to the allocated workqueue on success, NULL
on failure.
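A typical allocate/queue/destroy sequence might look like this sketch (my_wq and my_work are hypothetical; my_work is a struct work_struct initialized elsewhere):

struct workqueue_struct *my_wq;

my_wq = alloc_workqueue("my_wq", WQ_MEM_RECLAIM, 0);
if (!my_wq)
        return -ENOMEM;

queue_work(my_wq, &my_work);

/* on teardown: pending work runs first, then the workqueue is freed */
destroy_workqueue(my_wq);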
-
allocate an ordered workqueue
alloc_ordered_workqueue
(
fmt,
flags,
args...
)
Parameters
- printf format for the name of the workqueue
- WQ_* flags (only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful)
- args for fmt
fmt
flags
args...
Description
Allocate an ordered workqueue. An ordered workqueue executes at most one work item at any given time in the queued order. They are implemented as unbound workqueues with max_active of one.
Return
Pointer to the allocated workqueue on success, NULL
on failure.
-
bool
-
queue work on a workqueue
queue_work
(struct workqueue_struct *
wq, struct work_struct *
work
)
Parameters
- workqueue to use
- work to queue
struct workqueue_struct * wq
struct work_struct * work
Description
Returns false
if work was already on a queue, true
otherwise.
We queue the work to the CPU on which it was submitted, but if the CPU dies it can be processed by another CPU.
-
bool
-
queue work on a workqueue after delay
queue_delayed_work
(struct workqueue_struct *
wq, struct delayed_work *
dwork, unsigned long
delay
)
Parameters
- workqueue to use
- delayable work to queue
- number of jiffies to wait before queueing
struct workqueue_struct * wq
struct delayed_work * dwork
unsigned long delay
Description
Equivalent to queue_delayed_work_on()
but tries to use the local CPU.
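For example (a sketch; my_wq is the hypothetical workqueue from the previous sketch):

static void my_dwork_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(my_dwork, my_dwork_fn);

queue_delayed_work(my_wq, &my_dwork, msecs_to_jiffies(100));

/* on teardown: cancel and wait for a possible in-flight execution */
cancel_delayed_work_sync(&my_dwork);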
-
bool
-
modify delay of or queue a delayed work
mod_delayed_work
(struct workqueue_struct *
wq, struct delayed_work *
dwork, unsigned long
delay
)
Parameters
- workqueue to use
- work to queue
- number of jiffies to wait before queueing
struct workqueue_struct * wq
struct delayed_work * dwork
unsigned long delay
Description
mod_delayed_work_on()
on local CPU.
-
bool
-
put work task on a specific cpu
schedule_work_on
(int
cpu, struct work_struct *
work
)
Parameters
- cpu to put the work task on
- job to be done
int cpu
struct work_struct * work
Description
This puts a job on a specific cpu
-
bool
-
put work task in global workqueue
schedule_work
(struct work_struct *
work
)
Parameters
- job to be done
struct work_struct * work
Description
Returns false
if work was already on the kernel-global workqueue and true
otherwise.
This puts a job in the kernel-global workqueue if it was not already queued and leaves it in the same position on the kernel-global workqueue otherwise.
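A minimal use of the kernel-global workqueue (hypothetical names):

static void my_work_fn(struct work_struct *work)
{
        pr_info("my_work ran\n");
}
static DECLARE_WORK(my_work, my_work_fn);

schedule_work(&my_work);        /* my_work_fn runs later in process context */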
-
void
-
ensure that any scheduled work has run to completion.
flush_scheduled_work
(void
)
Parameters
- no arguments
void
Description
Forces execution of the kernel-global workqueue and blocks until its completion.
Think twice before calling this function! It’s very easy to get into trouble if you don’t take great care. Either of the following situations will lead to deadlock:
One of the work items currently on the workqueue needs to acquire a lock held by your code or its caller.
Your code is running in the context of a work routine.
They will be detected by lockdep when they occur, but the first might not occur very often. It depends on what work items are on the workqueue and what locks they need, which you have no control over.
In most situations flushing the entire workqueue is overkill; you merely need to know that a particular work item isn’t queued and isn’t running. In such cases you should use cancel_delayed_work_sync()
or cancel_work_sync()
instead.
-
bool
-
queue work in global workqueue on CPU after delay
schedule_delayed_work_on
(int
cpu, struct delayed_work *
dwork, unsigned long
delay
)
Parameters
- cpu to use
- job to be done
- number of jiffies to wait
int cpu
struct delayed_work * dwork
unsigned long delay
Description
After waiting for a given time this puts a job in the kernel-global workqueue on the specified CPU.
-
bool
-
put work task in global workqueue after delay
schedule_delayed_work
(struct delayed_work *
dwork, unsigned long
delay
)
Parameters
- job to be done
- number of jiffies to wait or 0 for immediate execution
struct delayed_work * dwork
unsigned long delay
Description
After waiting for a given time this puts a job in the kernel-global workqueue.
-
bool
-
queue work on specific cpu
queue_work_on
(int
cpu, struct workqueue_struct *
wq, struct work_struct *
work
)
Parameters
- CPU number to execute work on
- workqueue to use
- work to queue
int cpu
struct workqueue_struct * wq
struct work_struct * work
Description
We queue the work to a specific CPU, the caller must ensure it can’t go away.
Return
false
if work was already on a queue, true
otherwise.
-
bool
-
queue work on specific CPU after delay
queue_delayed_work_on
(int
cpu, struct workqueue_struct *
wq, struct delayed_work *
dwork, unsigned long
delay
)
Parameters
- CPU number to execute work on
- workqueue to use
- work to queue
- number of jiffies to wait before queueing
int cpu
struct workqueue_struct * wq
struct delayed_work * dwork
unsigned long delay
Return
false
if work was already on a queue, true
otherwise. If delay is zero and dwork is idle, it will be scheduled for immediate execution.
-
bool
-
modify delay of or queue a delayed work on specific CPU
mod_delayed_work_on
(int
cpu, struct workqueue_struct *
wq, struct delayed_work *
dwork, unsigned long
delay
)
Parameters
- CPU number to execute work on
- workqueue to use
- work to queue
- number of jiffies to wait before queueing
int cpu
struct workqueue_struct * wq
struct delayed_work * dwork
unsigned long delay
Description
If dwork is idle, equivalent to queue_delayed_work_on()
; otherwise, modify dwork‘s timer so that it expires after delay. If delay is zero, work is guaranteed to be scheduled immediately regardless of its current state.
Return
false
if dwork was idle and queued, true
if dwork was pending and its timer was modified.
This function is safe to call from any context including IRQ handler. See try_to_grab_pending()
for details.
-
void
-
ensure that any scheduled work has run to completion.
flush_workqueue
(struct workqueue_struct *
wq
)
Parameters
- workqueue to flush
struct workqueue_struct * wq
Description
This function sleeps until all work items which were queued on entry have finished execution, but it is not livelocked by new incoming ones.
-
void
-
drain a workqueue
drain_workqueue
(struct workqueue_struct *
wq
)
Parameters
- workqueue to drain
struct workqueue_struct * wq
Description
Wait until the workqueue becomes empty. While draining is in progress, only chain queueing is allowed. IOW, only currently pending or running work items on wq can queue further work items on it. wq is flushed repeatedly until it becomes empty. The number of flushes is determined by the depth of chaining and should be relatively short. Whine if it takes too long.
-
bool
-
wait for a work to finish executing the last queueing instance
flush_work
(struct work_struct *
work
)
Parameters
- the work to flush
struct work_struct * work
Description
Wait until work has finished execution. work is guaranteed to be idle on return if it hasn’t been requeued since flush started.
Return
true
if flush_work()
waited for the work to finish execution, false
if it was already idle.
-
bool
-
cancel a work and wait for it to finish
cancel_work_sync
(struct work_struct *
work
)
Parameters
- the work to cancel
struct work_struct * work
Description
Cancel work and wait for its execution to finish. This function can be used even if the work re-queues itself or migrates to another workqueue. On return from this function, work is guaranteed to be not pending or executing on any CPU.
cancel_work_sync(delayed_work->work
) must not be used for delayed_work’s. Use cancel_delayed_work_sync()
instead.
The caller must ensure that the workqueue on which work was last queued can’t be destroyed before this function returns.
Return
true
if work was pending, false
otherwise.
-
bool
-
wait for a dwork to finish executing the last queueing
flush_delayed_work
(struct delayed_work *
dwork
)
Parameters
- the delayed work to flush
struct delayed_work * dwork
Description
Delayed timer is cancelled and the pending work is queued for immediate execution. Like flush_work()
, this function only considers the last queueing instance of dwork.
Return
true
if flush_work()
waited for the work to finish execution, false
if it was already idle.
-
bool
-
cancel a delayed work
cancel_delayed_work
(struct delayed_work *
dwork
)
Parameters
- delayed_work to cancel
struct delayed_work * dwork
Description
Kill off a pending delayed_work.
Return
true
if dwork was pending and canceled; false
if it wasn’t pending.
Note
The work callback function may still be running on return, unless it returns true
and the work doesn’t re-arm itself. Explicitly flush or use cancel_delayed_work_sync()
to wait on it.
This function is safe to call from any context including IRQ handler.
-
bool
-
cancel a delayed work and wait for it to finish
cancel_delayed_work_sync
(struct delayed_work *
dwork
)
Parameters
- the delayed work cancel
struct delayed_work * dwork
Description
This is cancel_work_sync()
for delayed works.
Return
true
if dwork was pending, false
otherwise.
-
int
-
reliably execute the routine with user context
execute_in_process_context
(work_func_t
fn, struct execute_work *
ew
)
Parameters
- the function to execute
- guaranteed storage for the execute work structure (must be available when the work executes)
work_func_t fn
struct execute_work * ew
Description
Executes the function immediately if process context is available, otherwise schedules the function for delayed execution.
Return
-
0 - function was executed
- 1 - function was scheduled for execution
-
void
-
safely terminate a workqueue
destroy_workqueue
(struct workqueue_struct *
wq
)
Parameters
- target workqueue
struct workqueue_struct * wq
Description
Safely destroy a workqueue. All work currently pending will be done first.
-
void
-
adjust max_active of a workqueue
workqueue_set_max_active
(struct workqueue_struct *
wq, int
max_active
)
Parameters
- target workqueue
- new max_active value.
struct workqueue_struct * wq
int max_active
Description
Set max_active of wq to max_active.
Context
Don’t call from IRQ context.
-
bool
-
test whether a workqueue is congested
workqueue_congested
(int
cpu, struct workqueue_struct *
wq
)
Parameters
- CPU in question
- target workqueue
int cpu
struct workqueue_struct * wq
Description
Test whether wq‘s cpu workqueue for cpu is congested. There is no synchronization around this function and the test result is unreliable and only useful as advisory hints or for debugging.
If cpu is WORK_CPU_UNBOUND, the test is performed on the local CPU. Note that both per-cpu and unbound workqueues may be associated with multiple pool_workqueues which have separate congested states. A workqueue being congested on one CPU doesn’t mean the workqueue is also congested on other CPUs / NUMA nodes.
Return
true
if congested, false
otherwise.
-
unsigned int
-
test whether a work is currently pending or running
work_busy
(struct work_struct *
work
)
Parameters
- the work to be tested
struct work_struct * work
Description
Test whether work is currently pending or running. There is no synchronization around this function and the test result is unreliable and only useful as advisory hints or for debugging.
Return
OR’d bitmask of WORK_BUSY_* bits.
-
long
-
run a function in thread context on a particular cpu
work_on_cpu
(int cpu, long (*fn)(void *), void * arg)
Parameters
- the cpu to run on
- the function to run
- the function arg
int cpu
long (*)(void *) fn
void * arg
Description
It is up to the caller to ensure that the cpu doesn’t go offline. The caller must not hold any locks which would prevent fn from completing.
Return
The value fn returns.
Internal Functions
-
int
-
Wait for
TASK_STOPPED
or TASK_TRACED
wait_task_stopped
(struct wait_opts *
wo, int
ptrace, struct task_struct *
p
)
Parameters
- wait options
- is the wait for ptrace
- task to wait for
struct wait_opts * wo
int ptrace
struct task_struct * p
Description
Handle sys_wait4()
work for p
in state TASK_STOPPED
or TASK_TRACED
.
Context
read_lock(tasklist_lock
), which is released if return value is non-zero. Also, grabs and releases p->sighand->siglock.
Return
0 if wait condition didn’t exist and search for other wait conditions should continue. Non-zero return, -errno on failure and p‘s pid on success, implies that tasklist_lock is released and wait condition search should terminate.
-
bool
-
set jobctl pending bits
task_set_jobctl_pending
(struct task_struct *
task, unsigned long
mask
)
Parameters
- target task
- pending bits to set
struct task_struct * task
unsigned long mask
Description
Set mask in task->jobctl. mask must be a subset of JOBCTL_PENDING_MASK
| JOBCTL_STOP_CONSUME
| JOBCTL_STOP_SIGMASK
| JOBCTL_TRAPPING
. If stop signo is being set, the existing signo is cleared. If task is already being killed or exiting, this function becomes noop.
Context
Must be called with task->sighand->siglock held.
Return
true
if mask is set, false
if made noop because task was dying.
-
void
-
clear jobctl trapping bit
task_clear_jobctl_trapping
(struct task_struct *
task
)
Parameters
- target task
struct task_struct * task
Description
If JOBCTL_TRAPPING is set, a ptracer is waiting for us to enter TRACED. Clear it and wake up the ptracer. Note that we don’t need any further locking. task->siglock guarantees that task->parent points to the ptracer.
Context
Must be called with task->sighand->siglock held.
-
void
-
clear jobctl pending bits
task_clear_jobctl_pending
(struct task_struct *
task, unsigned long
mask
)
Parameters
- target task
- pending bits to clear
struct task_struct * task
unsigned long mask
Description
Clear mask from task->jobctl. mask must be subset of JOBCTL_PENDING_MASK
. If JOBCTL_STOP_PENDING
is being cleared, other STOP bits are cleared together.
If clearing of mask leaves no stop or trap pending, this function calls task_clear_jobctl_trapping()
.
Context
Must be called with task->sighand->siglock held.
-
bool
-
participate in a group stop
task_participate_group_stop
(struct task_struct *
task
)
Parameters
- task participating in a group stop
struct task_struct * task
Description
task has JOBCTL_STOP_PENDING
set and is participating in a group stop. Group stop states are cleared and the group stop count is consumed if JOBCTL_STOP_CONSUME
was set. If the consumption completes the group stop, the appropriate SIGNAL_* flags are set.
Context
Must be called with task->sighand->siglock held.
Return
true
if group stop completion should be notified to the parent, false
otherwise.
-
void
-
schedule trap to notify ptracer
ptrace_trap_notify
(struct task_struct *
t
)
Parameters
- tracee wanting to notify tracer
struct task_struct * t
Description
This function schedules sticky ptrace trap which is cleared on the next TRAP_STOP to notify ptracer of an event. t must have been seized by ptracer.
If t is running, STOP trap will be taken. If trapped for STOP and ptracer is listening for events, tracee is woken up so that it can re-trap for the new event. If trapped otherwise, STOP trap will be eventually taken without returning to userland after the existing traps are finished by PTRACE_CONT.
Context
Must be called with task->sighand->siglock held.
-
void
-
notify parent of stopped/continued state change
do_notify_parent_cldstop
(struct task_struct *
tsk, bool
for_ptracer, int
why
)
Parameters
- task reporting the state change
- the notification is for ptracer
- CLD_{CONTINUED|STOPPED|TRAPPED} to report
struct task_struct * tsk
bool for_ptracer
int why
Description
Notify tsk‘s parent that the stopped/continued state has changed. If for_ptracer is false
, tsk‘s group leader notifies to its real parent. If true
, tsk reports to tsk->parent, which should be the ptracer.
Context
Must be called with tasklist_lock at least read locked.
-
bool
-
handle group stop for SIGSTOP and other stop signals
do_signal_stop
(int
signr
)
Parameters
- signr causing group stop if initiating
int signr
Description
If JOBCTL_STOP_PENDING
is not set yet, initiate group stop with signr and participate in it. If already set, participate in the existing group stop. If participated in a group stop (and thus slept), true
is returned with siglock released.
If ptraced, this function doesn’t handle stop itself. Instead, JOBCTL_TRAP_STOP
is scheduled and false
is returned with siglock untouched. The caller must ensure that INTERRUPT trap handling takes place afterwards.
Context
Must be called with current->sighand->siglock held, which is released on true
return.
Return
false
if group stop is already cancelled or ptrace trap is scheduled. true
if participated in group stop.
-
void
-
take care of ptrace jobctl traps
do_jobctl_trap
(void
)
Parameters
- no arguments
void
Description
When PT_SEIZED, it’s used for both group stop and explicit SEIZE/INTERRUPT traps. Both generate PTRACE_EVENT_STOP trap with accompanying siginfo. If stopped, lower eight bits of exit_code contain the stop signal; otherwise, SIGTRAP
.
When !PT_SEIZED, it’s used only for group stop trap with stop signal number as exit_code and no siginfo.
Context
Must be called with current->sighand->siglock held, which may be released and re-acquired before returning with intervening sleep.
-
void
-
report that a signal was successfully delivered
signal_delivered
(struct ksignal *
ksig, int
stepping
)
Parameters
- kernel signal struct
- nonzero if debugger single-step or block-step in use
struct ksignal * ksig
int stepping
Description
This function should be called when a signal has successfully been delivered. It updates the blocked signals accordingly (ksig->ka.sa.sa_mask is always blocked, and the signal itself is blocked unless SA_NODEFER
is set in ksig->ka.sa.sa_flags). Tracing is notified.
-
long
-
restart a system call
sys_restart_syscall
(void
)
Parameters
- no arguments
void
-
void
-
change current->blocked mask
set_current_blocked
(sigset_t *
newset
)
Parameters
- new mask
sigset_t * newset
Description
It is wrong to change ->blocked directly, this helper should be used to ensure the process can’t miss a shared signal we are going to block.
-
long
-
change the list of currently blocked signals
sys_rt_sigprocmask
(int
how, sigset_t __user *
nset, sigset_t __user *
oset, size_t
sigsetsize
)
Parameters
- whether to add, remove, or set signals
- new set of blocked signals, if non-null
- previous value of signal mask if non-null
- size of sigset_t type
int how
sigset_t __user * nset
sigset_t __user * oset
size_t sigsetsize
-
long
-
examine a pending signal that has been raised while blocked
sys_rt_sigpending
(sigset_t __user *
uset, size_t
sigsetsize
)
Parameters
- stores pending signals
- size of sigset_t type or larger
sigset_t __user * uset
size_t sigsetsize
-
int
-
wait for queued signals specified in which
do_sigtimedwait
(const sigset_t *
which, siginfo_t *
info, const struct timespec *
ts
)
Parameters
- queued signals to wait for
- if non-null, the signal’s siginfo is returned here
- upper bound on process time suspension
const sigset_t * which
siginfo_t * info
const struct timespec * ts
-
long
-
synchronously wait for queued signals specified in uthese
sys_rt_sigtimedwait
(const sigset_t __user *
uthese, siginfo_t __user *
uinfo, const struct timespec __user *
uts, size_t
sigsetsize
)
Parameters
- queued signals to wait for
- if non-null, the signal’s siginfo is returned here
- upper bound on process time suspension
- size of sigset_t type
const sigset_t __user * uthese
siginfo_t __user * uinfo
const struct timespec __user * uts
size_t sigsetsize
-
long
-
send a signal to a process
sys_kill
(pid_t
pid, int
sig
)
Parameters
- the PID of the process
- signal to be sent
pid_t pid
int sig
-
long
-
send signal to one specific thread
sys_tgkill
(pid_t
tgid, pid_t
pid, int
sig
)
Parameters
- the thread group ID of the thread
- the PID of the thread
- signal to be sent
pid_t tgid
pid_t pid
int sig
Description
This syscall also checks the tgid and returns -ESRCH even if the PID exists but no longer belongs to the target process. This method solves the problem of threads exiting and PIDs getting reused.
-
long
-
send signal to one specific task
sys_tkill
(pid_t
pid, int
sig
)
Parameters
- the PID of the task
- signal to be sent
pid_t pid
int sig
Description
Send a signal to only one task, even if it’s a CLONE_THREAD task.
-
long
-
send signal information to a process
sys_rt_sigqueueinfo
(pid_t
pid, int
sig, siginfo_t __user *
uinfo
)
Parameters
- the PID of the thread
- signal to be sent
- signal info to be sent
pid_t pid
int sig
siginfo_t __user * uinfo
-
long
-
examine pending signals
sys_sigpending
(old_sigset_t __user *
set
)
Parameters
- where mask of pending signal is returned
old_sigset_t __user * set
-
long
-
examine and change blocked signals
sys_sigprocmask
(int
how, old_sigset_t __user *
nset, old_sigset_t __user *
oset
)
Parameters
- whether to add, remove, or set signals
- signals to add or remove (if non-null)
- previous value of signal mask if non-null
int how
old_sigset_t __user * nset
old_sigset_t __user * oset
Description
Some platforms have their own version with special arguments; others support only sys_rt_sigprocmask.
-
long
-
alter an action taken by a process
sys_rt_sigaction
(int
sig, const struct sigaction __user *
act, struct sigaction __user *
oact, size_t
sigsetsize
)
Parameters
- signal to be sent
- new sigaction
- used to save the previous sigaction
- size of sigset_t type
int sig
const struct sigaction __user * act
struct sigaction __user * oact
size_t sigsetsize
-
long
-
replace the signal mask with the unewset value until a signal is received
sys_rt_sigsuspend
(sigset_t __user *
unewset, size_t
sigsetsize
)
Parameters
- new signal mask value
- size of sigset_t type
sigset_t __user * unewset
size_t sigsetsize
-
create a kthread on the current node
kthread_create
(
threadfn,
data,
namefmt,
arg...
)
Parameters
- the function to run in the thread
- data pointer for threadfn()
- printf-style format string for the thread name
- variable arguments
threadfn
data
namefmt
arg...
Description
This macro will create a kthread on the current node, leaving it in the stopped state. This is just a helper for kthread_create_on_node()
; see the documentation there for more details.
-
create and wake a thread.
kthread_run
(
threadfn,
data,
namefmt,
...
)
Parameters
- the function to run until signal_pending(current).
- data ptr for threadfn.
- printf-style name for the thread.
- variable arguments
threadfn
data
namefmt
...
Description
Convenient wrapper for kthread_create()
followed by wake_up_process()
. Returns the kthread or ERR_PTR(-ENOMEM).
-
bool
-
should this kthread return now?
kthread_should_stop
(void
)
Parameters
- no arguments
void
Description
When someone calls kthread_stop()
on your kthread, it will be woken and this will return true. You should then return, and your return value will be passed through to kthread_stop()
.
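The canonical loop, sketched here with hypothetical names, polls kthread_should_stop() and returns when it turns true:

static int my_thread_fn(void *data)
{
        while (!kthread_should_stop()) {
                /* do one unit of work, then sleep for a second */
                schedule_timeout_interruptible(HZ);
        }
        return 0;       /* passed through to kthread_stop() */
}

struct task_struct *t;

t = kthread_run(my_thread_fn, NULL, "my_thread");
if (IS_ERR(t))
        return PTR_ERR(t);
/* later */
kthread_stop(t);        /* wakes the thread and waits for it to exit */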
-
bool
-
should this kthread park now?
kthread_should_park
(void
)
Parameters
- no arguments
void
Description
When someone calls kthread_park()
on your kthread, it will be woken and this will return true. You should then do the necessary cleanup and call kthread_parkme().
Similar to kthread_should_stop()
, but this keeps the thread alive and in a park position. kthread_unpark()
“restarts” the thread and calls the thread function again.
-
bool
-
should this freezable kthread return now?
kthread_freezable_should_stop
(bool *
was_frozen
)
Parameters
-
optional out parameter, indicates whether
current
was frozen
bool * was_frozen
Description
kthread_should_stop()
for freezable kthreads, which will enter refrigerator if necessary. This function is safe from kthread_stop()
/ freezer deadlock and freezable kthreads should use this function instead of calling try_to_freeze()
directly.
-
struct task_struct *
-
create a kthread.
kthread_create_on_node
(int (*threadfn)(void *data), void * data, int node, const char namefmt[], ...)
Parameters
- the function to run until signal_pending(current).
- data ptr for threadfn.
- task and thread structures for the thread are allocated on this node
- printf-style name for the thread.
- variable arguments
int (*)(void *data) threadfn
void * data
int node
const char namefmt[]
...
Description
This helper function creates and names a kernel thread. The thread will be stopped: use wake_up_process()
to start it. See also kthread_run()
. The new thread has SCHED_NORMAL policy and is affine to all CPUs.
If thread is going to be bound on a particular cpu, give its node in node, to get NUMA affinity for kthread stack, or else give NUMA_NO_NODE. When woken, the thread will run threadfn() with data as its argument. threadfn() can either call do_exit()
directly if it is a standalone thread for which no one will call kthread_stop()
, or return when kthread_should_stop()
is true (which means kthread_stop()
has been called). The return value should be zero or a negative error number; it will be passed to kthread_stop()
.
Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR).
-
void
-
bind a just-created kthread to a cpu.
kthread_bind
(struct task_struct *
p, unsigned int
cpu
)
Parameters
-
thread created by
kthread_create()
. - cpu (might not be online, must be possible) for k to run on.
struct task_struct * p
unsigned int cpu
Description
This function is equivalent to set_cpus_allowed()
, except that cpu doesn’t need to be online, and the thread must be stopped (i.e., just returned from kthread_create()
).
-
void
-
unpark a thread created by
kthread_create()
.
kthread_unpark
(struct task_struct *
k
)
Parameters
-
thread created by
kthread_create()
.
struct task_struct * k
Description
Sets kthread_should_park()
for k to return false, wakes it, and waits for it to return. If the thread is marked percpu then its bound to the cpu again.
-
int
-
park a thread created by
kthread_create()
.
kthread_park
(struct task_struct *
k
)
Parameters
-
thread created by
kthread_create()
.
struct task_struct * k
Description
Sets kthread_should_park()
for k to return true, wakes it, and waits for it to return. This can also be called after kthread_create()
instead of calling wake_up_process()
: the thread will park without calling threadfn()
.
Returns 0 if the thread is parked, -ENOSYS if the thread exited. If called by the kthread itself just the park bit is set.
-
int
-
stop a thread created by
kthread_create()
.
kthread_stop
(struct task_struct *
k
)
Parameters
-
thread created by
kthread_create()
.
struct task_struct * k
Description
Sets kthread_should_stop()
for k to return true, wakes it, and waits for it to exit. This can also be called after kthread_create()
instead of calling wake_up_process()
: the thread will exit without calling threadfn()
.
If threadfn()
may call do_exit()
itself, the caller must ensure task_struct can’t go away.
Returns the result of threadfn()
, or -EINTR
if wake_up_process()
was never called.
-
int
-
kthread function to process kthread_worker
kthread_worker_fn
(void *
worker_ptr
)
Parameters
- pointer to initialized kthread_worker
void * worker_ptr
Description
This function implements the main cycle of kthread worker. It processes work_list until it is stopped with kthread_stop()
. It sleeps when the queue is empty.
The works must not hold any locks or leave preemption or interrupts disabled when they finish. A safe point for freezing is defined after one work finishes and before a new one is started.
Also the works must not be handled by more than one worker at the same time, see also kthread_queue_work()
.
-
struct kthread_worker *
-
create a kthread worker
kthread_create_worker
(unsigned int
flags, const char
namefmt[], ...
)
Parameters
- flags modifying the default behavior of the worker
- printf-style name for the kthread worker (task).
- variable arguments
unsigned int flags
const char namefmt[]
...
Description
Returns a pointer to the allocated worker on success, ERR_PTR(-ENOMEM) when the needed structures could not get allocated, and ERR_PTR(-EINTR) when the worker was SIGKILLed.
-
struct kthread_worker *
-
create a kthread worker and bind it to a given CPU and the associated NUMA node.
kthread_create_worker_on_cpu
(int
cpu, unsigned int
flags, const char
namefmt[], ...
)
Parameters
- CPU number
- flags modifying the default behavior of the worker
- printf-style name for the kthread worker (task).
- variable arguments
int cpu
unsigned int flags
const char namefmt[]
...
Description
Use a valid CPU number if you want to bind the kthread worker to the given CPU and the associated NUMA node.
A good practice is to add the cpu number also into the worker name. For example, use kthread_create_worker_on_cpu(cpu, “helper/%d”, cpu).
Returns a pointer to the allocated worker on success, ERR_PTR(-ENOMEM) when the needed structures could not get allocated, and ERR_PTR(-EINTR) when the worker was SIGKILLed.
-
bool
-
queue a kthread_work
kthread_queue_work
(struct kthread_worker *
worker, struct kthread_work *
work
)
Parameters
- target kthread_worker
- kthread_work to queue
struct kthread_worker * worker
struct kthread_work * work
Description
Queue work to work processor task for async execution. task must have been created with kthread_create_worker()
. Returns true
if work was successfully queued, false
if it was already pending.
Reinitialize the work if it needs to be used by another worker. For example, when the worker was stopped and started again.
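A small end-to-end sketch (names hypothetical; my_kwork_fn takes a struct kthread_work *):

struct kthread_worker *worker;
struct kthread_work work;

worker = kthread_create_worker(0, "my_worker");
if (IS_ERR(worker))
        return PTR_ERR(worker);

kthread_init_work(&work, my_kwork_fn);
kthread_queue_work(worker, &work);
kthread_flush_work(&work);      /* wait for my_kwork_fn to finish */
kthread_destroy_worker(worker);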
-
void
-
callback that queues the associated kthread delayed work when the timer expires.
kthread_delayed_work_timer_fn
(unsigned long
__data
)
Parameters
- pointer to the data associated with the timer
unsigned long __data
Description
The format of the function is defined by struct timer_list. It should have been called from irqsafe timer with irq already off.
-
bool
-
queue the associated kthread work after a delay.
kthread_queue_delayed_work
(struct kthread_worker *
worker, struct kthread_delayed_work *
dwork, unsigned long
delay
)
Parameters
- target kthread_worker
- kthread_delayed_work to queue
- number of jiffies to wait before queuing
struct kthread_worker * worker
struct kthread_delayed_work * dwork
unsigned long delay
Description
If the work has not been pending it starts a timer that will queue the work after the given delay. If delay is zero, it queues the work immediately.
Return
false
if the work has already been pending. It means that either the timer was running or the work was queued. It returns true
otherwise.
-
void
-
flush a kthread_work
kthread_flush_work
(struct kthread_work *
work
)
Parameters
- work to flush
struct kthread_work * work
Description
If work is queued or executing, wait for it to finish execution.
-
bool
-
modify delay of or queue a kthread delayed work
kthread_mod_delayed_work
(struct kthread_worker *
worker, struct kthread_delayed_work *
dwork, unsigned long
delay
)
Parameters
- kthread worker to use
- kthread delayed work to queue
- number of jiffies to wait before queuing
struct kthread_worker * worker
struct kthread_delayed_work * dwork
unsigned long delay
Description
If dwork is idle, equivalent to kthread_queue_delayed_work()
. Otherwise, modify dwork‘s timer so that it expires after delay. If delay is zero, work is guaranteed to be queued immediately.
Return
true
if dwork was pending and its timer was modified, false
otherwise.
A special case is when the work is being canceled in parallel. It might be caused either by the real kthread_cancel_delayed_work_sync()
or yet another kthread_mod_delayed_work()
call. We let the other command win and return false
here. The caller is supposed to synchronize these operations a reasonable way.
This function is safe to call from any context including IRQ handler. See __kthread_cancel_work()
and kthread_delayed_work_timer_fn()
for details.
-
bool
-
cancel a kthread work and wait for it to finish
kthread_cancel_work_sync
(struct kthread_work *
work
)
Parameters
- the kthread work to cancel
struct kthread_work * work
Description
Cancel work and wait for its execution to finish. This function can be used even if the work re-queues itself. On return from this function, work is guaranteed to be not pending or executing on any CPU.
kthread_cancel_work_sync(delayed_work->work
) must not be used for delayed_work’s. Use kthread_cancel_delayed_work_sync()
instead.
The caller must ensure that the worker on which work was last queued can’t be destroyed before this function returns.
Return
true
if work was pending, false
otherwise.
-
bool
-
cancel a kthread delayed work and wait for it to finish.
kthread_cancel_delayed_work_sync
(struct kthread_delayed_work *
dwork
)
Parameters
- the kthread delayed work to cancel
struct kthread_delayed_work * dwork
Description
This is kthread_cancel_work_sync()
for delayed works.
Return
true
if dwork was pending, false
otherwise.
-
void
-
flush all current works on a kthread_worker
kthread_flush_worker
(struct kthread_worker *
worker
)
Parameters
- worker to flush
struct kthread_worker * worker
Description
Wait until all currently executing or pending works on worker are finished.
-
void
-
destroy a kthread worker
kthread_destroy_worker
(struct kthread_worker *
worker
)
Parameters
- worker to be destroyed
struct kthread_worker * worker
Description
Flush and destroy worker. The simple flush is enough because the kthread worker API is used only in trivial scenarios. There are no multi-step state machines needed.
Kernel objects manipulation
-
char *
-
generate and return the path associated with a given kobj and kset pair.
kobject_get_path
(struct kobject *
kobj, gfp_t
gfp_mask
)
Parameters
- kobject in question, with which to build the path
- the allocation type used to allocate the path
struct kobject * kobj
gfp_t gfp_mask
Description
The result must be freed by the caller with kfree()
.
-
int
-
Set the name of a kobject
kobject_set_name
(struct kobject *
kobj, const char *
fmt, ...
)
Parameters
- struct kobject to set the name of
- format string used to build the name
- variable arguments
struct kobject * kobj
const char * fmt
...
Description
This sets the name of the kobject. If you have already added the kobject to the system, you must call kobject_rename()
in order to change the name of the kobject.
-
void
-
initialize a kobject structure
kobject_init
(struct kobject *
kobj, struct kobj_type *
ktype
)
Parameters
- pointer to the kobject to initialize
- pointer to the ktype for this kobject.
struct kobject * kobj
struct kobj_type * ktype
Description
This function will properly initialize a kobject such that it can then be passed to the kobject_add()
call.
After this function is called, the kobject MUST be cleaned up by a call to kobject_put()
, not by a call to kfree directly to ensure that all of the memory is cleaned up properly.
-
int
-
the main kobject add function
kobject_add
(struct kobject *
kobj, struct kobject *
parent, const char *
fmt, ...
)
Parameters
- the kobject to add
- pointer to the parent of the kobject.
- format to name the kobject with.
- variable arguments
struct kobject * kobj
struct kobject * parent
const char * fmt
...
Description
The kobject name is set and added to the kobject hierarchy in this function.
If parent is set, then the parent of the kobj will be set to it. If parent is NULL, then the parent of the kobj will be set to the kobject associated with the kset assigned to this kobject. If no kset is assigned to the kobject, then the kobject will be located in the root of the sysfs tree.
If this function returns an error, kobject_put()
must be called to properly clean up the memory associated with the object. Under no instance should the kobject that is passed to this function be directly freed with a call to kfree()
, as that can leak memory.
Note, no “add” uevent will be created with this call, the caller should set up all of the necessary sysfs files for the object and then call kobject_uevent()
with the UEVENT_ADD parameter to ensure that userspace is properly notified of this kobject’s creation.
-
int
-
initialize a kobject structure and add it to the kobject hierarchy
kobject_init_and_add
(struct kobject *
kobj, struct kobj_type *
ktype, struct kobject *
parent, const char *
fmt, ...
)
Parameters
- pointer to the kobject to initialize
- pointer to the ktype for this kobject.
- pointer to the parent of this kobject.
- the name of the kobject.
- variable arguments
struct kobject * kobj
struct kobj_type * ktype
struct kobject * parent
const char * fmt
...
Description
This function combines the call to kobject_init()
and kobject_add()
. The same type of error handling after a call to kobject_add()
and kobject lifetime rules are the same here.
-
int
-
change the name of an object
kobject_rename
(struct kobject *
kobj, const char *
new_name
)
Parameters
- object in question.
- object’s new name
struct kobject * kobj
const char * new_name
Description
It is the responsibility of the caller to provide mutual exclusion between two different calls of kobject_rename on the same kobject and to ensure that new_name is valid and won’t conflict with other kobjects.
-
int
-
move object to another parent
kobject_move
(struct kobject *
kobj, struct kobject *
new_parent
)
Parameters
- object in question.
- object’s new parent (can be NULL)
struct kobject * kobj
struct kobject * new_parent
-
void
-
unlink kobject from hierarchy.
kobject_del
(struct kobject *
kobj
)
Parameters
- object.
struct kobject * kobj
-
struct kobject *
-
increment refcount for object.
kobject_get
(struct kobject *
kobj
)
Parameters
- object.
struct kobject * kobj
-
void
-
decrement refcount for object.
kobject_put
(struct kobject *
kobj
)
Parameters
- object.
struct kobject * kobj
Description
Decrement the refcount, and if 0, call kobject_cleanup()
.
-
struct kobject *
-
create a struct kobject dynamically and register it with sysfs
kobject_create_and_add
(const char *
name, struct kobject *
parent
)
Parameters
- the name for the kobject
- the parent kobject of this kobject, if any.
const char * name
struct kobject * parent
Description
This function creates a kobject structure dynamically and registers it with sysfs. When you are finished with this structure, call kobject_put()
and the structure will be dynamically freed when it is no longer being used.
If the kobject was not able to be created, NULL will be returned.
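For example (a sketch; my_kobj is hypothetical, kernel_kobj places the object under /sys/kernel):

static struct kobject *my_kobj;

my_kobj = kobject_create_and_add("my_obj", kernel_kobj);
if (!my_kobj)
        return -ENOMEM;

/* create sysfs files, use the object, then drop the reference */
kobject_put(my_kobj);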
-
int
-
initialize and add a kset.
kset_register
(struct kset *
k
)
Parameters
- kset.
struct kset * k
-
void
-
remove a kset.
kset_unregister
(struct kset *
k
)
Parameters
- kset.
struct kset * k
-
struct kobject *
-
search for object in kset.
kset_find_obj
(struct kset *
kset, const char *
name
)
Parameters
- kset we’re looking in.
- object’s name.
struct kset * kset
const char * name
Description
Lock kset via kset->subsys, and iterate over kset->list, looking for a matching kobject. If a matching object is found, take a reference and return the object.
-
struct kset *
-
create a struct kset dynamically and add it to sysfs
kset_create_and_add
(const char *
name, const struct kset_uevent_ops *
uevent_ops, struct kobject *
parent_kobj
)
Parameters
- the name for the kset
- a struct kset_uevent_ops for the kset
- the parent kobject of this kset, if any.
const char * name
const struct kset_uevent_ops * uevent_ops
struct kobject * parent_kobj
Description
This function creates a kset structure dynamically and registers it with sysfs. When you are finished with this structure, call kset_unregister()
and the structure will be dynamically freed when it is no longer being used.
If the kset was not able to be created, NULL will be returned.
Kernel utility functions
-
return bits 32-63 of a number
upper_32_bits
(
n
)
Parameters
- the number we’re accessing
n
Description
A basic shift-right of a 64- or 32-bit quantity. Use this to suppress the “right shift count >= width of type” warning when that quantity is 32-bits.
-
return bits 0-31 of a number
lower_32_bits
(
n
)
Parameters
- the number we’re accessing
n
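These helpers are handy when splitting a 64-bit quantity, e.g. a DMA address, across a pair of 32-bit device registers; a sketch with hypothetical register offsets:

void __iomem *base = my_dev->regs;      /* hypothetical MMIO base */
u64 dma_addr = 0x0000001234567890ULL;

writel(lower_32_bits(dma_addr), base + MY_REG_ADDR_LO);
writel(upper_32_bits(dma_addr), base + MY_REG_ADDR_HI);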
-
annotation for functions that can sleep
might_sleep
(
)
Parameters
Description
this macro will print a stack trace if it is executed in an atomic context (spinlock, irq-handler, ...).
This is a useful debugging help to be able to catch problems early and not be bitten later when the calling function happens to sleep when it is not supposed to.
-
return absolute value of an argument
abs
(
x
)
Parameters
- the value. If it is unsigned type, it is converted to signed type first. char is treated as if it was signed (regardless of whether it really is) but the macro’s return type is preserved as char.
x
Return
an absolute value of x.
-
u32
-
“scale” a value into range [0, ep_ro)
reciprocal_scale
(u32
val, u32
ep_ro
)
Parameters
- value
- right open interval endpoint
u32 val
u32 ep_ro
Description
Perform a “reciprocal multiplication” in order to “scale” a value into range [0, ep_ro), where the upper interval endpoint is right-open. This is useful, e.g., for accessing an index of an array containing ep_ro elements. Think of it as a sort of modulus, only that the result isn’t that of modulo. ;) Note that if the initial input is a small value, the result will be 0.
Return
a result based on val in interval [0, ep_ro).
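For example, mapping a 32-bit hash onto a table index without a modulo (my_table and hash are hypothetical):

u32 idx = reciprocal_scale(hash, ARRAY_SIZE(my_table));
my_table[idx]++;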
-
int
-
convert a string to an unsigned long
kstrtoul
(const char *
s, unsigned int
base, unsigned long *
res
)
Parameters
- The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign, but not a minus sign.
- The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
- Where to write the result of the conversion on success.
const char * s
unsigned int base
unsigned long * res
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for the obsolete simple_strtoull. Return code must be checked.
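Typical use, e.g. in a sysfs store handler (a sketch; buf is the user-supplied string):

unsigned long val;
int ret;

ret = kstrtoul(buf, 0, &val);   /* base 0: auto-detect 0x/0 prefixes */
if (ret)
        return ret;             /* -ERANGE or -EINVAL */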
-
int
-
convert a string to a long
kstrtol
(const char *
s, unsigned int
base, long *
res
)
Parameters
- The start of the string. The string must be null-terminated, and may also include a single newline before its terminating null. The first character may also be a plus sign or a minus sign.
- The number base to use. The maximum supported base is 16. If base is given as 0, then the base of the string is automatically detected with the conventional semantics - If it begins with 0x the number will be parsed as a hexadecimal (case insensitive), if it otherwise begins with 0, it will be parsed as an octal number. Otherwise it will be parsed as a decimal.
- Where to write the result of the conversion on success.
const char * s
unsigned int base
long * res
Description
Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error. Used as a replacement for the obsolete simple_strtoull. Return code must be checked.
-
printf formatting in the ftrace buffer
trace_printk
(
fmt,
...
)
Parameters
- the printf format for printing
- variable arguments
fmt
...
Note
-
__trace_printk is an internal function for trace_printk and
- the ip is passed in via the trace_printk macro.
This function allows a kernel developer to debug fast path sections that printk is not appropriate for. By scattering in various printk like tracing in the code, a developer can quickly see where problems are occurring.
This is intended as a debugging tool for the developer only. Please refrain from leaving trace_printks scattered around in your code. (Extra memory is used for special buffers that are allocated when trace_printk()
is used)
A little optimization trick is done here. If there’s only one argument, there’s no need to scan the string for printf formats. The trace_puts()
will suffice. But how can we take advantage of using trace_puts()
when trace_printk()
has only one argument? By stringifying the args and checking the size we can tell whether or not there are args. __stringify((__VA_ARGS__)) will turn into “()0” with a size of 3 when there are no args, anything else will be bigger. All we need to do is define a string to this, and then take its size and compare to 3. If it’s bigger, use do_trace_printk()
otherwise, optimize it to trace_puts()
. Then just let gcc optimize the rest.
-
write a string into the ftrace buffer
trace_puts
(
str
)
Parameters
- the string to record
str
Note
-
__trace_bputs is an internal function for trace_puts and
- the ip is passed in via the trace_puts macro.
This is similar to trace_printk()
but is made for those really fast paths where a developer wants the least amount of “Heisenbug” effects, and the processing of the print format is still too much.
This function allows a kernel developer to debug fast path sections that printk is not appropriate for. By scattering in various printk like tracing in the code, a developer can quickly see where problems are occurring.
This is intended as a debugging tool for the developer only. Please refrain from leaving trace_puts scattered around in your code. (Extra memory is used for special buffers that are allocated when trace_puts()
is used)
Return
-
0 if nothing was written, positive # if string was.
- (1 when __trace_bputs is used, strlen(str) when __trace_puts is used)
-
return the minimum that is _not_ zero, unless both are zero
min_not_zero
(
x,
y
)
Parameters
- value1
- value2
x
y
-
return a value clamped to a given range with strict typechecking
clamp
(
val,
lo,
hi
)
Parameters
- current value
- lowest allowable value
- highest allowable value
val
lo
hi
Description
This macro does strict typechecking of lo/hi to make sure they are of the same type as val. See the unnecessary pointer comparisons.
-
return a value clamped to a given range using a given type
clamp_t
(
type,
val,
lo,
hi
)
Parameters
- the type of variable to use
- current value
- minimum allowable value
- maximum allowable value
type
val
lo
hi
Description
This macro does no typechecking and uses temporary variables of type ‘type’ to make all the comparisons.
-
return a value clamped to a given range using val’s type
clamp_val
(
val,
lo,
hi
)
Parameters
- current value
- minimum allowable value
- maximum allowable value
val
lo
hi
Description
This macro does no typechecking and uses temporary variables of whatever type the input argument ‘val’ is. This is useful when val is an unsigned type and min and max are literals that will otherwise be assigned a signed integer type.
-
cast a member of a structure out to the containing structure
container_of
(
ptr,
type,
member
)
Parameters
- the pointer to the member.
- the type of the container struct this is embedded in.
- the name of the member within the struct.
ptr
type
member
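A hedged example of the usual pattern (struct name hypothetical):

#include <linux/kernel.h>
#include <linux/list.h>

struct my_device {
        int id;
        struct list_head node;   /* embedded member */
};

/* Recover the enclosing my_device from a pointer to its 'node' member. */
static inline struct my_device *to_my_device(struct list_head *p)
{
        return container_of(p, struct my_device, node);
}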
-
__visible int
-
print a kernel message
printk
(const char *
fmt, ...
)
Parameters
- format string
- variable arguments
const char * fmt
...
Description
This is printk()
. It can be called from any context. We want it to work.
We try to grab the console_lock. If we succeed, it’s easy - we log the output and call the console drivers. If we fail to get the semaphore, we place the output into the log buffer and return. The current holder of the console_sem will notice the new output in console_unlock()
and will send it to the consoles before releasing the lock.
One effect of this deferred printing is that code which calls printk()
and then changes console_loglevel may break. This is because console_loglevel is inspected when the actual printing occurs.
See also: printf(3)
See the vsnprintf()
documentation for format string extensions over C99.
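A minimal usage sketch (driver name and message illustrative); the log level is prepended to the format string via the KERN_* macros:

static void announce(int dev_id)
{
        printk(KERN_INFO "my_driver: probed device %d\n", dev_id);
        printk(KERN_WARNING "my_driver: falling back to polling\n");
}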
-
void
-
lock the console system for exclusive use.
console_lock
(void
)
Parameters
- no arguments
void
Description
Acquires a lock which guarantees that the caller has exclusive access to the console system and the console_drivers list.
Can sleep, returns nothing.
-
int
-
try to lock the console system for exclusive use.
console_trylock
(void
)
Parameters
- no arguments
void
Description
Try to acquire a lock which guarantees that the caller has exclusive access to the console system and the console_drivers list.
Returns 1 on success, and 0 on failure to acquire the lock.
-
void
-
unlock the console system
console_unlock
(void
)
Parameters
- no arguments
void
Description
Releases the console_lock which the caller holds on the console system and the console driver list.
While the console_lock was held, console output may have been buffered by printk()
. If this is the case, console_unlock()
emits the output prior to releasing the lock.
If there is output waiting, we wake /dev/kmsg and syslog()
users.
console_unlock()
may be called from any context.
-
void __sched
-
yield the CPU if required
console_conditional_schedule
(void
)
Parameters
- no arguments
void
Description
If the console code is currently allowed to sleep, and if this CPU should yield the CPU to another task, do so here.
Must be called within console_lock()
.
-
bool
-
caller-controlled printk ratelimiting
printk_timed_ratelimit
(unsigned long *
caller_jiffies, unsigned int
interval_msecs
)
Parameters
- pointer to caller’s state
- minimum interval between prints
unsigned long * caller_jiffies
unsigned int interval_msecs
Description
printk_timed_ratelimit()
returns true if more than interval_msecs milliseconds have elapsed since the last time printk_timed_ratelimit()
returned true.
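A hedged usage sketch; the static jiffies variable is the caller-owned state that the function updates:

static unsigned long last_warned;   /* persists across calls */

/* Emit the warning at most once every 5000 ms. */
if (printk_timed_ratelimit(&last_warned, 5000))
        printk(KERN_WARNING "my_driver: FIFO overrun\n");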
-
int
-
register a kernel log dumper.
kmsg_dump_register
(struct kmsg_dumper *
dumper
)
Parameters
- pointer to the kmsg_dumper structure
struct kmsg_dumper * dumper
Description
Adds a kernel log dumper to the system. The dump callback in the structure, which must be set, will be called when the kernel oopses or panics. Returns zero on success and -EINVAL
or -EBUSY
otherwise.
-
int
-
unregister a kmsg dumper.
kmsg_dump_unregister
(struct kmsg_dumper *
dumper
)
Parameters
- pointer to the kmsg_dumper structure
struct kmsg_dumper * dumper
Description
Removes a dump device from the system. Returns zero on success and -EINVAL
otherwise.
-
bool
-
retrieve one kmsg log line
kmsg_dump_get_line
(struct kmsg_dumper *
dumper, bool
syslog, char *
line, size_t
size, size_t *
len
)
Parameters
- registered kmsg dumper
- include the “<4>” prefixes
- buffer to copy the line to
- maximum size of the buffer
- length of line placed into buffer
struct kmsg_dumper * dumper
bool syslog
char * line
size_t size
size_t * len
Description
Start at the beginning of the kmsg buffer, with the oldest kmsg record, and copy one record into the provided buffer.
Consecutive calls will return the next available record moving towards the end of the buffer with the youngest messages.
A return value of FALSE indicates that there are no more records to read.
-
bool
-
copy kmsg log lines
kmsg_dump_get_buffer
(struct kmsg_dumper *
dumper, bool
syslog, char *
buf, size_t
size, size_t *
len
)
Parameters
- registered kmsg dumper
- include the “<4>” prefixes
- buffer to copy the line to
- maximum size of the buffer
- length of line placed into buffer
struct kmsg_dumper * dumper
bool syslog
char * buf
size_t size
size_t * len
Description
Start at the end of the kmsg buffer and fill the provided buffer with as many of the youngest kmsg records as fit into it. If the buffer is large enough, all available kmsg records will be copied with a single call.
Consecutive calls will fill the buffer with the next block of available older records, not including the earlier retrieved ones.
A return value of FALSE indicates that there are no more records to read.
-
void
-
reset the iterator
kmsg_dump_rewind
(struct kmsg_dumper *
dumper
)
Parameters
- registered kmsg dumper
struct kmsg_dumper * dumper
Description
Reset the dumper’s iterator so that kmsg_dump_get_line()
and kmsg_dump_get_buffer()
can be called again and used multiple times within the same dumper’s dump() callback.
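A hedged sketch tying register, rewind, and get_line together (callback body illustrative):

#include <linux/kmsg_dump.h>

static void my_dump(struct kmsg_dumper *dumper,
                    enum kmsg_dump_reason reason)
{
        static char line[256];
        size_t len;

        kmsg_dump_rewind(dumper);
        while (kmsg_dump_get_line(dumper, true, line, sizeof(line), &len)) {
                /* write 'line' (len bytes) to persistent storage here */
        }
}

static struct kmsg_dumper my_dumper = {
        .dump = my_dump,
};

/* Call kmsg_dump_register(&my_dumper) at init and
 * kmsg_dump_unregister(&my_dumper) at exit. */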
-
void
-
halt the system
panic
(const char *
fmt, ...
)
Parameters
- The text string to print
- variable arguments
const char * fmt
...
Description
Display a message, then perform cleanups.
This function never returns.
-
void
-
add_taint
(unsigned
flag, enum lockdep_ok
lockdep_ok
)
Parameters
- one of the TAINT_* constants.
- whether lock debugging is still OK.
unsigned flag
enum lockdep_ok lockdep_ok
Description
If something bad has gone wrong, you’ll want lockdep_ok = false, but for some noteworthy-but-not-corrupting cases, it can be set to true.
-
int
-
initialize a sleep-RCU structure
init_srcu_struct
(struct srcu_struct *
sp
)
Parameters
- structure to initialize.
struct srcu_struct * sp
Description
Must invoke this on a given srcu_struct before passing that srcu_struct to any other function. Each srcu_struct represents a separate domain of SRCU protection.
-
void
-
deconstruct a sleep-RCU structure
cleanup_srcu_struct
(struct srcu_struct *
sp
)
Parameters
- structure to clean up.
struct srcu_struct * sp
Description
Must invoke this after you are finished using a given srcu_struct that was initialized via init_srcu_struct()
, else you leak memory.
-
void
-
wait for prior SRCU read-side critical-section completion
synchronize_srcu
(struct srcu_struct *
sp
)
Parameters
- srcu_struct with which to synchronize.
struct srcu_struct * sp
Description
Wait for the counts of both indexes to drain to zero. To avoid the possible starvation of synchronize_srcu()
, it first waits for the count of index ((->completed & 1) ^ 1) to drain to zero, and then flips ->completed and waits for the count of the other index to drain.
Can block; must be called from process context.
Note that it is illegal to call synchronize_srcu()
from the corresponding SRCU read-side critical section; doing so will result in deadlock. However, it is perfectly legal to call synchronize_srcu()
on one srcu_struct from some other srcu_struct’s read-side critical section, as long as the resulting graph of srcu_structs is acyclic.
There are memory-ordering constraints implied by synchronize_srcu()
. On systems with more than one CPU, when synchronize_srcu()
returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last corresponding SRCU-sched read-side critical section whose beginning preceded the call to synchronize_srcu()
. In addition, each CPU having an SRCU read-side critical section that extends beyond the return from synchronize_srcu()
is guaranteed to have executed a full memory barrier after the beginning of synchronize_srcu()
and before the beginning of that SRCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invoked synchronize_srcu()
, which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_srcu()
. This guarantee applies even if CPU A and CPU B are the same CPU, but again only if the system has more than one CPU.
Of course, these memory-ordering guarantees apply only when synchronize_srcu()
, srcu_read_lock()
, and srcu_read_unlock()
are passed the same srcu_struct structure.
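A hedged reader/updater sketch under these rules (protected data elided):

#include <linux/srcu.h>

static struct srcu_struct my_srcu;  /* init_srcu_struct(&my_srcu) at init */

static void reader(void)
{
        int idx = srcu_read_lock(&my_srcu);
        /* ... dereference SRCU-protected data; sleeping is allowed ... */
        srcu_read_unlock(&my_srcu, idx);
}

static void updater(void)
{
        /* ... unpublish the old data ... */
        synchronize_srcu(&my_srcu);   /* all pre-existing readers are done */
        /* ... now safe to free the old data ... */
}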
-
void
-
Brute-force SRCU grace period
synchronize_srcu_expedited
(struct srcu_struct *
sp
)
Parameters
- srcu_struct with which to synchronize.
struct srcu_struct * sp
Description
Wait for an SRCU grace period to elapse, but be more aggressive about spinning rather than blocking when waiting.
Note that synchronize_srcu_expedited()
has the same deadlock and memory-ordering properties as does synchronize_srcu()
.
-
void
-
Wait until all in-flight
call_srcu()
callbacks complete.
srcu_barrier
(struct srcu_struct *
sp
)
Parameters
- srcu_struct on which to wait for in-flight callbacks.
struct srcu_struct * sp
-
unsigned long
-
return batches completed.
srcu_batches_completed
(struct srcu_struct *
sp
)
Parameters
- srcu_struct on which to report batch completion.
struct srcu_struct * sp
Description
Report the number of batches, correlated with, but not necessarily precisely the same as, the number of grace periods that have elapsed.
-
void
-
inform RCU that current CPU is entering idle
rcu_idle_enter
(void
)
Parameters
- no arguments
void
Description
Enter idle mode, in other words, -leave- the mode in which RCU read-side critical sections can occur. (Though RCU read-side critical sections can occur in irq handlers in idle, a possibility handled by irq_enter()
and irq_exit()
.)
We crowbar the ->dynticks_nesting field to zero to allow for the possibility of usermode upcalls having messed up our count of interrupt nesting level during the prior busy period.
-
void
-
inform RCU that current CPU is leaving idle
rcu_idle_exit
(void
)
Parameters
- no arguments
void
Description
Exit idle mode, in other words, -enter- the mode in which RCU read-side critical sections can occur.
We crowbar the ->dynticks_nesting field to DYNTICK_TASK_NEST to allow for the possibility of usermode upcalls messing up our count of interrupt nesting level during the busy period that is just now starting.
-
bool notrace
-
see if RCU thinks that the current CPU is idle
rcu_is_watching
(void
)
Parameters
- no arguments
void
Description
If the current CPU is in its idle loop and is not in an interrupt or NMI handler, return true.
-
void
-
wait until an rcu-sched grace period has elapsed.
synchronize_sched
(void
)
Parameters
- no arguments
void
Description
Control will return to the caller some time after a full rcu-sched grace period has elapsed, in other words after all currently executing rcu-sched read-side critical sections have completed. These read-side critical sections are delimited by rcu_read_lock_sched()
and rcu_read_unlock_sched()
, and may be nested. Note that preempt_disable()
, local_irq_disable()
, and so on may be used in place of rcu_read_lock_sched()
.
This means that all preempt_disable code sequences, including NMI and non-threaded hardware-interrupt handlers, in progress on entry will have completed before this primitive returns. However, this does not guarantee that softirq handlers will have completed, since in some kernels, these handlers can run in process context, and can block.
Note that this guarantee implies further memory-ordering guarantees. On systems with more than one CPU, when synchronize_sched()
returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU-sched read-side critical section whose beginning preceded the call to synchronize_sched()
. In addition, each CPU having an RCU read-side critical section that extends beyond the return from synchronize_sched()
is guaranteed to have executed a full memory barrier after the beginning of synchronize_sched()
and before the beginning of that RCU read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invoked synchronize_sched()
, which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_sched()
– even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).
This primitive provides the guarantees made by the (now removed) synchronize_kernel()
API. In contrast, synchronize_rcu()
only guarantees that rcu_read_lock()
sections will have completed. In “classic RCU”, these two guarantees happen to be one and the same, but can differ in realtime RCU implementations.
-
void
-
wait until an rcu_bh grace period has elapsed.
synchronize_rcu_bh
(void
)
Parameters
- no arguments
void
Description
Control will return to the caller some time after a full rcu_bh grace period has elapsed, in other words after all currently executing rcu_bh read-side critical sections have completed. RCU read-side critical sections are delimited by rcu_read_lock_bh()
and rcu_read_unlock_bh()
, and may be nested.
See the description of synchronize_sched()
for more detailed information on memory ordering guarantees.
-
unsigned long
-
Snapshot current RCU state
get_state_synchronize_rcu
(void
)
Parameters
- no arguments
void
Description
Returns a cookie that is used by a later call to cond_synchronize_rcu()
to determine whether or not a full grace period has elapsed in the meantime.
-
void
-
Conditionally wait for an RCU grace period
cond_synchronize_rcu
(unsigned long
oldstate
)
Parameters
-
return value from earlier call to
get_state_synchronize_rcu()
unsigned long oldstate
Description
If a full RCU grace period has elapsed since the earlier call to get_state_synchronize_rcu()
, just return. Otherwise, invoke synchronize_rcu()
to wait for a full grace period.
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for one additional grace period should be just fine.
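A hedged sketch of the cookie pattern:

unsigned long cookie = get_state_synchronize_rcu();

/* ... unrelated work that may or may not span a grace period ... */

cond_synchronize_rcu(cookie);   /* blocks only if no full grace period
                                 * has elapsed since the snapshot */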
-
unsigned long
-
Snapshot current RCU-sched state
get_state_synchronize_sched
(void
)
Parameters
- no arguments
void
Description
Returns a cookie that is used by a later call to cond_synchronize_sched()
to determine whether or not a full grace period has elapsed in the meantime.
-
void
-
Conditionally wait for an RCU-sched grace period
cond_synchronize_sched
(unsigned long
oldstate
)
Parameters
-
return value from earlier call to
get_state_synchronize_sched()
unsigned long oldstate
Description
If a full RCU-sched grace period has elapsed since the earlier call to get_state_synchronize_sched()
, just return. Otherwise, invoke synchronize_sched()
to wait for a full grace period.
Yes, this function does not take counter wrap into account. But counter wrap is harmless. If the counter wraps, we have waited for more than 2 billion grace periods (and way more on a 64-bit system!), so waiting for one additional grace period should be just fine.
-
void
-
Wait until all in-flight
call_rcu_bh()
callbacks complete.
rcu_barrier_bh
(void
)
Parameters
- no arguments
void
-
void
-
Wait for in-flight
call_rcu_sched()
callbacks.
rcu_barrier_sched
(void
)
Parameters
- no arguments
void
-
void
-
wait until a grace period has elapsed.
synchronize_rcu
(void
)
Parameters
- no arguments
void
Description
Control will return to the caller some time after a full grace period has elapsed, in other words after all currently executing RCU read-side critical sections have completed. Note, however, that upon return from synchronize_rcu()
, the caller might well be executing concurrently with new RCU read-side critical sections that began while synchronize_rcu()
was waiting. RCU read-side critical sections are delimited by rcu_read_lock()
and rcu_read_unlock()
, and may be nested.
See the description of synchronize_sched()
for more detailed information on memory ordering guarantees.
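A hedged publish/retract sketch (struct and globals hypothetical):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo { int a; };
static struct foo __rcu *gp;        /* RCU-protected pointer */

static int reader(void)
{
        int a;

        rcu_read_lock();
        a = rcu_dereference(gp)->a; /* sketch assumes gp is non-NULL */
        rcu_read_unlock();
        return a;
}

static void updater(int a)
{
        struct foo *newp = kmalloc(sizeof(*newp), GFP_KERNEL);
        struct foo *oldp = rcu_dereference_protected(gp, 1);

        if (!newp)
                return;
        newp->a = a;
        rcu_assign_pointer(gp, newp);
        synchronize_rcu();          /* wait out pre-existing readers */
        kfree(oldp);
}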
-
void
-
Wait until all in-flight
call_rcu()
callbacks complete.
rcu_barrier
(void
)
Parameters
- no arguments
void
Description
Note that this primitive does not necessarily wait for an RCU grace period to complete. For example, if there are no RCU callbacks queued anywhere in the system, then rcu_barrier()
is within its rights to return immediately, without waiting for anything, much less an RCU grace period.
-
int
-
might we be in RCU-sched read-side critical section?
rcu_read_lock_sched_held
(void
)
Parameters
- no arguments
void
Description
If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU-sched read-side critical section. In the absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side critical section unless it can prove otherwise. Note that disabling of preemption (including disabling irqs) counts as an RCU-sched read-side critical section. This is useful for debug checks in functions that require that they be called within an RCU-sched read-side critical section.
Check debug_lockdep_rcu_enabled()
to prevent false positives during boot and while lockdep is disabled.
Note that if the CPU is in the idle loop from an RCU point of view (ie: that we are in the section between rcu_idle_enter()
and rcu_idle_exit()
) then rcu_read_lock_held()
returns false even if the CPU did an rcu_read_lock()
. The reason for this is that RCU ignores CPUs that are in such a section, considering these as in extended quiescent state, so such a CPU is effectively never in an RCU read-side critical section regardless of what RCU primitives it invokes. This state of affairs is required — we need to keep an RCU-free window in idle where the CPU may possibly enter into low power mode. This way we can notice an extended quiescent state to other CPUs that started a grace period. Otherwise we would delay any grace period as long as we run in the idle task.
Similarly, we avoid claiming an SRCU read lock held if the current CPU is offline.
-
void
-
Expedite future RCU grace periods
rcu_expedite_gp
(void
)
Parameters
- no arguments
void
Description
After a call to this function, future calls to synchronize_rcu()
and friends act as if the corresponding synchronize_rcu_expedited()
function had instead been called.
-
void
-
Cancel prior
rcu_expedite_gp()
invocation
rcu_unexpedite_gp
(void
)
Parameters
- no arguments
void
Description
Undo a prior call to rcu_expedite_gp()
. If all prior calls to rcu_expedite_gp()
are undone by a subsequent call to rcu_unexpedite_gp()
, and if the rcu_expedited sysfs/boot parameter is not set, then all subsequent calls to synchronize_rcu()
and friends will return to their normal non-expedited behavior.
-
int
-
might we be in RCU read-side critical section?
rcu_read_lock_held
(void
)
Parameters
- no arguments
void
Description
If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU read-side critical section. In the absence of CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU read-side critical section unless it can prove otherwise. This is useful for debug checks in functions that require that they be called within an RCU read-side critical section.
Checks debug_lockdep_rcu_enabled()
to prevent false positives during boot and while lockdep is disabled.
Note that rcu_read_lock()
and the matching rcu_read_unlock()
must occur in the same context; for example, it is illegal to invoke rcu_read_unlock()
in process context if the matching rcu_read_lock()
was invoked from within an irq handler.
Note that rcu_read_lock()
is disallowed if the CPU is either idle or offline from an RCU perspective, so check for those as well.
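One hedged use is an assertion in a helper that must run under the reader lock (helper name hypothetical, gp as in the earlier sketch):

static struct foo *get_foo_locked(void)
{
        /* flags callers outside a read-side section in debug builds */
        WARN_ON_ONCE(!rcu_read_lock_held());
        return rcu_dereference(gp);
}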
-
int
-
might we be in RCU-bh read-side critical section?
rcu_read_lock_bh_held
(void
)
Parameters
- no arguments
void
Description
Check for bottom half being disabled, which covers both the CONFIG_PROVE_RCU and not cases. Note that if someone uses rcu_read_lock_bh()
, but then later enables BH, lockdep (if enabled) will show the situation. This is useful for debug checks in functions that require that they be called within an RCU read-side critical section.
Check debug_lockdep_rcu_enabled()
to prevent false positives during boot.
Note that rcu_read_lock()
is disallowed if the CPU is either idle or offline from an RCU perspective, so check for those as well.
-
void
-
Callback function to awaken a task after grace period
wakeme_after_rcu
(struct rcu_head *
head
)
Parameters
- Pointer to rcu_head member within rcu_synchronize structure
struct rcu_head * head
Description
Awaken the corresponding task now that a grace period has elapsed.
-
void
-
initialize on-stack rcu_head for debugobjects
init_rcu_head_on_stack
(struct rcu_head *
head
)
Parameters
- pointer to rcu_head structure to be initialized
struct rcu_head * head
Description
This function informs debugobjects of a new rcu_head structure that has been allocated as an auto variable on the stack. This function is not required for rcu_head structures that are statically defined or that are dynamically allocated on the heap. This function has no effect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.
-
void
-
destroy on-stack rcu_head for debugobjects
destroy_rcu_head_on_stack
(struct rcu_head *
head
)
Parameters
- pointer to rcu_head structure to be initialized
struct rcu_head * head
Description
This function informs debugobjects that an on-stack rcu_head structure is about to go out of scope. As with init_rcu_head_on_stack()
, this function is not required for rcu_head structures that are statically defined or that are dynamically allocated on the heap. Also as with init_rcu_head_on_stack()
, this function has no effect for !CONFIG_DEBUG_OBJECTS_RCU_HEAD kernel builds.
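A hedged sketch of the on-stack pattern; the caller must wait for the callback before the stack frame goes away, here via a completion (names illustrative):

#include <linux/completion.h>
#include <linux/rcupdate.h>

struct stack_wait {
        struct rcu_head rh;
        struct completion done;
};

static void stack_wait_cb(struct rcu_head *rh)
{
        struct stack_wait *w = container_of(rh, struct stack_wait, rh);

        complete(&w->done);
}

static void wait_one_grace_period(void)
{
        struct stack_wait w;

        init_completion(&w.done);
        init_rcu_head_on_stack(&w.rh);   /* debugobjects bookkeeping */
        call_rcu(&w.rh, stack_wait_cb);
        wait_for_completion(&w.done);    /* callback ran; frame may die */
        destroy_rcu_head_on_stack(&w.rh);
}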
-
void
-
wait until an rcu-tasks grace period has elapsed.
synchronize_rcu_tasks
(void
)
Parameters
- no arguments
void
Description
Control will return to the caller some time after a full rcu-tasks grace period has elapsed, in other words after all currently executing rcu-tasks read-side critical sections have completed. These read-side critical sections are delimited by calls to schedule()
, cond_resched_rcu_qs()
, idle execution, userspace execution, calls to synchronize_rcu_tasks()
, and (in theory, anyway) cond_resched()
.
This is a very specialized primitive, intended only for a few uses in tracing and other situations requiring manipulation of function preambles and profiling hooks. The synchronize_rcu_tasks()
function is not (yet) intended for heavy use from multiple CPUs.
Note that this guarantee implies further memory-ordering guarantees. On systems with more than one CPU, when synchronize_rcu_tasks()
returns, each CPU is guaranteed to have executed a full memory barrier since the end of its last RCU-tasks read-side critical section whose beginning preceded the call to synchronize_rcu_tasks()
. In addition, each CPU having an RCU-tasks read-side critical section that extends beyond the return from synchronize_rcu_tasks()
is guaranteed to have executed a full memory barrier after the beginning of synchronize_rcu_tasks()
and before the beginning of that RCU-tasks read-side critical section. Note that these guarantees include CPUs that are offline, idle, or executing in user mode, as well as CPUs that are executing in the kernel.
Furthermore, if CPU A invoked synchronize_rcu_tasks()
, which returned to its caller on CPU B, then both CPU A and CPU B are guaranteed to have executed a full memory barrier during the execution of synchronize_rcu_tasks()
– even if CPU A and CPU B are the same CPU (but again only if the system has more than one CPU).
-
void
-
Wait for in-flight
call_rcu_tasks()
callbacks.
rcu_barrier_tasks
(void
)
Parameters
- no arguments
void
Description
Although the current implementation is guaranteed to wait, it is not obligated to, for example, if there are no pending callbacks.
Device Resource Management
-
void *
-
Allocate device resource data
devres_alloc_node
(dr_release_t
release, size_t
size, gfp_t
gfp, int
nid
)
Parameters
- Release function devres will be associated with
- Allocation size
- Allocation flags
- NUMA node
dr_release_t release
size_t size
gfp_t gfp
int nid
Description
Allocate devres of size bytes. The allocated area is zeroed, then associated with release. The returned pointer can be passed to other devres_*() functions.
Return
Pointer to allocated devres on success, NULL on failure.
-
void
-
Resource iterator
devres_for_each_res
(struct
device
*
dev, dr_release_t
release, dr_match_t
match, void *
match_data, void (*fn) (struct
device
*, void
*, void
*), void *
data
)
Parameters
- Device to iterate resource from
- Look for resources associated with this release function
- Match function (optional)
- Data for the match function
- Function to be called for each matched resource.
- Data for fn, the 3rd parameter of fn
struct device * dev
dr_release_t release
dr_match_t match
void * match_data
void (*)(struct device *, void *, void *) fn
void * data
Description
Call fn for each devres of dev which is associated with release and for which match returns 1.
Return
void
-
void
-
Free device resource data
devres_free
(void *
res
)
Parameters
- Pointer to devres data to free
void * res
Description
Free devres created with devres_alloc()
.
-
void
-
Register device resource
devres_add
(struct
device
*
dev, void *
res
)
Parameters
- Device to add resource to
- Resource to register
struct device * dev
void * res
Description
Register devres res to dev. res should have been allocated using devres_alloc()
. On driver detach, the associated release function will be invoked and devres will be freed automatically.
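A hedged sketch of a custom managed resource (struct, helper, and release body hypothetical):

#include <linux/device.h>
#include <linux/slab.h>

struct my_res {
        void *buf;
};

static void my_res_release(struct device *dev, void *res)
{
        struct my_res *r = res;

        kfree(r->buf);
}

static int my_setup(struct device *dev)
{
        struct my_res *r;

        r = devres_alloc(my_res_release, sizeof(*r), GFP_KERNEL);
        if (!r)
                return -ENOMEM;
        r->buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
        if (!r->buf) {
                devres_free(r);     /* never registered: free manually */
                return -ENOMEM;
        }
        devres_add(dev, r);         /* released on driver detach */
        return 0;
}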
-
void *
-
Find device resource
devres_find
(struct
device
*
dev, dr_release_t
release, dr_match_t
match, void *
match_data
)
Parameters
- Device to lookup resource from
- Look for resources associated with this release function
- Match function (optional)
- Data for the match function
struct device * dev
dr_release_t release
dr_match_t match
void * match_data
Description
Find the latest devres of dev which is associated with release and for which match returns 1. If match is NULL, it’s considered to match all.
Return
Pointer to found devres, NULL if not found.
-
void *
-
Find devres, if non-existent, add one atomically
devres_get
(struct
device
*
dev, void *
new_res, dr_match_t
match, void *
match_data
)
Parameters
- Device to lookup or add devres for
- Pointer to new initialized devres to add if not found
- Match function (optional)
- Data for the match function
struct device * dev
void * new_res
dr_match_t match
void * match_data
Description
Find the latest devres of dev which has the same release function as new_res and for which match returns 1. If found, new_res is freed; otherwise, new_res is added atomically.
Return
Pointer to found or added devres.
-
void *
-
Find a device resource and remove it
devres_remove
(struct
device
*
dev, dr_release_t
release, dr_match_t
match, void *
match_data
)
Parameters
- Device to find resource from
- Look for resources associated with this release function
- Match function (optional)
- Data for the match function
struct device * dev
dr_release_t release
dr_match_t match
void * match_data
Description
Find the latest devres of dev associated with release and for which match returns 1. If match is NULL, it’s considered to match all. If found, the resource is removed atomically and returned.
Return
Pointer to removed devres on success, NULL if not found.
-
int
-
Find a device resource and destroy it
devres_destroy
(struct
device
*
dev, dr_release_t
release, dr_match_t
match, void *
match_data
)
Parameters
- Device to find resource from
- Look for resources associated with this release function
- Match function (optional)
- Data for the match function
struct device * dev
dr_release_t release
dr_match_t match
void * match_data
Description
Find the latest devres of dev associated with release and for which match returns 1. If match is NULL, it’s considered to match all. If found, the resource is removed atomically and freed.
Note that the release function for the resource will not be called, only the devres-allocated data will be freed. The caller becomes responsible for freeing any other data.
Return
0 if devres is found and freed, -ENOENT if not found.
-
int
-
Find a device resource and destroy it, calling release
devres_release
(struct
device
*
dev, dr_release_t
release, dr_match_t
match, void *
match_data
)
Parameters
- Device to find resource from
- Look for resources associated with this release function
- Match function (optional)
- Data for the match function
struct device * dev
dr_release_t release
dr_match_t match
void * match_data
Description
Find the latest devres of dev associated with release and for which match returns 1. If match is NULL, it’s considered to match all. If found, the resource is removed atomically, the release function called and the resource freed.
Return
0 if devres is found and freed, -ENOENT if not found.
-
void *
-
Open a new devres group
devres_open_group
(struct
device
*
dev, void *
id, gfp_t
gfp
)
Parameters
- Device to open devres group for
- Separator ID
- Allocation flags
struct device * dev
void * id
gfp_t gfp
Description
Open a new devres group for dev with id. For id, using a pointer to an object which won’t be used for another group is recommended. If id is NULL, an address-wise unique ID is created.
Return
ID of the new group, NULL on failure.
-
void
-
Close a devres group
devres_close_group
(struct
device
*
dev, void *
id
)
Parameters
- Device to close devres group for
- ID of target group, can be NULL
struct device * dev
void * id
Description
Close the group identified by id. If id is NULL, the latest open group is selected.
-
void
-
Remove a devres group
devres_remove_group
(struct
device
*
dev, void *
id
)
Parameters
- Device to remove group for
- ID of target group, can be NULL
struct device * dev
void * id
Description
Remove the group identified by id. If id is NULL, the latest open group is selected. Note that removing a group doesn’t affect any other resources.
-
int
-
Release resources in a devres group
devres_release_group
(struct
device
*
dev, void *
id
)
Parameters
- Device to release group for
- ID of target group, can be NULL
struct device * dev
void * id
Description
Release all resources in the group identified by id. If id is NULL, the latest open group is selected. The selected group and groups properly nested inside the selected group are removed.
Return
The number of released non-group resources.
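A hedged error-handling sketch using a group to roll back part of a probe (helper name hypothetical):

void *grp = devres_open_group(dev, NULL, GFP_KERNEL);
int err;

if (!grp)
        return -ENOMEM;

/* ... a series of devm_* allocations and registrations ... */

err = optional_setup(dev);              /* hypothetical helper */
if (err) {
        devres_release_group(dev, grp); /* roll back only this group */
        return err;
}
devres_close_group(dev, grp);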
-
int
-
add a custom action to list of managed resources
devm_add_action
(struct
device
*
dev, void (*action) (void
*), void *
data
)
Parameters
- Device that owns the action
- Function that should be called
- Pointer to data passed to action implementation
struct device * dev
void (*)(void *) action
void * data
Description
This adds a custom action to the list of managed resources so that it gets executed as part of standard resource unwinding.
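A hedged usage sketch (cleanup function and device state hypothetical); note that if registration fails, the caller must run the cleanup by hand:

static void my_power_off(void *data)
{
        struct my_hw *hw = data;    /* hypothetical device state */

        hw_power_off(hw);           /* hypothetical helper */
}

ret = devm_add_action(dev, my_power_off, hw);
if (ret) {
        my_power_off(hw);           /* not registered: undo manually */
        return ret;
}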
-
void
-
removes previously added custom action
devm_remove_action
(struct
device
*
dev, void (*action) (void
*), void *
data
)
Parameters
- Device that owns the action
- Function implementing the action
- Pointer to data passed to action implementation
struct device * dev
void (*)(void *) action
void * data
Description
Removes instance of action previously added by devm_add_action()
. Both action and data should match one of the existing entries.
-
void *
-
Resource-managed kmalloc
devm_kmalloc
(struct
device
*
dev, size_t
size, gfp_t
gfp
)
Parameters
- Device to allocate memory for
- Allocation size
- Allocation gfp flags
struct device * dev
size_t size
gfp_t gfp
Description
Managed kmalloc. Memory allocated with this function is automatically freed on driver detach. Like all other devres resources, the guaranteed alignment is that of unsigned long long.
Return
Pointer to allocated memory on success, NULL on failure.
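A hedged probe-time sketch (private struct hypothetical):

struct my_priv *priv;

priv = devm_kmalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
        return -ENOMEM;
/* no kfree() needed; freed automatically on driver detach */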
-
char *
-
Allocate resource managed space and copy an existing string into that.
devm_kstrdup
(struct
device
*
dev, const char *
s, gfp_t
gfp
)
Parameters
- Device to allocate memory for
- the string to duplicate
-
the GFP mask used in the
devm_kmalloc()
call when allocating memory
struct device * dev
const char * s
gfp_t gfp
Return
Pointer to allocated string on success, NULL on failure.
-
char *
-
Allocate resource managed space and format a string into that.
devm_kvasprintf
(struct
device
*
dev, gfp_t
gfp, const char *
fmt, va_list
ap
)
Parameters
- Device to allocate memory for
-
the GFP mask used in the
devm_kmalloc()
call when allocating memory -
The
printf()
-style format string - Arguments for the format string
struct device * dev
gfp_t gfp
const char * fmt
va_list ap
Return
Pointer to allocated string on success, NULL on failure.
-
char *
-
Allocate resource managed space and format a string into that.
devm_kasprintf
(struct
device
*
dev, gfp_t
gfp, const char *
fmt, ...
)
Parameters
- Device to allocate memory for
-
the GFP mask used in the
devm_kmalloc()
call when allocating memory -
The
printf()
-style format string - Arguments for the format string
struct device * dev
gfp_t gfp
const char * fmt
...
Return
Pointer to allocated string on success, NULL on failure.
-
void
-
Resource-managed kfree
devm_kfree
(struct
device
*
dev, void *
p
)
Parameters
- Device this memory belongs to
- Memory to free
struct device * dev
void * p
Description
Free memory allocated with devm_kmalloc()
.
-
void *
-
Resource-managed kmemdup
devm_kmemdup
(struct
device
*
dev, const void *
src, size_t
len, gfp_t
gfp
)
Parameters
- Device this memory belongs to
- Memory region to duplicate
- Memory region length
- GFP mask to use
struct device * dev
const void * src
size_t len
gfp_t gfp
Description
Duplicate a region of memory using resource-managed kmalloc.
-
unsigned long
-
Resource-managed __get_free_pages
devm_get_free_pages
(struct
device
*
dev, gfp_t
gfp_mask, unsigned int
order
)
Parameters
- Device to allocate memory for
- Allocation gfp flags
- Allocation size is (1 << order) pages
struct device * dev
gfp_t gfp_mask
unsigned int order
Description
Managed get_free_pages. Memory allocated with this function is automatically freed on driver detach.
Return
Address of allocated memory on success, 0 on failure.
-
void
-
Resource-managed free_pages
devm_free_pages
(struct
device
*
dev, unsigned long
addr
)
Parameters
- Device this memory belongs to
- Memory to free
struct device * dev
unsigned long addr
Description
Free memory allocated with devm_get_free_pages()
. Unlike free_pages, there is no need to supply the order.
-
void __percpu *
-
Resource-managed alloc_percpu
__devm_alloc_percpu
(struct
device
*
dev, size_t
size, size_t
align
)
Parameters
- Device to allocate per-cpu memory for
- Size of per-cpu memory to allocate
- Alignment of per-cpu memory to allocate
struct device * dev
size_t size
size_t align
Description
Managed alloc_percpu. Per-cpu memory allocated with this function is automatically freed on driver detach.
Return
Pointer to allocated memory on success, NULL on failure.
-
void
-
Resource-managed free_percpu
devm_free_percpu
(struct
device
*
dev, void __percpu *
pdata
)
Parameters
- Device this memory belongs to
- Per-cpu memory to free
struct device * dev
void __percpu * pdata
Description
Free memory allocated with devm_alloc_percpu()
.