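Problem
The likely/unlikely scheme shows how, to make the instruction stream run more efficiently, compiler developers studied the characteristics of the CPU and of the instruction flow and optimized code execution for the common case from two angles: data caching and instruction pre-execution. Is there room for further optimization? Thanks to the persistent efforts of the kernel developers, there is.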
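Principle
Consider the logic of the following code:
A conditional instruction sequence has to be executed on every pass:
cmpl xxxx // five bytes long, as shown in the figure above
jg yyyyyy // two bytes long, as shown in the figure above
As with likely/unlikely, when the condition takes one fixed value in most scenarios, replacing the instructions above with cheaper ones yields better efficiency. For this case the kernel developers proposed the following scheme: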
/*
* Jump label support
*
* Copyright (C) 2009-2012 Jason Baron <jbaron@redhat.com>
* Copyright (C) 2011-2012 Red Hat, Inc., Peter Zijlstra
*
* DEPRECATED API:
*
* The use of 'struct static_key' directly, is now DEPRECATED. In addition
* static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:
*
* struct static_key false = STATIC_KEY_INIT_FALSE;
* struct static_key true = STATIC_KEY_INIT_TRUE;
* static_key_true()
* static_key_false()
*
* The updated API replacements are:
*
* DEFINE_STATIC_KEY_TRUE(key);
* DEFINE_STATIC_KEY_FALSE(key);
* DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count);
* DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);
* static_branch_likely()
* static_branch_unlikely()
*
* Jump labels provide an interface to generate dynamic branches using
* self-modifying code. Assuming toolchain and architecture support, if we
* define a "key" that is initially false via "DEFINE_STATIC_KEY_FALSE(key)",
* an "if (static_branch_unlikely(&key))" statement is an unconditional branch
* (which defaults to false - and the true block is placed out of line).
* Similarly, we can define an initially true key via
* "DEFINE_STATIC_KEY_TRUE(key)", and use it in the same
* "if (static_branch_unlikely(&key))", in which case we will generate an
* unconditional branch to the out-of-line true branch. Keys that are
* initially true or false can be using in both static_branch_unlikely()
* and static_branch_likely() statements.
*
* At runtime we can change the branch target by setting the key
* to true via a call to static_branch_enable(), or false using
* static_branch_disable(). If the direction of the branch is switched by
* these calls then we run-time modify the branch target via a
* no-op -> jump or jump -> no-op conversion. For example, for an
* initially false key that is used in an "if (static_branch_unlikely(&key))"
* statement, setting the key to true requires us to patch in a jump
* to the out-of-line of true branch.
*
* In addition to static_branch_{enable,disable}, we can also reference count
* the key or branch direction via static_branch_{inc,dec}. Thus,
* static_branch_inc() can be thought of as a 'make more true' and
* static_branch_dec() as a 'make more false'.
*
* Since this relies on modifying code, the branch modifying functions
* must be considered absolute slow paths (machine wide synchronization etc.).
* OTOH, since the affected branches are unconditional, their runtime overhead
* will be absolutely minimal, esp. in the default (off) case where the total
* effect is a single NOP of appropriate size. The on case will patch in a jump
* to the out-of-line block.
*
* When the control is directly exposed to userspace, it is prudent to delay the
* decrement to avoid high frequency code modifications which can (and do)
* cause significant performance degradation. Struct static_key_deferred and
* static_key_slow_dec_deferred() provide for this.
*
* Lacking toolchain and or architecture support, static keys fall back to a
* simple conditional branch.
*
* Additional babbling in: Documentation/staging/static-keys.rst
*/
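The comment above can be summarized as follows:
- A mechanism exists that can change the code's logic at run time
- A no-op is simpler and cheaper to execute than a compare followed by a conditional jump
- When needed, the run-time patching mechanism from the first point switches a site between no-op and jmp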
Data structures
struct static_key {
atomic_t enabled;
#ifdef CONFIG_JUMP_LABEL
/*
* Note:
* To make anonymous unions work with old compilers, the static
* initialization of them requires brackets. This creates a dependency
* on the order of the struct with the initializers. If any fields
* are added, STATIC_KEY_INIT_TRUE and STATIC_KEY_INIT_FALSE may need
* to be modified.
*
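* bit 0 => 1 if key is initially true // the lowest bit encodes the initial value
* 0 if initially false
* bit 1 => 1 if points to struct static_key_mod
* 0 if points to struct jump_entry // bit 1 records which pointer type is stored: a module list (static_key_mod) or a jump_entry table
*/
union {
unsigned long type; // the low two bits serve as flag bits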
struct jump_entry *entries;
struct static_key_mod *next;
};
#endif /* CONFIG_JUMP_LABEL */
};
struct jump_entry {
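s32 code; // offset from this field to the instruction to patch
s32 target; // offset from this field to the jump target
long key; // static_key address (also stored as an offset); as with type, the low two bits are flags, so static_key must be at least 4-byte aligned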
};
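The 32-bit code and target fields hold offsets rather than absolute addresses, which matches the `.long 1b - ., %l[l_yes] - .` directive in the arm64 code shown later. The following is a minimal userspace sketch of that encoding, not the kernel implementation. The names rel_entry, image, rel_entry_set, rel_entry_code and rel_entry_target are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the relative fields of struct jump_entry. */
struct rel_entry {
	int32_t code;   /* offset from &code to the instruction to patch */
	int32_t target; /* offset from &target to the jump target */
};

/* "text" and the entry share one object so the offsets always fit in 32 bits. */
struct image {
	unsigned char text[64];
	struct rel_entry entry;
};

/* Encode two absolute addresses as offsets relative to the fields themselves. */
static void rel_entry_set(struct rel_entry *e, const void *code,
			  const void *target)
{
	intptr_t dc = (intptr_t)code - (intptr_t)&e->code;
	intptr_t dt = (intptr_t)target - (intptr_t)&e->target;

	assert(dc >= INT32_MIN && dc <= INT32_MAX);
	assert(dt >= INT32_MIN && dt <= INT32_MAX);
	e->code = (int32_t)dc;
	e->target = (int32_t)dt;
}

/* Decode: field address plus stored offset gives back the absolute address. */
static uintptr_t rel_entry_code(const struct rel_entry *e)
{
	return (uintptr_t)&e->code + (uintptr_t)(intptr_t)e->code;
}

static uintptr_t rel_entry_target(const struct rel_entry *e)
{
	return (uintptr_t)&e->target + (uintptr_t)(intptr_t)e->target;
}
```

Storing offsets keeps each entry small and position independent: the table stays valid wherever the image is loaded, because only the distance between the entry and the code matters.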
Implementation logic
/*
* Combine the right initial value (type) with the right branch order
* to generate the desired result.
*
*
* type\branch| likely (1) | unlikely (0)
* -----------+-----------------------+------------------
* | |
* true (1) | ... | ...
* | NOP | JMP L
* | <br-stmts> | 1: ...
* | L: ... |
* | |
* | | L: <br-stmts>
* | | jmp 1b
* | |
* -----------+-----------------------+------------------
* | |
* false (0) | ... | ...
* | JMP L | NOP
* | <br-stmts> | 1: ...
* | L: ... |
* | |
* | | L: <br-stmts>
* | | jmp 1b
* | |
* -----------+-----------------------+------------------
*
* The initial value is encoded in the LSB of static_key::entries,
* type: 0 = false, 1 = true.
*
* The branch type is encoded in the LSB of jump_entry::key,
* branch: 0 = unlikely, 1 = likely.
*
* This gives the following logic table:
*
* enabled type branch instuction
* -----------------------------+-----------
* 0 0 0 | NOP
* 0 0 1 | JMP
* 0 1 0 | NOP
* 0 1 1 | JMP
*
* 1 0 0 | JMP
* 1 0 1 | NOP
* 1 1 0 | JMP
* 1 1 1 | NOP
*
* Which gives the following functions:
*
* dynamic: instruction = enabled ^ branch
* static: instruction = type ^ branch
*
* See jump_label_type() / jump_label_init_type().
*/
#define static_branch_likely(x) \
({ \
bool branch; \
if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
branch = !arch_static_branch(&(x)->key, true); \
else if (__builtin_types_compatible_p(typeof(*x), struct static_key_false)) \
branch = !arch_static_branch_jump(&(x)->key, true); \
else \
branch = ____wrong_branch_error(); \
likely_notrace(branch); \
})
#define static_branch_unlikely(x) \
({ \
bool branch; \
if (__builtin_types_compatible_p(typeof(*x), struct static_key_true)) \
branch = arch_static_branch_jump(&(x)->key, false); \
else if (__builtin_types_compatible_p(typeof(*x), struct static_key_false)) \
branch = arch_static_branch(&(x)->key, false); \
else \
branch = ____wrong_branch_error(); \
unlikely_notrace(branch); \
})
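The logic table in the comment above reduces to two XOR expressions. The sketch below restates them in plain C so each row can be checked. The enum insn_kind and the helpers dynamic_insn and static_insn are hypothetical names, chosen so that NOP is 0 and JMP is 1 as in the table.

```c
#include <stdbool.h>

/* Hypothetical names; NOP = 0 and JMP = 1 follow the table's encoding. */
enum insn_kind { INSN_NOP = 0, INSN_JMP = 1 };

/* Run time: the instruction depends on the current state and the branch type. */
static enum insn_kind dynamic_insn(bool enabled, bool branch)
{
	return (enabled ^ branch) ? INSN_JMP : INSN_NOP;
}

/* Build time: the instruction depends on the initial value and the branch type. */
static enum insn_kind static_insn(bool type, bool branch)
{
	return (type ^ branch) ? INSN_JMP : INSN_NOP;
}
```

For example, an initially false key used with static_branch_unlikely() gives static_insn(false, false), which is a NOP. After the key is enabled, dynamic_insn(true, false) gives a JMP to the out-of-line block.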
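For example, on the arm64 architecture:
// force inlining so the instruction is emitted at the call site and no function call overhead is paid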
static __always_inline bool arch_static_branch(struct static_key * const key,
const bool branch)
{
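asm_volatile_goto(
"1: nop \n\t" // local label 1 marks the patch site, which initially holds a no-op
" .pushsection __jump_table, \"aw\" \n\t" // switch to the __jump_table section
" .align 3 \n\t"
" .long 1b - ., %l[l_yes] - . \n\t" // store the addresses of label 1 and l_yes as offsets from the current location
" .quad %c0 - . \n\t" // store the static_key address through a char pointer; the branch value decides whether 1 is added to it
" .popsection \n\t" // leave the __jump_table section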
: : "i"(&((char *)key)[branch]) : : l_yes);
return false;
l_yes:
return true;
}
static __always_inline bool arch_static_branch_jump(struct static_key * const key,
const bool branch)
{
asm_volatile_goto(
"1: b %l[l_yes] \n\t" // local label 1 marks the patch site, which initially holds a branch to l_yes
" .pushsection __jump_table, \"aw\" \n\t"
" .align 3 \n\t"
" .long 1b - ., %l[l_yes] - . \n\t"
" .quad %c0 - . \n\t"
" .popsection \n\t"
: : "i"(&((char *)key)[branch]) : : l_yes);
return false;
l_yes:
return true;
}
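What gets stored in the __jump_table section is in fact a jump_entry object. Because the static_key address is 4-byte aligned, (&((char *)key)[branch]) sets the lowest bit of the key field in jump_entry to 0 or 1, which corresponds to the branch information described above.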
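The low-bit trick can be tried out in userspace. The sketch below assumes a hypothetical struct demo_key standing in for struct static_key and uses absolute addresses instead of the relative ones the kernel stores. The masks 1 and 3 follow the bit assignments described in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct static_key; only its alignment matters. */
struct demo_key {
	int enabled;
};

/* Same idea as &((char *)key)[branch]: add 0 or 1 to an aligned address. */
static uintptr_t encode_key(const struct demo_key *key, bool branch)
{
	assert(((uintptr_t)key & 3u) == 0); /* the low two bits must be free */
	return (uintptr_t)&((const char *)key)[branch];
}

/* Mask off the two flag bits to recover the key address. */
static const struct demo_key *decode_key(uintptr_t word)
{
	return (const struct demo_key *)(word & ~(uintptr_t)3);
}

/* Bit 0 carries the branch type: 1 = likely, 0 = unlikely. */
static bool decode_branch(uintptr_t word)
{
	return (word & 1u) != 0;
}
```

Encoding the same key with branch set to true and to false yields two words that decode to the same key address but to different branch types, so each call site can record its own likely/unlikely flavor without extra storage.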
Initialization
Taking the core kernel code as an example:
void __init jump_label_init(void)
{
struct jump_entry *iter_start = __start___jump_table; // start of the section
struct jump_entry *iter_stop = __stop___jump_table; // end of the section
struct static_key *key = NULL;
struct jump_entry *iter;
/*
* Since we are initializing the static_key.enabled field with
* with the 'raw' int values (to avoid pulling in atomic.h) in
* jump_label.h, let's make sure that is safe. There are only two
* cases to check since we initialize to 0 or 1.
*/
BUILD_BUG_ON((int)ATOMIC_INIT(0) != 0);
BUILD_BUG_ON((int)ATOMIC_INIT(1) != 1);
if (static_key_initialized) // nothing to do if initialization has already run
return;
cpus_read_lock();
jump_label_lock();
jump_label_sort_entries(iter_start, iter_stop); // sort by static_key address
for (iter = iter_start; iter < iter_stop; iter++) {
struct static_key *iterk;
bool in_init;
/* rewrite NOPs */
if (jump_label_type(iter) == JUMP_LABEL_NOP)
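arch_jump_label_transform_static(iter, JUMP_LABEL_NOP); // architecture hook that rewrites the NOP; on most architectures it is empty
in_init = init_section_contains((void *)jump_entry_code(iter), 1); // check whether the patched instruction lies in the init section
jump_entry_set_init(iter, in_init); // set bit 1 of the jump_entry key field to record whether the entry lives in init code
iterk = jump_entry_key(iter);
if (iterk == key) // bind each static_key to its jump_entry run; only the first entry per key needs to be recorded
continue;
key = iterk;
static_key_set_entries(key, iter); // store the jump_entry address in the entries field of the static_key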
}
static_key_initialized = true; // initialization complete
jump_label_unlock();
cpus_read_unlock();
}
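The sort-then-bind loop can be illustrated outside the kernel. The sketch below is a simplified model under stated assumptions: struct demo_key, struct demo_entry and demo_init are hypothetical, addresses are absolute, and sorting uses qsort from the C standard library rather than the kernel's own sort.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_entry;

/* Hypothetical key: remembers only the first entry of its run. */
struct demo_key {
	const struct demo_entry *entries;
};

/* Hypothetical table entry: the owning key and an opaque site id. */
struct demo_entry {
	struct demo_key *key;
	int site;
};

/* Order entries by key address so entries of one key become contiguous. */
static int cmp_by_key(const void *a, const void *b)
{
	const struct demo_entry *ea = a, *eb = b;
	uintptr_t ka = (uintptr_t)ea->key, kb = (uintptr_t)eb->key;

	return (ka > kb) - (ka < kb);
}

/* Sort the table, then point each key at the first entry of its run. */
static void demo_init(struct demo_entry *start, struct demo_entry *stop)
{
	struct demo_key *prev = NULL;

	qsort(start, (size_t)(stop - start), sizeof(*start), cmp_by_key);
	for (struct demo_entry *it = start; it < stop; it++) {
		if (it->key == prev)
			continue; /* same key as the previous entry: already bound */
		prev = it->key;
		prev->entries = it;
	}
}
```

After demo_init() runs, every key points at a contiguous run of its own entries, which is what lets a later update walk the table linearly and stop at the first entry that belongs to another key.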
/***
* A 'struct static_key' uses a union such that it either points directly
* to a table of 'struct jump_entry' or to a linked list of modules which in
* turn point to 'struct jump_entry' tables.
*
* The two lower bits of the pointer are used to keep track of which pointer
* type is in use and to store the initial branch direction, we use an access
* function which preserves these bits.
*/
static void static_key_set_entries(struct static_key *key,
struct jump_entry *entries)
{
unsigned long type;
WARN_ON_ONCE((unsigned long)entries & JUMP_TYPE_MASK);
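type = key->type & JUMP_TYPE_MASK; // save the low two bits of type
key->entries = entries; // store the jump_entry address
key->type |= type; // restore the low two bits of type; the WARN_ON_ONCE above shows the low two bits of the jump_entry address are also 0, i.e. it is 4-byte aligned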
}
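The save, overwrite and restore sequence in static_key_set_entries() is a form of pointer tagging. A minimal userspace sketch, assuming hypothetical types tagged_key and demo_entry and a TYPE_MASK of 3 to cover the two flag bits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TYPE_MASK ((uintptr_t)3) /* the two low flag bits */

/* Hypothetical entry type; int alignment leaves the low two bits clear. */
struct demo_entry {
	int site;
};

/* Hypothetical key: one word holds both the pointer and the flag bits. */
struct tagged_key {
	uintptr_t word;
};

/* Replace the pointer part while keeping the flag bits unchanged. */
static void set_entries(struct tagged_key *key, const struct demo_entry *e)
{
	uintptr_t type = key->word & TYPE_MASK;

	assert(((uintptr_t)e & TYPE_MASK) == 0); /* mirrors the WARN_ON_ONCE */
	key->word = (uintptr_t)e | type;
}

/* Read the pointer part back with the flag bits masked off. */
static const struct demo_entry *get_entries(const struct tagged_key *key)
{
	return (const struct demo_entry *)(key->word & ~TYPE_MASK);
}

/* Bit 0 records whether the key was initially true. */
static bool initially_true(const struct tagged_key *key)
{
	return (key->word & 1u) != 0;
}
```

A key whose word starts as 1 (initially true, no entries yet) still reports initially_true() after set_entries(), and get_entries() returns the clean pointer.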
Updating the instruction
Taking the enable function as an example:
void static_key_enable_cpuslocked(struct static_key *key)
{
STATIC_KEY_CHECK_USE(key);
lockdep_assert_cpus_held();
if (atomic_read(&key->enabled) > 0) {
WARN_ON_ONCE(atomic_read(&key->enabled) != 1);
return;
}
jump_label_lock();
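if (atomic_read(&key->enabled) == 0) { // re-check under the lock that the key is still disabled
atomic_set(&key->enabled, -1); // -1 marks an enable in progress, which protects against concurrent updaters
jump_label_update(key); // update the instructions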
/*
* See static_key_slow_inc().
*/
atomic_set_release(&key->enabled, 1); // the update is complete
}
jump_label_unlock();
}
EXPORT_SYMBOL_GPL(static_key_enable_cpuslocked);
Taking the disable function as an example:
void static_key_disable_cpuslocked(struct static_key *key)
{
STATIC_KEY_CHECK_USE(key);
lockdep_assert_cpus_held();
if (atomic_read(&key->enabled) != 1) { // filter out every value other than 1
WARN_ON_ONCE(atomic_read(&key->enabled) != 0);
return;
}
jump_label_lock();
if (atomic_cmpxchg(&key->enabled, 1, 0)) // if the value is 1, swap in 0 and update the instructions; the disable path is the simpler of the two
jump_label_update(key);
jump_label_unlock();
}
EXPORT_SYMBOL_GPL(static_key_disable_cpuslocked);
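The state transitions of the enabled counter (0, then -1 while patching, then 1) can be modelled with C11 atomics. This is a single-threaded sketch with hypothetical names (demo_key, demo_patch, demo_enable, demo_disable, demo_count). It omits jump_label_lock() and the CPU hotplug lock, and demo_patch() only counts calls instead of rewriting code.

```c
#include <stdatomic.h>

/* Hypothetical key: the counter plus a tally of simulated patch operations. */
struct demo_key {
	atomic_int enabled;
	int patches;
};

/* Stand-in for jump_label_update(): only counts how often it runs. */
static void demo_patch(struct demo_key *key)
{
	key->patches++;
}

/* Like static_key_count(): -1 (enable in progress) is reported as 1. */
static int demo_count(struct demo_key *key)
{
	int n = atomic_load(&key->enabled);

	return n >= 0 ? n : 1;
}

static void demo_enable(struct demo_key *key)
{
	if (atomic_load(&key->enabled) > 0)
		return; /* already enabled */
	/* The kernel takes jump_label_lock() here; this sketch does not. */
	if (atomic_load(&key->enabled) == 0) {
		atomic_store(&key->enabled, -1); /* enable in progress */
		demo_patch(key);
		atomic_store_explicit(&key->enabled, 1, memory_order_release);
	}
}

static void demo_disable(struct demo_key *key)
{
	int expected = 1;

	if (atomic_load(&key->enabled) != 1)
		return; /* not enabled exactly once */
	if (atomic_compare_exchange_strong(&key->enabled, &expected, 0))
		demo_patch(key);
}
```

Calling demo_enable() twice in a row patches once, and the same holds for demo_disable(). This illustrates why the kernel functions can be called repeatedly without triggering the expensive code modification each time.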
static void jump_label_update(struct static_key *key)
{
struct jump_entry *stop = __stop___jump_table;
bool init = system_state < SYSTEM_RUNNING;
struct jump_entry *entry;
#ifdef CONFIG_MODULES // with module support, dynamically loaded modules need special handling
struct module *mod;
if (static_key_linked(key)) {
__jump_label_mod_update(key);
return;
}
preempt_disable();
mod = __module_address((unsigned long)key);
if (mod) {
stop = mod->jump_entries + mod->num_jump_entries;
init = mod->state == MODULE_STATE_COMING;
}
preempt_enable();
#endif
entry = static_key_entries(key);
/* if there are no users, entry can be NULL */
if (entry)
__jump_label_update(key, entry, stop, init); // update the instructions
}
static void __jump_label_update(struct static_key *key,
struct jump_entry *entry,
struct jump_entry *stop,
bool init)
{
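for (; (entry < stop) && (jump_entry_key(entry) == key); entry++) { // visit every jump_entry that belongs to this static_key
if (jump_label_can_update(entry, init)) // check whether this entry may be patched
arch_jump_label_transform(entry, jump_label_type(entry)); // work out which instruction is needed now and call the architecture-specific function to patch it in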
}
}
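Because the table is sorted by key, the update only has to walk one contiguous run. The sketch below models that walk with hypothetical types (demo_key, demo_entry) and a hypothetical demo_update() that writes the chosen instruction kind into the entry instead of patching code.

```c
#include <stdbool.h>

enum insn_kind { INSN_NOP = 0, INSN_JMP = 1 };

/* Hypothetical key: just the enabled count. */
struct demo_key {
	int enabled;
};

/* Hypothetical entry: owning key, branch type, and the "patched" instruction. */
struct demo_entry {
	struct demo_key *key;
	bool branch; /* true = likely, false = unlikely */
	int insn;    /* result of the simulated patch */
};

/* Walk the run that starts at entry and stops at the first foreign key. */
static int demo_update(struct demo_key *key, struct demo_entry *entry,
		       struct demo_entry *stop)
{
	int patched = 0;

	for (; entry < stop && entry->key == key; entry++) {
		/* dynamic: instruction = enabled ^ branch */
		entry->insn = ((key->enabled > 0) ^ entry->branch) ?
			      INSN_JMP : INSN_NOP;
		patched++;
	}
	return patched;
}
```

With an enabled key, an unlikely site becomes a JMP and a likely site becomes a NOP, while entries of other keys further along the table are left untouched.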
static enum jump_label_type jump_label_type(struct jump_entry *entry)
{
struct static_key *key = jump_entry_key(entry);
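bool enabled = static_key_enabled(key); // whether the static_key is currently enabled
bool branch = jump_entry_is_branch(entry); // the branch type (likely/unlikely) recorded for this site
/* See the comment in linux/jump_label.h */
return enabled ^ branch; // compute the instruction type; matches "dynamic: instruction = enabled ^ branch" in the implementation logic section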
}
#define static_key_enabled(x) \
({ \
if (!__builtin_types_compatible_p(typeof(*x), struct static_key) && \
!__builtin_types_compatible_p(typeof(*x), struct static_key_true) &&\
!__builtin_types_compatible_p(typeof(*x), struct static_key_false)) \
____wrong_branch_error(); \
static_key_count((struct static_key *)x) > 0; \
})
int static_key_count(struct static_key *key)
{
/*
* -1 means the first static_key_slow_inc() is in progress.
* static_key_enabled() must return true, so return 1 here.
*/
int n = atomic_read(&key->enabled);
// -1 is handled specially and reported as 1
return n >= 0 ? n : 1;
}
Taking arm64 as an example:
void arch_jump_label_transform(struct jump_entry *entry,
enum jump_label_type type)
{
void *addr = (void *)jump_entry_code(entry); // address of the instruction to modify
u32 insn;
if (type == JUMP_LABEL_JMP) { // generate the instruction according to the requested type
insn = aarch64_insn_gen_branch_imm(jump_entry_code(entry),
jump_entry_target(entry),
AARCH64_INSN_BRANCH_NOLINK);
} else {
insn = aarch64_insn_gen_nop();
}
aarch64_insn_patch_text_nosync(addr, insn); // write the new instruction
}
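The two instruction words that arm64 writes can be derived from the A64 encoding: NOP is the fixed word 0xd503201f, and an unconditional B places the word offset in a 26-bit immediate under opcode 0x14000000. The sketch below is an illustration, not the kernel's aarch64_insn_gen_branch_imm(). The names a64_gen_b and a64_gen_insn are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define A64_NOP      0xd503201fu
#define A64_B_OPCODE 0x14000000u
#define A64_B_RANGE  (128LL * 1024 * 1024) /* B reaches +/-128 MiB */

/* Encode "b target" for an instruction located at pc. */
static uint32_t a64_gen_b(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	assert((offset & 3) == 0); /* instructions are 4-byte aligned */
	assert(offset >= -A64_B_RANGE && offset < A64_B_RANGE);
	/* imm26 is the offset in words, truncated to 26 bits. */
	return A64_B_OPCODE | (uint32_t)(((uint64_t)offset >> 2) & 0x03ffffffu);
}

/* Pick the word to patch in: a branch when a jump is wanted, else a NOP. */
static uint32_t a64_gen_insn(int want_jmp, uint64_t pc, uint64_t target)
{
	return want_jmp ? a64_gen_b(pc, target) : A64_NOP;
}
```

A forward jump of 8 bytes encodes as 0x14000002, and a backward jump of 8 bytes encodes as 0x17fffffe. Both words have the same size as the NOP they replace, which is what makes swapping them in place possible.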