Contents
1 Synchronous (forced) reclaim function neigh_forced_gc()
2 Asynchronous (periodic) reclaim function neigh_periodic_work()
Introduction
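In my view, the generic neighbour layer can be divided into three main parts:
- Neighbour entry handling functions, which create, update and delete neighbour entries.
- The neighbour entry state machine, which handles changes of an entry's state. This includes the timers attached to several neighbour states, as well as sending solicitation requests.
- The neighbour entry garbage collection mechanism, which reclaims entries in a neighbour table that have gone unused for a long time, in order to save neighbour cache space.
These three parts must work together to provide the functionality of the generic neighbour layer.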
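Garbage collection comes in two forms. Asynchronous garbage collection runs periodically. Synchronous garbage collection is invoked when a neighbour entry is being created and the table is running out of cache space; it forcibly reclaims neighbour entries that are no longer in use.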
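In Linux 3.10 the synchronous path is entered from neigh_alloc(). As I read that function, it runs neigh_forced_gc() when the entry count has reached gc_thresh3, or has reached gc_thresh2 and the last flush (tbl->last_flush) was more than 5 seconds ago; if nothing could be reclaimed and the count is still at or above gc_thresh3, the allocation fails. The following is a userspace sketch of that decision only, not kernel code: the model_* names are hypothetical, jiffies and HZ are passed in as plain numbers, and any thresholds you feed it are illustrative.

```c
#include <stdbool.h>

/* Wraparound-safe "a is after b", modelled on the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical helper: should allocating one more entry first run the
 * forced GC? 'entries' is the count before this allocation, 'hz' stands
 * in for HZ, 'now' and 'last_flush' are jiffies values.
 */
static bool model_need_forced_gc(int entries, int gc_thresh2, int gc_thresh3,
				 unsigned long now, unsigned long last_flush,
				 unsigned long hz)
{
	return entries >= gc_thresh3 ||
	       (entries >= gc_thresh2 &&
		model_time_after(now, last_flush + 5 * hz));
}

/*
 * Hypothetical helper: after the forced GC has run, the allocation is
 * refused only if nothing was reclaimed and the hard limit is reached.
 */
static bool model_alloc_refused(int entries, int gc_thresh3, int shrunk)
{
	return !shrunk && entries >= gc_thresh3;
}
```

The 5-second condition means that between gc_thresh2 and gc_thresh3 the forced scan runs at most about once every 5 seconds, while at gc_thresh3 it runs on every allocation attempt.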
1 Synchronous (forced) reclaim function neigh_forced_gc()
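The logic of this function is fairly simple. It walks every bucket of the neighbour table's hash array and visits every neighbour entry on each bucket's chain. For each entry whose reference count is 1 (only the table itself still holds it) and whose state is not NUD_PERMANENT, it unlinks the entry from the chain, sets neigh->dead to 1, and calls neigh_cleanup_and_release() to free the cache the entry occupies. It returns 1 if at least one entry was reclaimed and 0 otherwise, and records the time of the run in tbl->last_flush.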
static int neigh_forced_gc(struct neigh_table *tbl)
{
	int shrunk = 0;
	int i;
	struct neigh_hash_table *nht;
	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
	write_lock_bh(&tbl->lock);
	nht = rcu_dereference_protected(tbl->nht,
					lockdep_is_held(&tbl->lock));
	for (i = 0; i < (1 << nht->hash_shift); i++) {
		struct neighbour *n;
		struct neighbour __rcu **np;
		np = &nht->hash_buckets[i];
		while ((n = rcu_dereference_protected(*np,
					lockdep_is_held(&tbl->lock))) != NULL) {
			/* Neighbour record may be discarded if:
			 * - nobody refers to it.
			 * - it is not permanent
			 */
			write_lock(&n->lock);
			if (atomic_read(&n->refcnt) == 1 &&
			    !(n->nud_state & NUD_PERMANENT)) {
				rcu_assign_pointer(*np,
					rcu_dereference_protected(n->next,
						lockdep_is_held(&tbl->lock)));
				n->dead = 1;
				shrunk = 1;
				write_unlock(&n->lock);
				neigh_cleanup_and_release(n);
				continue;
			}
			write_unlock(&n->lock);
			np = &n->next;
		}
	}
	tbl->last_flush = jiffies;
	write_unlock_bh(&tbl->lock);
	return shrunk;
}
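The pointer-to-pointer walk in neigh_forced_gc() (np always points at the link that leads to the current entry, so unlinking is a single assignment, and np only advances when the entry is kept) can be tried outside the kernel. The sketch below is a single-threaded userspace model under stated assumptions: locking, RCU and real reference counting are omitted, struct model_neigh and model_release() are hypothetical stand-ins for struct neighbour and neigh_cleanup_and_release(), and the MODEL_NUD_* bit values mirror the NUD_* values in the kernel's neighbour.h header.

```c
#include <stddef.h>
#include <stdlib.h>

/* Same bit values as the kernel's NUD_* state flags. */
#define MODEL_NUD_REACHABLE 0x02
#define MODEL_NUD_STALE     0x04
#define MODEL_NUD_FAILED    0x20
#define MODEL_NUD_PERMANENT 0x80

/* Hypothetical, stripped-down neighbour entry. */
struct model_neigh {
	int refcnt;              /* 1 == only the table holds it */
	unsigned int nud_state;
	int dead;
	struct model_neigh *next;
};

/* Stand-in for neigh_cleanup_and_release(): drop the table's reference. */
static void model_release(struct model_neigh *n)
{
	free(n);
}

/* Model of neigh_forced_gc(): returns 1 if anything was reclaimed. */
static int model_forced_gc(struct model_neigh **buckets, unsigned int nbuckets)
{
	int shrunk = 0;

	for (unsigned int i = 0; i < nbuckets; i++) {
		struct model_neigh **np = &buckets[i];
		struct model_neigh *n;

		while ((n = *np) != NULL) {
			if (n->refcnt == 1 &&
			    !(n->nud_state & MODEL_NUD_PERMANENT)) {
				*np = n->next;  /* unlink; np keeps pointing at the same link */
				n->dead = 1;
				shrunk = 1;
				model_release(n);
				continue;
			}
			np = &n->next;          /* keep the entry, advance to its link */
		}
	}
	return shrunk;
}
```

Because np is never advanced past a removed entry, consecutive removable entries on the same chain are all unlinked in one pass without special-casing the bucket head.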
2 Asynchronous (periodic) reclaim function neigh_periodic_work()
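The implementation of asynchronous garbage collection was reworked in 2.6.34. In 2.6.21 the asynchronous cleanup was driven directly by a timer, whereas in 2.6.34 it is driven by a work queue item with a built-in timer (a delayed work). In 2.6.21, each time the timer expired, the cleanup function scanned the entries of just one bucket of the table's hash array and released every entry that met the deletion conditions by calling neigh_release(). Entries in the NUD_PERMANENT or NUD_IN_TIMER states were skipped; any other entry was deleted when its reference count was 1 and either of the following held (it was then marked with neigh->dead = 1 and released):
- its state n->nud_state was NUD_FAILED, or
- it had been idle for longer than the configured limit gc_staletime.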
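Starting with 2.6.34, each run of the asynchronous cleanup function neigh_periodic_work() scans every entry in every bucket of the table's hash array, rather than only the entries of a single bucket.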
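The difference between the two strategies comes down to which buckets a single run touches. Below is a minimal sketch of the older one-bucket-per-tick scheme, assuming a power-of-two bucket count. The cursor plays the role of the hash_chain_gc field used by the 2.6.21 timer handler, and, as I read that version, the timer interval was base_reachable_time/2 divided by the number of buckets (at least one tick), so the whole table was still covered roughly once every base_reachable_time/2. The model_* names are hypothetical and this is not kernel code.

```c
/* Hypothetical model of the older (e.g. 2.6.21) scheme: one bucket per tick. */
struct model_gc_cursor {
	unsigned int hash_mask;   /* bucket count - 1; bucket count is a power of two */
	unsigned int next_bucket; /* role of tbl->hash_chain_gc */
};

/* Returns the bucket to scan on this tick and advances the cursor, wrapping around. */
static unsigned int model_next_bucket(struct model_gc_cursor *c)
{
	unsigned int i = c->next_bucket;

	c->next_bucket = (i + 1) & c->hash_mask;
	return i;
}

/* Per-bucket timer interval so that one full cycle takes about base/2 ticks. */
static unsigned long model_tick_interval(unsigned long base_reachable_time,
					 unsigned int hash_mask)
{
	unsigned long expire = (base_reachable_time >> 1) / (hash_mask + 1UL);

	return expire ? expire : 1;
}
```

In the 2.6.34+ design, by contrast, one run of neigh_periodic_work() loops over all buckets itself, so no cursor has to survive between runs.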
The actual asynchronous cleanup function, neigh_periodic_work(), is described below, based on Linux kernel 3.10. It proceeds as follows:
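1. If the table holds fewer than gc_thresh1 entries, nothing is reclaimed and the function jumps straight to rescheduling itself (step 4).
2. Every 300 s, recompute reachable_time for each neigh_parms of the table as a random value derived from base_reachable_time (via neigh_rand_reach_time()), and update the table's last_rand so that the next recomputation happens 300 s later. reachable_time is the timeout of an entry in the NUD_REACHABLE state, i.e. the timer the neighbour state machine uses for that state.
3. Walk every neighbour entry in every bucket of the table's hash array:
a) If the entry's state is NUD_PERMANENT or one of the NUD_IN_TIMER states, skip it and move on to the next entry.
b) If n->used is older than n->confirmed, move n->used forward to n->confirmed.
c) If the entry's reference count is 1 and its state is NUD_FAILED, unlink it, mark it dead, and call neigh_cleanup_and_release() to free the cache it occupies.
d) If the entry's reference count is 1 and it has been idle for longer than gc_staletime, unlink it and free it in the same way.
After each bucket the table lock is dropped and cond_resched() is called, so that a large table does not monopolize the CPU; the hash table pointer is re-read once the lock has been re-taken.
4. Call schedule_delayed_work() to re-arm the delayed work with a delay of base_reachable_time/2. When that timer fires, the work item is queued again and neigh_periodic_work() runs the next round of asynchronous garbage collection.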
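Steps 3 a) to d) boil down to a single predicate per entry. The sketch below restates it as a standalone C function under these assumptions: jiffies values are plain unsigned long, model_time_after()/model_time_before() reproduce the wraparound-safe comparison of the kernel's time_after()/time_before(), the MODEL_NUD_* values mirror the kernel's NUD_* flags, and model_periodic_reap() is a hypothetical name, not a kernel function. Like the kernel loop, it moves used forward to confirmed as a side effect.

```c
#include <stdbool.h>

/* Same bit values as the kernel's NUD_* flags. */
#define MODEL_NUD_INCOMPLETE 0x01
#define MODEL_NUD_REACHABLE  0x02
#define MODEL_NUD_STALE      0x04
#define MODEL_NUD_DELAY      0x08
#define MODEL_NUD_PROBE      0x10
#define MODEL_NUD_FAILED     0x20
#define MODEL_NUD_PERMANENT  0x80
#define MODEL_NUD_IN_TIMER   (MODEL_NUD_INCOMPLETE | MODEL_NUD_REACHABLE | \
			      MODEL_NUD_DELAY | MODEL_NUD_PROBE)

/* Wraparound-safe jiffies comparisons. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool model_time_before(unsigned long a, unsigned long b)
{
	return model_time_after(b, a);
}

/* Returns true if the entry would be reclaimed; may update *used (step b). */
static bool model_periodic_reap(unsigned int state, int refcnt,
				unsigned long *used, unsigned long confirmed,
				unsigned long now, unsigned long gc_staletime)
{
	if (state & (MODEL_NUD_PERMANENT | MODEL_NUD_IN_TIMER))
		return false;                           /* step a) */
	if (model_time_before(*used, confirmed))
		*used = confirmed;                      /* step b) */
	return refcnt == 1 &&
	       (state == MODEL_NUD_FAILED ||            /* step c) */
		model_time_after(now, *used + gc_staletime)); /* step d) */
}
```

Using the signed-difference comparison instead of a plain `now > used + gc_staletime` keeps the idle check correct when the jiffies counter wraps around.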
static void neigh_periodic_work(struct work_struct *work)
{
	struct neigh_table *tbl = container_of(work, struct neigh_table, gc_work.work);
	struct neighbour *n;
	struct neighbour __rcu **np;
	unsigned int i;
	struct neigh_hash_table *nht;
	NEIGH_CACHE_STAT_INC(tbl, periodic_gc_runs);
	write_lock_bh(&tbl->lock);
	nht = rcu_dereference_protected(tbl->nht,
					lockdep_is_held(&tbl->lock));
	if (atomic_read(&tbl->entries) < tbl->gc_thresh1)
		goto out;
	/*
	 * periodically recompute ReachableTime from random function
	 */
	if (time_after(jiffies, tbl->last_rand + 300 * HZ)) {
		struct neigh_parms *p;
		tbl->last_rand = jiffies;
		for (p = &tbl->parms; p; p = p->next)
			p->reachable_time =
				neigh_rand_reach_time(p->base_reachable_time);
	}
	for (i = 0 ; i < (1 << nht->hash_shift); i++) {
		np = &nht->hash_buckets[i];
		while ((n = rcu_dereference_protected(*np,
				lockdep_is_held(&tbl->lock))) != NULL) {
			unsigned int state;
			write_lock(&n->lock);
			state = n->nud_state;
			if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
				write_unlock(&n->lock);
				goto next_elt;
			}
			if (time_before(n->used, n->confirmed))
				n->used = n->confirmed;
			if (atomic_read(&n->refcnt) == 1 &&
			    (state == NUD_FAILED ||
			     time_after(jiffies, n->used + n->parms->gc_staletime))) {
				*np = n->next;
				n->dead = 1;
				write_unlock(&n->lock);
				neigh_cleanup_and_release(n);
				continue;
			}
			write_unlock(&n->lock);
next_elt:
			np = &n->next;
		}
		/*
		 * It's fine to release lock here, even if hash table
		 * grows while we are preempted.
		 */
		write_unlock_bh(&tbl->lock);
		cond_resched();
		write_lock_bh(&tbl->lock);
		nht = rcu_dereference_protected(tbl->nht,
						lockdep_is_held(&tbl->lock));
	}
out:
	/* Cycle through all hash buckets every base_reachable_time/2 ticks.
	 * ARP entry timeouts range from 1/2 base_reachable_time to 3/2
	 * base_reachable_time.
	 */
	schedule_delayed_work(&tbl->gc_work,
			      tbl->parms.base_reachable_time >> 1);
	write_unlock_bh(&tbl->lock);
}
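The timing comment at the end of neigh_periodic_work() follows from two numbers: the work re-arms itself every base_reachable_time/2, and, in 3.10, neigh_rand_reach_time() returns `base ? (random % base) + (base >> 1) : 0`, i.e. a value in [base_reachable_time/2, 3*base_reachable_time/2). Below is a small userspace model of both, assuming rand() as a stand-in for the kernel's random source; model_rand_reach_time() and model_gc_delay() are hypothetical names.

```c
#include <stdlib.h>

/* Model of neigh_rand_reach_time(): a value in [base/2, 3*base/2), or 0 if base is 0. */
static unsigned long model_rand_reach_time(unsigned long base)
{
	return base ? ((unsigned long)rand() % base) + (base >> 1) : 0;
}

/* Delay passed to schedule_delayed_work() for the next GC run. */
static unsigned long model_gc_delay(unsigned long base_reachable_time)
{
	return base_reachable_time >> 1;
}
```

Randomizing reachable_time around base_reachable_time keeps many hosts from expiring the same neighbour entries in lockstep.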