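This code is a bit puzzling at first: why is zram_free_page called here? The analysis below is based on the 6.1 kernel.
When the upper layer's swap_writepage writes an anon page, it is "supposed" to have allocated a fresh swap slot, which corresponds to index here. So why does this function still call zram_free_page on entry? In other words, why were the resources behind this index not released earlier, so that they have to be released here?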
static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
u32 index, struct bio *bio)
{
...
out:
/*
* Free memory associated with this sector
* before overwriting unused sectors.
*/
zram_slot_lock(zram, index);
zram_free_page(zram, index);
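The code comment gives no direct answer, and searching online for "zram_free_page" turns up little. Tracking down the reason became a small exploration, and the truth (the code) and the design thinking behind it turned out to be quite interesting...
The conclusion first. There are two cases:
- When the slot was released through free_swap_slot, the trylock failed, so the zram data was not freed.
- For performance, after a page has been fully swapped in, the kernel does not always free its swap slot and remove the page from the swap cache. When that slot is written again, the old compressed data that zram holds for the index must be freed first.
1. When free_swap_slot releases the slot, zram_slot_trylock fails, so zram_free_page is never called: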
free_swap_slot
swap_range_free
si->bdev->bd_disk->fops->swap_slot_free_notify
which calls the swap device's handler, zram_slot_free_notify
static void zram_slot_free_notify(struct block_device *bdev,
unsigned long index)
{
struct zram *zram;
zram = bdev->bd_disk->private_data;
atomic64_inc(&zram->stats.notify_free);
if (!zram_slot_trylock(zram, index)) { <-------
atomic64_inc(&zram->stats.miss_free);
return;
}
zram_free_page(zram, index);
zram_slot_unlock(zram, index);
}
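The missed-free path can be replayed in user space. The following is a minimal sketch, not kernel code: every toy_* name, the TOY_SLOTS constant, and the bytes_used counter are hypothetical stand-ins, and a plain boolean replaces the real per-slot lock. It only aims to show that a contended trylock leaves the old object in place until the next write to the same index drops it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4

/* Toy stand-in for one zram table entry; not the kernel's layout. */
struct toy_slot {
    bool   locked;    /* models the per-slot lock */
    size_t obj_size;  /* bytes of compressed data held, 0 if none */
};

struct toy_zram {
    struct toy_slot table[TOY_SLOTS];
    unsigned long notify_free; /* mirrors stats.notify_free */
    unsigned long miss_free;   /* mirrors stats.miss_free */
    size_t bytes_used;         /* hypothetical total of stored bytes */
};

bool toy_slot_trylock(struct toy_zram *z, unsigned int index)
{
    if (z->table[index].locked)
        return false;
    z->table[index].locked = true;
    return true;
}

void toy_slot_unlock(struct toy_zram *z, unsigned int index)
{
    z->table[index].locked = false;
}

/* Drop whatever the slot holds; a no-op for an empty slot. */
void toy_free_page(struct toy_zram *z, unsigned int index)
{
    z->bytes_used -= z->table[index].obj_size;
    z->table[index].obj_size = 0;
}

/* Same shape as zram_slot_free_notify(): give up if the lock is contended. */
void toy_slot_free_notify(struct toy_zram *z, unsigned int index)
{
    z->notify_free++;
    if (!toy_slot_trylock(z, index)) {
        z->miss_free++;
        return;
    }
    toy_free_page(z, index);
    toy_slot_unlock(z, index);
}

/* Write path: free stale data under the lock, then store the new object. */
void toy_write(struct toy_zram *z, unsigned int index, size_t size)
{
    bool got_lock = toy_slot_trylock(z, index);

    assert(got_lock); /* single-threaded model: the lock must be free here */
    toy_free_page(z, index);
    z->table[index].obj_size = size;
    z->bytes_used += size;
    toy_slot_unlock(z, index);
}
```

To replay the scenario: write 100 bytes to index 1, take the slot lock by hand to imitate a concurrent holder, call toy_slot_free_notify (miss_free goes up and the 100 bytes stay), release the lock, then write 60 bytes to the same index. Only the second write releases the stale 100 bytes.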
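2. An anon page goes through swapout -> swapin and then stays in the swap cache. On the next swapout it keeps using its previous slot, i.e. the same zram index. (This is an optimization: the next swapout does not need to call add_to_swap.)
Why can the page stay in the swap cache? Look at the code: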
vm_fault_t do_swap_page(struct vm_fault *vmf)
...
/*
* Remove the swap entry and conditionally try to free up the swapcache.
* We're already holding a reference on the page but haven't mapped it
* yet.
*/
swap_free(entry);
if (should_try_to_free_swap(folio, vma, vmf->flags)) <-------
folio_free_swap(folio);
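The swap cache is freed only conditionally, as the code below shows. One condition is mem_cgroup_swap_full(), which checks whether swap is already more than 50% used; if so, folio_free_swap is called to release the swap cache entry. The other conditions are VM_LOCKED on the vma, an mlocked folio, and a write fault on a non-KSM folio whose refcount is 2.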
static inline bool should_try_to_free_swap(struct folio *folio,
struct vm_area_struct *vma,
unsigned int fault_flags)
{
if (!folio_test_swapcache(folio))
return false;
if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
folio_test_mlocked(folio))
return true;
/*
* If we want to map a page that's in the swapcache writable, we
* have to detect via the refcount if we're really the exclusive
* user. Try freeing the swapcache to get rid of the swapcache
* reference only in case it's likely that we'll be the exlusive user.
*/
return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
folio_ref_count(folio) == 2;
}
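To see which faults leave the page in the swap cache, the predicate can be restated over plain booleans. This is only a sketch: struct toy_fault and toy_should_try_to_free_swap are hypothetical names, and each field stands for the kernel check named in its comment rather than calling it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the inputs should_try_to_free_swap() reads. */
struct toy_fault {
    bool in_swapcache; /* folio_test_swapcache(folio) */
    bool swap_full;    /* mem_cgroup_swap_full(folio) */
    bool vm_locked;    /* vma->vm_flags & VM_LOCKED */
    bool mlocked;      /* folio_test_mlocked(folio) */
    bool write_fault;  /* fault_flags & FAULT_FLAG_WRITE */
    bool ksm;          /* folio_test_ksm(folio) */
    int  ref_count;    /* folio_ref_count(folio) */
};

/* Mirrors the branch order of the kernel function shown above. */
bool toy_should_try_to_free_swap(const struct toy_fault *f)
{
    if (!f->in_swapcache)
        return false;
    if (f->swap_full || f->vm_locked || f->mlocked)
        return true;
    /* Only a write fault by the likely exclusive user frees the swap cache. */
    return f->write_fault && !f->ksm && f->ref_count == 2;
}
```

A plain read fault with swap less than half used (in_swapcache set, every other flag clear) returns false, so the folio keeps its swap cache entry and its slot. That is exactly the situation in which a later swapout writes to an index that still holds old data.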
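In other words, swap_writepage does not always get a fresh free slot. It may reuse the previous slot (because the anon page is still in the swap cache). In that case a swap device such as zram must first release the previously compressed data through zram_free_page (zs_free).
Alternatively, swap_writepage does get a free slot, but when that slot was freed earlier, zram_slot_trylock failed inside zram_slot_free_notify, so zram_free_page was not called there. The stale data is therefore freed during this write to zram.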
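The slot-reuse case can also be walked through with a small user-space model. It is a sketch under loud assumptions: toy_dev, toy_page, toy_swapout, toy_swapin and the stale_frees counter are all hypothetical, the keep_swapcache flag stands in for should_try_to_free_swap() returning false, and slot allocation is a simple first-free scan rather than the kernel's allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_SLOTS 4
#define TOY_NO_SLOT  (-1)

struct toy_dev {
    size_t   obj_size[TOY_NR_SLOTS];    /* compressed bytes per index, 0 if empty */
    bool     slot_in_use[TOY_NR_SLOTS]; /* swap layer's view of the slot */
    unsigned stale_frees;               /* writes that had to drop old data first */
};

struct toy_page {
    bool in_swapcache; /* page still owns a swap slot */
    int  slot;         /* that slot, or TOY_NO_SLOT */
};

/* Write path: free whatever the index still holds, then store the new data. */
void toy_dev_write(struct toy_dev *d, int index, size_t size)
{
    if (d->obj_size[index] != 0) {
        d->stale_frees++;
        d->obj_size[index] = 0;
    }
    d->obj_size[index] = size;
}

/* Swap-out: reuse the old slot if the page is still in the swap cache. */
int toy_swapout(struct toy_dev *d, struct toy_page *p, size_t size)
{
    if (!p->in_swapcache) {
        int i;

        p->slot = TOY_NO_SLOT;
        for (i = 0; i < TOY_NR_SLOTS; i++) {
            if (!d->slot_in_use[i]) {
                p->slot = i;
                break;
            }
        }
        if (p->slot == TOY_NO_SLOT)
            return TOY_NO_SLOT; /* no free slot in this toy device */
        d->slot_in_use[p->slot] = true;
        p->in_swapcache = true;
    }
    toy_dev_write(d, p->slot, size);
    return p->slot;
}

/* Swap-in: either keep the slot (and the old data) or release both. */
void toy_swapin(struct toy_dev *d, struct toy_page *p, bool keep_swapcache)
{
    if (keep_swapcache)
        return;
    d->obj_size[p->slot] = 0; /* models a free notify that succeeds */
    d->slot_in_use[p->slot] = false;
    p->in_swapcache = false;
    p->slot = TOY_NO_SLOT;
}
```

Swapping a page out, swapping it in with keep_swapcache set, and swapping it out again lands on the same index, and stale_frees is incremented because the old object was still there. If the swap-in releases the slot instead, the next swapout finds the index empty and nothing has to be freed first.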