Day 46 - Page Allocation Revisited - The Slow Path

[Welcome to follow my WeChat official account: qiubinwei-1986]

While learning and summarizing as I go, I have also been continually adjusting my learning methods, mindset, and tools, hoping to settle into an optimal state that improves both efficiency and ability. So far, I feel I have not found it yet.

To understand the call relationships between the functions more clearly, I set up ProcessOn, an online tool that combines mind maps with flowcharts, and it does seem to help. It is much better than simply pasting code, since it at least adds a step of my own thinking.

Over roughly three days, I spent a cumulative three-plus hours studying the slow path. The outline is roughly as follows:

Slow Path Overview

Analysis of the Slow Path Allocation Flow

When system memory is scarce and free pages fall below the low watermark, alloc_pages() enters the slow path (above the low watermark it takes the fast path, which was studied earlier).

The rough flow, reconstructed here from the code below, is as follows:
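1. Rebuild alloc_flags with gfp_to_alloc_flags() (minimum watermark, ALLOC_WMARK_MIN) and recompute the preferred zone.

2. Wake the kswapd kernel threads via wake_all_kswapds().

3. Retry get_page_from_freelist() against the minimum watermark.

4. For costly requests, attempt asynchronous direct compaction with __alloc_pages_direct_compact().

5. Attempt direct reclaim (__alloc_pages_direct_reclaim()) followed by another round of direct compaction.

6. Use should_reclaim_retry() and should_compact_retry() to decide whether to loop back and retry.

7. As a last resort, invoke the OOM killer (__alloc_pages_may_oom()); if even that fails, report the failure through warn_alloc().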

Overview of the Main Functions

__alloc_pages_nodemask()

In the page allocation function __alloc_pages_nodemask(), when get_page_from_freelist() fails because free memory is below the low watermark, the allocator enters the slow path:

    if (unlikely(ac.nodemask != nodemask))
        ac.nodemask = nodemask;

    //when get_page_from_freelist() fails to allocate, fall into the slow path
    page = __alloc_pages_slowpath(alloc_mask, order, &ac);
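For context, here is a minimal, illustrative sketch (not from the kernel source, and the function name grab_high_order_pages is my own) of a caller that can end up on this path: a high-order GFP_KERNEL request, which permits both waking kswapd and direct reclaim.

    #include <linux/gfp.h>
    #include <linux/mm.h>

    static struct page *grab_high_order_pages(void)
    {
        /* order-4 = 16 contiguous pages; under memory pressure this
         * request may fail the fast path and enter the slow path below */
        struct page *page = alloc_pages(GFP_KERNEL, 4);

        if (!page)
            return NULL;    /* slow path gave up; warn_alloc() has fired */

        return page;        /* free later with __free_pages(page, 4) */
    }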

__alloc_pages_slowpath()

gfp_mask is the allocation mask passed in by the caller of the page allocator

order is the order of the request, i.e. how many contiguous pages are wanted (2^order)

ac is the data structure holding the control parameters used internally by the page allocator

static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
                       struct alloc_context *ac)
{
    //can_direct_reclaim indicates whether direct page reclaim may be used;
    //any mask that implies __GFP_DIRECT_RECLAIM allows direct reclaim,
    //e.g. GFP_KERNEL and GFP_HIGHUSER_MOVABLE both include __GFP_DIRECT_RECLAIM
    bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
    //costly_order expresses allocation pressure: the larger the order,
    //the larger the contiguous physical region requested and the more
    //pressure placed on the page allocator
    const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
    struct page *page = NULL;
    unsigned int alloc_flags;
    unsigned long did_some_progress;
    enum compact_priority compact_priority;
    enum compact_result compact_result;
    int compaction_retries;
    int no_progress_loops;
    unsigned int cpuset_mems_cookie;
    int reserve_flags;

    //warn once if __GFP_ATOMIC is combined with __GFP_DIRECT_RECLAIM:
    //__GFP_ATOMIC means the caller cannot directly reclaim pages or wait;
    //it is a high-priority request that is allowed to dip into part of the
    //system's reserved memory
    if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
                (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
        gfp_mask &= ~__GFP_ATOMIC;

retry_cpuset:
    compaction_retries = 0;
    no_progress_loops = 0;
    compact_priority = DEF_COMPACT_PRIORITY;
    cpuset_mems_cookie = read_mems_allowed_begin();

    //rebuild alloc_flags: use the minimum watermark (ALLOC_WMARK_MIN)
    //plus ALLOC_CPUSET, and check whether gfp_mask sets __GFP_HIGH
    alloc_flags = gfp_to_alloc_flags(gfp_mask);

    //recompute the preferred zone
    ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
                    ac->high_zoneidx, ac->nodemask);
    if (!ac->preferred_zoneref->zone)
        goto nopage;

    //call wake_all_kswapds() to wake up the kswapd kernel threads
    if (alloc_flags & ALLOC_KSWAPD)
        wake_all_kswapds(order, gfp_mask, ac);

    //alloc_flags was rebuilt above by gfp_to_alloc_flags(); retry
    //get_page_from_freelist() against the minimum watermark
    page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
    if (page)
        goto got_pg;

    /*
     * For costly allocations, try direct compaction first, as it's likely
     * that we have enough base pages and don't need to reclaim. For non-
     * movable high-order allocations, do that as well, as compaction will
     * try prevent permanent fragmentation by migrating from blocks of the
     * same migratetype.
     * Don't try this for allocations that are allowed to ignore
     * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
     */
    //when allocation at the minimum watermark fails, run the memory
    //compaction machinery via __alloc_pages_direct_compact() if all three
    //of the following hold:
    //1. direct page reclaim is allowed
    //2. the request is costly (no contiguous region available, or multiple
    //   non-movable contiguous pages are requested)
    //3. the system's reserved memory may not be touched:
    //   gfp_pfmemalloc_allowed() returns 0 to forbid access to the reserves
    if (can_direct_reclaim &&
            (costly_order ||
               (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
            && !gfp_pfmemalloc_allowed(gfp_mask)) {
        //__alloc_pages_direct_compact() runs in asynchronous mode here
        page = __alloc_pages_direct_compact(gfp_mask, order,
                        alloc_flags, ac,
                        INIT_COMPACT_PRIORITY,
                        &compact_result);
        if (page)
            goto got_pg;

        /*
         * Checks for costly allocations with __GFP_NORETRY, which
         * includes some THP page fault allocations
         */
        if (costly_order && (gfp_mask & __GFP_NORETRY)) {
            /*
             * If allocating entire pageblock(s) and compaction
             * failed because all zones are below low watermarks
             * or is prohibited because it recently failed at this
             * order, fail immediately unless the allocator has
             * requested compaction and reclaim retry.
             *
             * Reclaim is
             *  - potentially very expensive because zones are far
             *    below their low watermarks or this is part of very
             *    bursty high order allocations,
             *  - not guaranteed to help because isolate_freepages()
             *    may not iterate over freed pages as part of its
             *    linear scan, and
             *  - unlikely to make entire pageblocks free on its
             *    own.
             */
            if (compact_result == COMPACT_SKIPPED ||
                compact_result == COMPACT_DEFERRED)
                goto nopage;

            /*
             * Looks like reclaim/compaction is worth trying, but
             * sync compaction could be very expensive, so keep
             * using async compaction.
             */
            compact_priority = INIT_COMPACT_PRIORITY;
        }
    }

retry:
    //make sure the kswapd kernel threads do not go to sleep
    if (alloc_flags & ALLOC_KSWAPD)
        wake_all_kswapds(order, gfp_mask, ac);

    //__gfp_pfmemalloc_flags() decides whether the system's reserved
    //memory may be accessed; 0 means it may not
    reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
    if (reserve_flags)
        alloc_flags = reserve_flags;

    //recompute the preferred zone with first_zones_zonelist()
    if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
        ac->nodemask = NULL;
        ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
                    ac->high_zoneidx, ac->nodemask);
    }

    //retry get_page_from_freelist(); on success we are done
    page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
    if (page)
        goto got_pg;

    //if direct page reclaim is not allowed, no further progress can be
    //made here, so jump to the nopage label
    if (!can_direct_reclaim)
        goto nopage;

    //if the current task has PF_MEMALLOC set in its task descriptor,
    //__gfp_pfmemalloc_flags() returned ALLOC_NO_WATERMARKS, meaning the
    //watermarks are ignored and all of the system's reserved memory is
    //accessible; if allocation still failed, jump to the nopage label
    if (current->flags & PF_MEMALLOC)
        goto nopage;

    //invoke direct page reclaim, then retry the allocation after one
    //round of reclaim; on success, return the page structure
    page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
                            &did_some_progress);
    if (page)
        goto got_pg;

    //invoke direct memory compaction, then retry the allocation after one
    //round of compaction; on success, return the page structure
    page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac,
                    compact_priority, &compact_result);
    if (page)
        goto got_pg;

    //if the allocation mask carries __GFP_NORETRY, jump straight to the
    //nopage label
    if (gfp_mask & __GFP_NORETRY)
        goto nopage;

    //a costly allocation whose mask lacks __GFP_RETRY_MAYFAIL must not be
    //retried, so jump to the nopage label
    if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
        goto nopage;

    //should_reclaim_retry() decides whether direct reclaim is worth
    //retrying; a non-zero return means retry, so jump to the retry label
    if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
                 did_some_progress > 0, &no_progress_loops))
        goto retry;

    /*
     * It doesn't make any sense to retry for the compaction if the order-0
     * reclaim is not able to make any progress because the current
     * implementation of the compaction depends on the sufficient amount
     * of free memory (see __compaction_suitable)
     */
    //decide whether compaction is worth retrying
    if (did_some_progress > 0 &&
            should_compact_retry(ac, order, alloc_flags,
                compact_result, &compact_priority,
                &compaction_retries))
        goto retry;

    //decide whether a different cpuset should be tried
    if (check_retry_cpuset(cpuset_mems_cookie, ac))
        goto retry_cpuset;

    //once every cpuset has been retried and the requested memory still
    //cannot be allocated, fall back to the OOM killer:
    //__alloc_pages_may_oom() kills a process to free memory
    page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
    if (page)
        goto got_pg;

    //if the OOM victim is the current task and alloc_flags is ALLOC_OOM or
    //gfp_mask carries __GFP_NOMEMALLOC, jump to the nopage label
    if (tsk_is_oom_victim(current) &&
        (alloc_flags == ALLOC_OOM ||
         (gfp_mask & __GFP_NOMEMALLOC)))
        goto nopage;

    //after the OOM killer, did_some_progress reflects the memory freed by
    //killing the victim; jump to the retry label
    if (did_some_progress) {
        no_progress_loops = 0;
        goto retry;
    }

nopage:
    /* Deal with possible cpuset update races before we fail */
    if (check_retry_cpuset(cpuset_mems_cookie, ac))
        goto retry_cpuset;

    //__GFP_NOFAIL means the allocation must not fail
    if (gfp_mask & __GFP_NOFAIL) {
        /*
         * All existing users of the __GFP_NOFAIL are blockable, so warn
         * of any new users that actually require GFP_NOWAIT
         */
        if (WARN_ON_ONCE(!can_direct_reclaim))
            goto fail;

        /*
         * PF_MEMALLOC request from this context is rather bizarre
         * because we cannot reclaim anything and only can loop waiting
         * for somebody to do a work for us
         */
        WARN_ON_ONCE(current->flags & PF_MEMALLOC);

        /*
         * non failing costly orders are a hard requirement which we
         * are not prepared for much so let's warn about these users
         * so that we can identify them and convert them to something
         * else.
         */
        WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);

        /*
         * Help non-failing allocations by giving them access to memory
         * reserves but do not use ALLOC_NO_WATERMARKS because this
         * could deplete whole memory reserves which would just make
         * the situation worse
         */
        //retry the allocation via __alloc_pages_cpuset_fallback() with the
        //ALLOC_HARDER flag, which relaxes the watermark check performed by
        //__zone_watermark_ok()
        page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
        //if a page was found, jump to the got_pg label
        if (page)
            goto got_pg;

        //otherwise reschedule and go back to the retry label
        cond_resched();
        goto retry;
    }

fail:
    //report the allocation failure via warn_alloc()
    warn_alloc(gfp_mask, ac->nodemask,
            "page allocation failure: order:%u", order);

got_pg:
    return page;
}
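To make the two booleans at the top of the function concrete, here is a small userspace sketch (not kernel code) showing how can_direct_reclaim and costly_order fall out of the mask and order. The flag bit values below are placeholders of my own, not the real kernel encodings; PAGE_ALLOC_COSTLY_ORDER does match the kernel's value of 3.

    #include <stdbool.h>
    #include <stdio.h>

    #define __GFP_DIRECT_RECLAIM  (1u << 0)   /* placeholder bit */
    #define __GFP_KSWAPD_RECLAIM  (1u << 1)   /* placeholder bit */
    #define GFP_KERNEL (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
    #define GFP_ATOMIC (0u)                   /* no direct reclaim allowed */
    #define PAGE_ALLOC_COSTLY_ORDER 3         /* matches the kernel's value */

    int main(void)
    {
        unsigned int order = 4;               /* 16 contiguous pages */
        unsigned int gfp_mask = GFP_KERNEL;

        /* same derivations as the top of __alloc_pages_slowpath() */
        bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
        bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;

        printf("can_direct_reclaim=%d costly_order=%d\n",
               can_direct_reclaim, costly_order);   /* prints 1 1 here */
        return 0;
    }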

gfp_to_alloc_flags()

static inline unsigned int
gfp_to_alloc_flags(gfp_t gfp_mask)
{
    unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;

    /* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
    BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);

    /*
     * The caller may dip into page reserves a bit more if the caller
     * cannot run direct reclaim, or if the caller has realtime scheduling
     * policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
     * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH).
     */
    alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);

    if (gfp_mask & __GFP_ATOMIC) {
        /*
         * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
         * if it can't schedule.
         */
        if (!(gfp_mask & __GFP_NOMEMALLOC))
            alloc_flags |= ALLOC_HARDER;
        /*
         * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
         * comment for __cpuset_node_allowed().
         */
        alloc_flags &= ~ALLOC_CPUSET;
    } else if (unlikely(rt_task(current)) && !in_interrupt())
        alloc_flags |= ALLOC_HARDER;

    if (gfp_mask & __GFP_KSWAPD_RECLAIM)
        alloc_flags |= ALLOC_KSWAPD;

#ifdef CONFIG_CMA
    if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
        alloc_flags |= ALLOC_CMA;
#endif
    return alloc_flags;
}
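As a worked example (assuming neither a realtime task nor CONFIG_CMA applies): GFP_KERNEL, which of the bits tested here sets only __GFP_KSWAPD_RECLAIM, yields ALLOC_WMARK_MIN | ALLOC_CPUSET | ALLOC_KSWAPD. GFP_ATOMIC, which sets __GFP_HIGH, __GFP_ATOMIC, and __GFP_KSWAPD_RECLAIM, yields ALLOC_WMARK_MIN | ALLOC_HIGH | ALLOC_HARDER | ALLOC_KSWAPD, with ALLOC_CPUSET cleared.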

wake_all_kswapds()

static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
                 const struct alloc_context *ac)
{
    struct zoneref *z;
    struct zone *zone;
    pg_data_t *last_pgdat = NULL;
    enum zone_type high_zoneidx = ac->high_zoneidx;

    for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, high_zoneidx,
                    ac->nodemask) {
        if (last_pgdat != zone->zone_pgdat)
            wakeup_kswapd(zone, gfp_mask, order, high_zoneidx);
        last_pgdat = zone->zone_pgdat;
    }
}
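The loop walks every candidate zone in the zonelist, but since kswapd is per node rather than per zone, the last_pgdat check ensures wakeup_kswapd() is called only once for each node's pg_data_t as consecutive zones of the same node are visited.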

__alloc_pages_direct_compact

static inline struct page *
__alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
        unsigned int alloc_flags, const struct alloc_context *ac,
        enum compact_priority prio, enum compact_result *compact_result)
{
    *compact_result = COMPACT_SKIPPED;
    return NULL;
}
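Note that the body shown here is the no-op stub compiled when CONFIG_COMPACTION is disabled: it simply reports COMPACT_SKIPPED and returns NULL. With CONFIG_COMPACTION enabled, the real implementation runs the compaction machinery via try_to_compact_pages() and then retries get_page_from_freelist().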

should_reclaim_retry()

static inline bool
should_reclaim_retry(gfp_t gfp_mask, unsigned order,
             struct alloc_context *ac, int alloc_flags,
             bool did_some_progress, int *no_progress_loops)
{
    struct zone *zone;
    struct zoneref *z;
    bool ret = false;

    /*
     * Costly allocations might have made a progress but this doesn't mean
     * their order will become available due to high fragmentation so
     * always increment the no progress counter for them
     */
    if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER)
        *no_progress_loops = 0;
    else
        (*no_progress_loops)++;

    /*
     * Make sure we converge to OOM if we cannot make any progress
     * several times in the row.
     */
    if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
        /* Before OOM, exhaust highatomic_reserve */
        return unreserve_highatomic_pageblock(ac, true);
    }

    /*
     * Keep reclaiming pages while there is a chance this will lead
     * somewhere.  If none of the target zones can satisfy our allocation
     * request even if all reclaimable pages are considered then we are
     * screwed and have to go OOM.
     */
    for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
                    ac->nodemask) {
        unsigned long available;
        unsigned long reclaimable;
        unsigned long min_wmark = min_wmark_pages(zone);
        bool wmark;

        available = reclaimable = zone_reclaimable_pages(zone);
        available += zone_page_state_snapshot(zone, NR_FREE_PAGES);

        /*
         * Would the allocation succeed if we reclaimed all
         * reclaimable pages?
         */
        wmark = __zone_watermark_ok(zone, order, min_wmark,
                ac_classzone_idx(ac), alloc_flags, available);
        trace_reclaim_retry_zone(z, order, reclaimable,
                available, min_wmark, *no_progress_loops, wmark);
        if (wmark) {
            /*
             * If we didn't make any progress and have a lot of
             * dirty + writeback pages then we should wait for
             * an IO to complete to slow down the reclaim and
             * prevent from pre mature OOM
             */
            if (!did_some_progress) {
                unsigned long write_pending;

                write_pending = zone_page_state_snapshot(zone,
                            NR_ZONE_WRITE_PENDING);

                if (2 * write_pending > reclaimable) {
                    congestion_wait(BLK_RW_ASYNC, HZ/10);
                    return true;
                }
            }

            ret = true;
            goto out;
        }
    }

out:
    /*
     * Memory allocation/reclaim might be called from a WQ context and the
     * current implementation of the WQ concurrency control doesn't
     * recognize that a particular WQ is congested if the worker thread is
     * looping without ever sleeping. Therefore we have to do a short sleep
     * here rather than calling cond_resched().
     */
    if (current->flags & PF_WQ_WORKER)
        schedule_timeout_uninterruptible(1);
    else
        cond_resched();

    return ret;
}

warn_alloc()

void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
{
    struct va_format vaf;
    va_list args;
    static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);

    if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
        return;

    va_start(args, fmt);
    vaf.fmt = fmt;
    vaf.va = &args;
    pr_warn("%s: %pV, mode:%#x(%pGg), nodemask=%*pbl",
            current->comm, &vaf, gfp_mask, &gfp_mask,
            nodemask_pr_args(nodemask));
    va_end(args);

    cpuset_print_current_mems_allowed();
    pr_cont("\n");
    dump_stack();
    warn_alloc_show_mem(gfp_mask, nodemask);
}
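Pieced together from the format strings above, the first line of a failure report in dmesg looks roughly like this (the process name, order, and mask values are illustrative, not taken from a real log):

    myworker: page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null)

It is followed by the current cpuset, a stack trace from dump_stack(), and the memory statistics printed by warn_alloc_show_mem().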

[Welcome to follow my WeChat official account: qiubinwei-1986]
