OOM killer & lmkd killer

Table of Contents

OOM killer & reaper

Task memory reclaim

Reclaiming memory by killing processes

lmkd killer

psi

vmpressure

Event notification

Kernel PSI implementation

Kernel vmpressure implementation


OOM killer & reaper

kernel-4.19/mm/page_alloc.c

void show_free_areas() prints the memory snapshot below; its helper show_migration_types() maps each migrate type to the letters (U/M/E/H/C/I) that appear after the buddy-list counts in the logs:

5558  static void show_migration_types(unsigned char type)
5559  {
5560      static const char types[MIGRATE_TYPES] = {
5561          [MIGRATE_UNMOVABLE]    = 'U',
5562          [MIGRATE_MOVABLE]    = 'M',
5563          [MIGRATE_RECLAIMABLE]    = 'E',
5564          [MIGRATE_HIGHATOMIC]    = 'H',
5565  #ifdef CONFIG_CMA
5566          [MIGRATE_CMA]        = 'C',
5567  #endif
5568  #ifdef CONFIG_MEMORY_ISOLATION
5569          [MIGRATE_ISOLATE]    = 'I',
5570  #endif
5571      };
5572      char tmp[MIGRATE_TYPES + 1];
5573      char *p = tmp;
5574      int i;
5575  
5576      for (i = 0; i < MIGRATE_TYPES; i++) {
5577          if (type & (1 << i))
5578              *p++ = types[i];
5579      }
5580  
5581      *p = '\0';
5582      printk(KERN_CONT "(%s) ", tmp);
5583  }
5584  

07-26 07:17:13.012294   665   665 I tombstoned: received crash request for pid 1666

07-26 07:17:13.069550  1666  1678 I system_server: Wrote stack traces to tombstoned
07-26 07:17:13.069883   665   665 E tombstoned: Traces for pid 1666 written to: trace_14

Free memory: [52244.829414] Normal free:176476kB min:8084kB low:41816kB high:57100kB

Below the low watermark:

<4>[51754.901217] Normal free:37948kB min:8084kB low:41816kB high:57100kB active_anon:1186056kB inactive_anon:902712kB active_file:478228kB inactive_file:771480kB unevictable:160100kB writepending:180kB present:5242872kB managed:5095344kB mlocked:160100kB kernel_stack:87272kB pagetables:112028kB bounce:0kB free_pcp:7580kB local_pcp:1000kB free_cma:0kB
<4>[52244.829414] Normal free:176476kB min:8084kB low:41816kB high:57100kB active_anon:1017432kB inactive_anon:933272kB active_file:351592kB inactive_file:674244kB unevictable:160108kB writepending:4572kB present:5242872kB managed:5095344kB mlocked:160108kB kernel_stack:88052kB pagetables:115716kB bounce:0kB free_pcp:4680kB local_pcp:184kB free_cma:4944kB
<4>[52617.818993] Normal free:13104kB min:8084kB low:41816kB high:57100kB active_anon:1249956kB inactive_anon:815476kB active_file:347316kB inactive_file:683292kB unevictable:160108kB writepending:1596kB present:5242872kB managed:5095344kB mlocked:160108kB kernel_stack:91604kB pagetables:123852kB bounce:0kB free_pcp:9056kB local_pcp:1092kB free_cma:696kB
<4>[52716.114577] Normal free:254152kB min:8084kB low:41816kB high:57100kB active_anon:851372kB inactive_anon:776516kB active_file:327688kB inactive_file:558232kB unevictable:160108kB writepending:780kB present:5242872kB managed:5095344kB mlocked:160108kB kernel_stack:78948kB pagetables:146820kB bounce:0kB free_pcp:5552kB local_pcp:536kB free_cma:23092kB
<4>[52813.015225] Normal free:42824kB min:8084kB low:41816kB high:57100kB active_anon:1100268kB inactive_anon:670188kB active_file:592840kB inactive_file:912928kB unevictable:160124kB writepending:848kB present:5242872kB managed:5095344kB mlocked:160124kB kernel_stack:80996kB pagetables:106528kB bounce:0kB free_pcp:6908kB local_pcp:428kB free_cma:496kB

Memory info:

<3>[52892.632369]  (6)[10640:updateBufferCou][ION]warn: alloc pages order: 1 time: 35948307 ns
<4>[52892.632380]  (6)[10640:updateBufferCou]active_anon:388097 inactive_anon:192252 isolated_anon:0
<4>[52892.632380]  active_file:161630 inactive_file:412807 isolated_file:0
<4>[52892.632380]  unevictable:40309 dirty:8659 writeback:114 unstable:0
<4>[52892.632380]  slab_reclaimable:66918 slab_unreclaimable:84971
<4>[52892.632380]  mapped:303466 shmem:24030 pagetables:35513 bounce:0
<4>[52892.632380]  free:104913 free_pcp:3369 free_cma:489
<4>[52892.632387]  (6)[10640:updateBufferCou]Node 0 active_anon:1552388kB inactive_anon:769008kB active_file:646520kB inactive_file:1651228kB unevictable:161236kB isolated(anon):0kB isolated(file):0kB mapped:1213864kB dirty:34636kB writeback:456kB shmem:96120kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
<4>[52892.632392] DMA32 free:347432kB min:4284kB low:22160kB high:30260kB active_anon:448000kB inactive_anon:176596kB active_file:175432kB inactive_file:576756kB unevictable:1112kB writepending:416kB present:2770188kB managed:2719800kB mlocked:1112kB kernel_stack:31020kB pagetables:43660kB bounce:0kB free_pcp:10512kB local_pcp:1332kB free_cma:1264kB
<4>[52892.632395]  (6)[10640:updateBufferCou]lowmem_reserve[]: 0 4975 4975
<4>[52892.632400] Normal free:72220kB min:8084kB low:41816kB high:57100kB active_anon:1104904kB inactive_anon:592640kB active_file:471088kB inactive_file:1074472kB unevictable:160124kB writepending:34480kB present:5242872kB managed:5095344kB mlocked:160124kB kernel_stack:75024kB pagetables:98392kB bounce:0kB free_pcp:2936kB local_pcp:624kB free_cma:692kB
<4>[52892.632401]  (6)[10640:updateBufferCou]lowmem_reserve[]: 0 0 0
<4>[52892.632404] DMA32: 2449*4kB (UMECH) 2355*8kB (UMECH) 207*16kB (UECH) 1*32kB (UECH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 11884*4kB (UMECH) 4727*8kB (UMECH) 546*16kB (UECH) 7*32kB (UECH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 16521*4kB (UMECH) 4075*8kB (UMECH) 258*16kB (UECH) 6*32kB (UECH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 10031*4kB (UMECH) 2727*8kB (UMECH) 1490*16kB (UECH) 1014*32kB (UECH) 0*64kB 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 347652kB
<4>[52892.632423] Normal: 2*4kB (UMEH) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 368*4kB (UMEH) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 1887*4kB (UMEH) 2489*8kB (MEH) 2*16kB (MEH) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 3384*4kB (UMEH) 2502*8kB (MEH) 321*16kB (MEH) 132*32kB (MEH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 71884kB
<4>[52892.632438]  (6)[10640:updateBufferCou]643408 total pagecache pages
<4>[52892.632442]  (6)[10640:updateBufferCou]5042 pages in swap cache
<4>[52892.632444]  (6)[10640:updateBufferCou]Swap cache stats: add 5650558, delete 5645530, find 2491845/4236319
<4>[52892.632446]  (6)[10640:updateBufferCou]Free swap  = 2850816kB
<4>[52892.632448]  (6)[10640:updateBufferCou]Total swap = 4194300kB
<3>[52892.639497]  (5)[10648:camerahalserver][ION] ion_mm_heap_allocate warn: size: 6291456 time: 29212000 ns --12

kernel-4.19/drivers/staging/android/mtk_ion/mtk/ion_mm_heap.c   
static int ion_mm_heap_allocate   

ION & free_cma

<4>[51754.901177]  free:23018 free_pcp:2800 free_cma:0
<4>[51754.901201] DMA32 free:54124kB min:4284kB low:22160kB high:30260kB active_anon:764840kB inactive_anon:337536kB active_file:210404kB inactive_file:450608kB unevictable:1088kB writepending:388kB present:2770188kB managed:2719800kB mlocked:1088kB kernel_stack:42520kB pagetables:63680kB bounce:0kB free_pcp:3616kB local_pcp:480kB free_cma:0kB
<4>[51754.901217] Normal free:37948kB min:8084kB low:41816kB high:57100kB active_anon:1186056kB inactive_anon:902712kB active_file:478228kB inactive_file:771480kB unevictable:160100kB writepending:180kB present:5242872kB managed:5095344kB mlocked:160100kB kernel_stack:87272kB pagetables:112028kB bounce:0kB free_pcp:7580kB local_pcp:1000kB free_cma:0kB
<4>[52244.829368]  free:117621 free_pcp:3984 free_cma:1423
<4>[52244.829398] DMA32 free:294008kB min:4284kB low:22160kB high:30260kB active_anon:462680kB inactive_anon:400480kB active_file:150328kB inactive_file:500308kB unevictable:1096kB writepending:528kB present:2770188kB managed:2719800kB mlocked:1096kB kernel_stack:41244kB pagetables:57396kB bounce:0kB free_pcp:11256kB local_pcp:1424kB free_cma:748kB
<4>[52244.829414] Normal free:176476kB min:8084kB low:41816kB high:57100kB active_anon:1017432kB inactive_anon:933272kB active_file:351592kB inactive_file:674244kB unevictable:160108kB writepending:4572kB present:5242872kB managed:5095344kB mlocked:160108kB kernel_stack:88052kB pagetables:115716kB bounce:0kB free_pcp:4680kB local_pcp:184kB free_cma:4944kB
<4>[52617.818954]  free:14475 free_pcp:4365 free_cma:174

<3>[52104.023224]  (6)[865:HwBinder:820_1][ION] ion_mm_heap_allocate warn: size: 4833280 time: 11988847 ns --10
<3>[52149.067876]  (1)[820:allocator@4.0-s][ION] ion_mm_heap_allocate warn: size: 4644864 time: 15130462 ns --10
<3>[52177.676606]  (4)[865:HwBinder:820_1][ION] ion_mm_heap_allocate warn: size: 4833280 time: 12451308 ns --10
<3>[52244.648844]  (1)[865:HwBinder:820_1][ION] ion_mm_heap_allocate warn: size: 4587520 time: 11143693 ns --10
<3>[52244.853367]  (2)[865:HwBinder:820_1][ION] ion_mm_heap_allocate warn: size: 4640768 time: 48444693 ns --10
<3>[52281.490891]  (2)[1150:HwBinder:820_3][ION] ion_mm_heap_allocate warn: size: 4644864 time: 20040308 ns --10
<3>[52285.704463]  (5)[1150:HwBinder:820_3][ION] ion_mm_heap_allocate warn: size: 4833280 time: 14012308 ns --10
<3>[52342.068943]  (0)[1150:HwBinder:820_3][ION] ion_mm_heap_allocate warn: size: 4657152 time: 10820385 ns --10

The oom-killer triggers oom_reaper to reclaim memory:

<6>[52422.931280]  (6)[85:oom_reaper][wlan][17241]nicGetPendingCmdInfo:(TX INFO) Get command: 000000003d899667, nicCmdEventQueryStatistics.cfi_jt [wlan_drv_gen4m], cmd=0x82, seq=125
<6>[52422.936286]  (6)[85:oom_reaper]oom_reaper: reaped process 32659 (ocess.gservices), now anon-rss:0kB, file-rss:0kB, shmem-rss:476kB
<7>[52422.942561]  (6)[5106:mali-cmar-backe]mtk_dbgtop_dfd_timeout: before MTK_DBGTOP_LATCH_CTL2(0x603e8)
<7>[52422.970486]  (1)[5106:mali-cmar-backe]mtk_dbgtop_dfd_timeout: before MTK_DBGTOP_LATCH_CTL2(0x603e8)
<6>[52422.972047]  (6)[85:oom_reaper]oom_reaper: reaped process 19935 (.android.gms.ui), now anon-rss:0kB, file-rss:8716kB, shmem-rss:804kB
<6>[52422.980762]  (6)[85:oom_reaper]oom_reaper: reaped process 16233 (id.printspooler), now anon-rss:0kB, file-rss:0kB, shmem-rss:456kB
<7>[52422.996885]  (7)[5106:mali-cmar-backe]mtk_dbgtop_dfd_timeout: before MTK_DBGTOP_LATCH_CTL2(0x603e8)
<6>[52422.999074]  (6)[85:oom_reaper]oom_reaper: reaped process 7292 (ndroid.calendar), now anon-rss:0kB, file-rss:7712kB, shmem-rss:2336kB
<6>[52423.019935]  (2)[128:watchdogd][wdtk] kick watchdog
<6>[52423.023571]  (6)[85:oom_reaper]oom_reaper: reaped process 31759 (id.apps.tachyon), now anon-rss:0kB, file-rss:416kB, shmem-rss:748kB
<6>[52423.029840]  (6)[85:oom_reaper]oom_reaper: reaped process 7126 (eng:pushservice), now anon-rss:0kB, file-rss:0kB, shmem-rss:468kB
<7>[52423.030867]  (7)[5106:mali-cmar-backe]mtk_dbgtop_dfd_timeout: before MTK_DBGTOP_LATCH_CTL2(0x603e8)
<6>[52423.045601]  (6)[85:oom_reaper]oom_reaper: reaped process 30465 (wps.moffice_eng), now anon-rss:0kB, file-rss:2880kB, shmem-rss:904kB
<6>[52423.051797]  (6)[85:oom_reaper]oom_reaper: reaped process 6593 (ocessService0:0), now anon-rss:0kB, file-rss:0kB, shmem-rss:528kB
<6>[52423.058919]  (6)[85:oom_reaper]oom_reaper: reaped process 30827 (ice_eng:gcmpush), now anon-rss:0kB, file-rss:0kB, shmem-rss:648kB
<6>[52423.064856]  (6)[85:oom_reaper]oom_reaper: reaped process 7146 (erseabackground), now anon-rss:0kB, file-rss:0kB, shmem-rss:476kB
<7>[52423.066918]  (7)[5106:mali-cmar-backe]mtk_dbgtop_dfd_timeout: before MTK_DBGTOP_LATCH_CTL2(0x603e8)
<6>[52423.070778]  (6)[85:oom_reaper]oom_reaper: reaped process 6658 (:widgetProvider), now anon-rss:0kB, file-rss:0kB, shmem-rss:480kB
<6>[52423.079906]  (6)[85:oom_reaper]oom_reaper: reaped process 6689 (office_eng:scan), now anon-rss:0kB, file-rss:0kB, shmem-rss:468kB
<3>[52423.085944]  (6)[26470:kworker/6:2][sensor_devinfo] sync_utc2scp_work 583 : kernel_ts: 1658790184.875517, 1658790184.875517
<3>[52423.086691] -(1)[27755:binder:20473_8][mtk_nanohub]IPI_SENSOR cannot find cmd!
<7>[52423.097788]  (1)[5106:mali-cmar-backe]mtk_dbgtop_dfd_timeout: before MTK_DBGTOP_LATCH_CTL2(0x603e8)

Low on memory

07-26 07:18:57.380075 16722 17348 I ActivityManager: Low on memory:

07-26 07:18:57.380399 16722 17348 I ActivityManager:   MemInfo:   677,896K slab,     9,172K shmem,   132,720K vm alloc,    39,216K page tables    37,468K kernel stack
07-26 07:18:57.380399 16722 17348 I ActivityManager:                5,924K buffers, 4,145,992K cached,   680,532K mapped, 1,117,072K free
07-26 07:18:57.380399 16722 17348 I ActivityManager:   ZRAM:   223,464K RAM, 4,194,300K swap total, 3,831,784K swap free
07-26 07:18:57.380399 16722 17348 I ActivityManager:   Free RAM: 5,513,732K
07-26 07:18:57.380399 16722 17348 I ActivityManager:        ION:   512,372K
07-26 07:18:57.380399 16722 17348 I ActivityManager:        GPU:         0K
07-26 07:18:57.380399 16722 17348 I ActivityManager:   Used RAM: 2,068,918K
07-26 07:18:57.380399 16722 17348 I ActivityManager:   Lost RAM:   381,511K


1314      @GuardedBy("mService")
1315      final void doLowMemReportIfNeededLocked(ProcessRecord dyingProc) {
1316          // If there are no longer any background processes running,
1317          // and the app that died was not running instrumentation,
1318          // then tell everyone we are now low on memory.
1319          if (!mService.mProcessList.haveBackgroundProcessLOSP()) {
1320              boolean doReport = Build.IS_DEBUGGABLE;
1321              final long now = SystemClock.uptimeMillis();
1322              if (doReport) {
1323                  if (now < (mLastMemUsageReportTime + 5 * 60 * 1000)) {
1324                      doReport = false;
1325                  } else {
1326                      mLastMemUsageReportTime = now;
1327                  }
1328              }

<4>[53359.909526] -(4)[3012:binder:1666_B]Some other process 3012:binder:1666_B want to send sig:9 to pid:16507 tgid:792 comm:FinalizerWatchd
<4>[53359.909996]  (4)[792:main]critical svc 792:main exit with 9 !

Native process memory leaks

Per-process memory usage can be read from the event log.

am_pss entries show each process's memory usage; if a native process's USS in am_pss is unusually high, that process is most likely the source of the leak.

am_pss: Pid, UID, ProcessName, Pss, Uss

am_meminfo: Cached,Free,Zram,Kernel,Native

am_low_memory: NumProcesses

vss (virtual set size): virtual memory, counted from the process address space, including ranges not yet backed by physical memory.

rss (resident set size): physical memory plus shared libraries; a library mapped into multiple processes is counted in full for each of them.

pss (proportional set size): physical memory plus a proportional share of shared memory.

uss (unique set size): physical memory owned exclusively by the process, excluding shared libraries.

Virtual address space leaks

VSS OOM mostly affects 32-bit apps: the virtual address space is exhausted, no vma can be allocated, and memory allocations fail.

Usually only the leaking app crashes; its physical memory usage may be modest while virtual memory approaches the 4 GB limit (32-bit).

smaps/maps data is usually needed for further analysis: whichever vma type dominates (libc, EGL, ION, etc.) is most likely the leaker.

ION memory leaks

ION is an Android-specific memory allocator, used heavily by camera and graphics.

Through ION's ioctl interface, different heap types can be allocated, including memory reserved for particular clients, e.g. physically contiguous memory (CMA) reserved for the camera.

ION memory must be freed explicitly; otherwise it leaks. Every allocation from a heap corresponds to an ion_buffer, so per-process ION usage can be accounted from the ion_buffer records (supported on Qualcomm platforms).

If total ION usage is very large (e.g. 4 GB or 8 GB), an ION leak is likely; per-process accounting then identifies the process whose allocations leaked.

Slab memory leaks

The slab allocator manages small kernel allocations; slab objects likewise must be freed explicitly, or they leak.

Task memory reclaim

When a kernel memory allocation cannot be satisfied, out_of_memory() runs, which calls wake_oom_reaper().

Creating the oom_reaper kernel thread:

679  static int __init oom_init(void)
680  {
681      oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper");
682      return 0;
683  }

Waking the thread that runs oom_reaper():

663  static void wake_oom_reaper(struct task_struct *tsk)
664  {
665      /* mm is already queued? */
666      if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
667          return;
668  
669      get_task_struct(tsk);
670  
671      spin_lock(&oom_reaper_lock);
672      tsk->oom_reaper_list = oom_reaper_list;   /* save the previous head */
673      oom_reaper_list = tsk;   /* record the task whose memory is to be reaped */
674      spin_unlock(&oom_reaper_lock);
675      trace_wake_reaper(tsk->pid);
676      wake_up(&oom_reaper_wait);
677  }

Reaping the task's memory:

643  static int oom_reaper(void *unused)
644  {
645      while (true) {
646          struct task_struct *tsk = NULL;
647  
648          wait_event_freezable(oom_reaper_wait, oom_reaper_list != NULL);
649          spin_lock(&oom_reaper_lock);
650          if (oom_reaper_list != NULL) {
651              tsk = oom_reaper_list;  /* task to reap */
652              oom_reaper_list = tsk->oom_reaper_list;   /* restore the previous head */
653          }
654          spin_unlock(&oom_reaper_lock);
655  
656          if (tsk)
657              oom_reap_task(tsk);
658      }
659  
660      return 0;
661  }

Reclaiming anonymous and non-VM_SHARED pages:
612  #define MAX_OOM_REAP_RETRIES 10
613  static void oom_reap_task(struct task_struct *tsk)
614  {
615      int attempts = 0;
616      struct mm_struct *mm = tsk->signal->oom_mm;
617  
618      /* Retry the down_read_trylock(mmap_sem) a few times */
619      while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
620          schedule_timeout_idle(HZ/10);
621  
622      if (attempts <= MAX_OOM_REAP_RETRIES ||
623          test_bit(MMF_OOM_SKIP, &mm->flags))
624          goto done;
625  
626      pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
627          task_pid_nr(tsk), tsk->comm);
628      debug_show_all_locks();
629  
630  done:
631      tsk->oom_reaper_list = NULL;
632  
633      /*
634       * Hide this mm from OOM killer because it has been either reaped or
635       * somebody can't call up_write(mmap_sem).
636       */
637      set_bit(MMF_OOM_SKIP, &mm->flags);
638  
639      /* Drop a reference taken by wake_oom_reaper */
640      put_task_struct(tsk);
641  }

Reclaiming memory by killing processes

/**
1087   * out_of_memory - kill the "best" process when we run out of memory
1088   * @oc: pointer to struct oom_control
1089   *
1090   * If we run out of memory, we have the choice between either
1091   * killing a random task (bad), letting the system crash (worse)
1092   * OR try to be smart about which process to kill. Note that we
1093   * don't have to be perfect here, we just have to be good.
1094   */
1095  bool out_of_memory(struct oom_control *oc)
1096  {
1097      unsigned long freed = 0;
1098      enum oom_constraint constraint = CONSTRAINT_NONE;
1099  
1100      if (oom_killer_disabled)
1101          return false;
1102  
1103      if (!is_memcg_oom(oc)) {
1104          blocking_notifier_call_chain(&oom_notify_list, 0, &freed);   
1105          if (freed > 0)
1106              /* Got some memory back in the last second. */
1107              return true;
1108      }
1109  
1110      /*
1111       * If current has a pending SIGKILL or is exiting, then automatically
1112       * select it.  The goal is to allow it to allocate so that it may
1113       * quickly exit and free its memory.
1114       */
1115      if (task_will_free_mem(current)) {
1116          mark_oom_victim(current);
1117          wake_oom_reaper(current);
1118          return true;
1119      }
1120  
1121      /*
1122       * The OOM killer does not compensate for IO-less reclaim.
1123       * pagefault_out_of_memory lost its gfp context so we have to
1124       * make sure exclude 0 mask - all other users should have at least
1125       * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
1126       * invoke the OOM killer even if it is a GFP_NOFS allocation.
1127       */
1128      if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
1129          return true;
1130  
1131      /*
1132       * Check if there were limitations on the allocation (only relevant for
1133       * NUMA and memcg) that may require different handling.
1134       */
1135      constraint = constrained_alloc(oc);
1136      if (constraint != CONSTRAINT_MEMORY_POLICY)
1137          oc->nodemask = NULL;
1138      check_panic_on_oom(oc, constraint);
1139  
1140      if (!is_memcg_oom(oc) && sysctl_oom_kill_allocating_task &&
1141          current->mm && !oom_unkillable_task(current, NULL, oc->nodemask) &&
1142          current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
1143          get_task_struct(current);
1144          oc->chosen = current;
1145          oom_kill_process(oc, "Out of memory (oom_kill_allocating_task)");
1146          return true;
1147      }
1148  
1149      select_bad_process(oc);
1150      /* Found nothing?!?! */
1151      if (!oc->chosen) {
1152          dump_header(oc, NULL);
1153          pr_warn("Out of memory and no killable processes...\n");
1154          /*
1155           * If we got here due to an actual allocation at the
1156           * system level, we cannot survive this and will enter
1157           * an endless loop in the allocator. Bail out now.
1158           */
1159          if (!is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
1160  #ifdef CONFIG_PAGE_OWNER
1161              print_max_page_owner();
1162  #endif
1163              panic("System is deadlocked on memory\n");
1164          }
1165      }
1166      if (oc->chosen && oc->chosen != (void *)-1UL)
1167          oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
1168                   "Memory cgroup out of memory");
1169      return !!oc->chosen;
1170  }

  1. First notify the subscribers on oom_notify_list: via the notifier-chain mechanism, modules registered on oom_notify_list are asked to release memory. If a subscriber frees some memory, the OOM killer returns without taking further action.

  2. If the current task has a pending SIGKILL or is already exiting, it is selected automatically so it can allocate, exit quickly, and free its memory; processes and threads sharing the same mm_struct are killed along with it.

  3. For IO-less reclaim, judged by gfp_mask: if the allocation is a non-FS type (no __GFP_FS) and this is not a memcg OOM, the oom-killer exits immediately.

  4. Check the allocation constraints (relevant for NUMA and memcg): CONSTRAINT_NONE, CONSTRAINT_CPUSET, CONSTRAINT_MEMORY_POLICY, CONSTRAINT_MEMCG.

  5. Act on /proc/sys/vm/panic_on_oom: 0 never panics directly; 1 may panic or may try the oom-killer; 2 always panics immediately.

  6. If /proc/sys/vm/oom_kill_allocating_task is true, oom_kill_process() directly kills the task currently requesting memory (provided that task is killable).

  7. select_bad_process() chooses the most suitable victim and oom_kill_process() kills it.

  8. If no suitable process is found, and this is neither a sysrq-triggered nor a memcg OOM, the kernel panics.

oom_kill_process() may in turn call wake_oom_reaper() to reclaim the victim's memory.

References:

- Android OOM, OOMKiller and LMK concepts (CSDN blog)
- Kubernetes node-level eviction policy summary (CSDN blog)

lmkd killer

lmk & shrinker

The implementation differs somewhat across Android versions.

References:

- Linux kernel memory management: page reclaim (CSDN blog)
- lowmemorykiller driver (cnblogs)

binder.c

static int __init binder_init(void)
{
    ...
    binder_alloc_shrinker_init();
    ...
    init_binder_device(device_name);
    ...
    init_binderfs();
    ...
}

As an example, binder registers a shrinker here: under low memory, shrink_slab() invokes it to reclaim pages allocated by binder_alloc.


ProcessList.java
    // Low Memory Killer Daemon command codes.
346      // These must be kept in sync with lmk_cmd definitions in lmkd.h
347      //
348      // LMK_TARGET <minfree> <minkillprio> ... (up to 6 pairs)
349      // LMK_PROCPRIO <pid> <uid> <prio>
350      // LMK_PROCREMOVE <pid>
351      // LMK_PROCPURGE
352      // LMK_GETKILLCNT
353      // LMK_SUBSCRIBE
354      // LMK_PROCKILL
355      // LMK_UPDATE_PROPS
356      // LMK_KILL_OCCURRED
357      // LMK_STATE_CHANGED
358      static final byte LMK_TARGET = 0;
359      static final byte LMK_PROCPRIO = 1;
360      static final byte LMK_PROCREMOVE = 2;
361      static final byte LMK_PROCPURGE = 3;
362      static final byte LMK_GETKILLCNT = 4;
363      static final byte LMK_SUBSCRIBE = 5;
364      static final byte LMK_PROCKILL = 6; // Note: this is an unsolicited command
365      static final byte LMK_UPDATE_PROPS = 7;
366      static final byte LMK_KILL_OCCURRED = 8; // Msg to subscribed clients on kill occurred event
367      static final byte LMK_STATE_CHANGED = 9; // Msg to subscribed clients on state changed


 public static void setOomAdj(int pid, int uid, int amt) {
       long start = SystemClock.elapsedRealtime();
1508          ByteBuffer buf = ByteBuffer.allocate(4 * 4);
1509          buf.putInt(LMK_PROCPRIO);
1510          buf.putInt(pid);
1511          buf.putInt(uid);
1512          buf.putInt(amt);
1513          writeLmkd(buf, null);
1514          long now = SystemClock.elapsedRealtime();
1515          if ((now-start) > 250) {
1516              Slog.w("ActivityManager", "SLOW OOM ADJ: " + (now-start) + "ms for pid " + pid
1517                      + " = " + amt);
1518          }
1519      }
1520  
 

AMS updates each process's oom adj according to its state, which is what drives the communication with lmkd.

system/memory/lmkd/lmkd.cpp


117  /*
118   * PSI monitor tracking window size.
119   * PSI monitor generates events at most once per window,
120   * therefore we poll memory state for the duration of
121   * PSI_WINDOW_SIZE_MS after the event happens.
122   */
123  #define PSI_WINDOW_SIZE_MS 1000
124  /* Polling period after PSI signal when pressure is high */
125  #define PSI_POLL_PERIOD_SHORT_MS 10
126  /* Polling period after PSI signal when pressure is low */
127  #define PSI_POLL_PERIOD_LONG_MS 100
128  
129  #define min(a, b) (((a) < (b)) ? (a) : (b))
130  #define max(a, b) (((a) > (b)) ? (a) : (b))
131  
132  #define FAIL_REPORT_RLIMIT_MS 1000
133  
134  /*
135   * System property defaults
136   */
137  /* ro.lmk.swap_free_low_percentage property defaults */
138  #define DEF_LOW_SWAP 10
139  /* ro.lmk.thrashing_limit property defaults */
140  #define DEF_THRASHING_LOWRAM 30
141  #define DEF_THRASHING 100
142  /* ro.lmk.thrashing_limit_decay property defaults */
143  #define DEF_THRASHING_DECAY_LOWRAM 50
144  #define DEF_THRASHING_DECAY 10
145  /* ro.lmk.psi_partial_stall_ms property defaults */
146  #define DEF_PARTIAL_STALL_LOWRAM 200
147  #define DEF_PARTIAL_STALL 70
148  /* ro.lmk.psi_complete_stall_ms property defaults */
149  #define DEF_COMPLETE_STALL 700
150  
151  #define LMKD_REINIT_PROP "lmkd.reinit"
152  
153  /* default to old in-kernel interface if no memory pressure events */
154  static bool use_inkernel_interface = true;
155  static bool has_inkernel_module;
156  
157  /* memory pressure levels */
158  enum vmpressure_level {
159      VMPRESS_LEVEL_LOW = 0,
160      VMPRESS_LEVEL_MEDIUM,
161      VMPRESS_LEVEL_CRITICAL,
162      VMPRESS_LEVEL_SUPER_CRITICAL,
163      VMPRESS_LEVEL_COUNT
164  };


int main(int argc, char **argv) {

3949      if (!init()) {
3950          if (!use_inkernel_interface) {
3951              /*
3952               * MCL_ONFAULT pins pages as they fault instead of loading
3953               * everything immediately all at once. (Which would be bad,
3954               * because as of this writing, we have a lot of mapped pages we
3955               * never use.) Old kernels will see MCL_ONFAULT and fail with
3956               * EINVAL; we ignore this failure.
3957               *
3958               * N.B. read the man page for mlockall. MCL_CURRENT | MCL_ONFAULT
3959               * pins ⊆ MCL_CURRENT, converging to just MCL_CURRENT as we fault
3960               * in pages.
3961               */
3962              /* CAP_IPC_LOCK required */
3963              if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT) && (errno != EINVAL)) {
3964                  ALOGW("mlockall failed %s", strerror(errno));
3965              }
3966  
3967              /* CAP_NICE required */
3968              struct sched_param param = {
3969                      .sched_priority = 1,
3970              };
3971              if (sched_setscheduler(0, SCHED_FIFO, &param)) {
3972                  ALOGW("set SCHED_FIFO failed %s", strerror(errno));
3973              }
3974          }
3975  
3976          mainloop();
3977      }
3978  
3979      android_log_destroy(&ctx);
3980  
3981      ALOGI("exiting");
3982      return 0;
3983  }

static int init(void) {
3547      static struct event_handler_info kernel_poll_hinfo = { 0, kernel_event_handler };

3552      struct reread_data file_data = {
3553          .filename = ZONEINFO_PATH,
3554          .fd = -1,
3555      };
3556      struct epoll_event epev;
3557      int pidfd;
3558      int i;
3559      int ret;
3560  
3561      page_k = sysconf(_SC_PAGESIZE);
3562      if (page_k == -1)
3563          page_k = PAGE_SIZE;
3564      page_k /= 1024;
3565  
3566      epollfd = epoll_create(MAX_EPOLL_EVENTS);
3567      if (epollfd == -1) {
3568          ALOGE("epoll_create failed (errno=%d)", errno);
3569          return -1;
3570      }
3571  
3572      // mark data connections as not connected
3573      for (int i = 0; i < MAX_DATA_CONN; i++) {
3574          data_sock[i].sock = -1;
3575      }
3576  
3577      ctrl_sock.sock = android_get_control_socket("lmkd");
3578      if (ctrl_sock.sock < 0) {
3579          ALOGE("get lmkd control socket failed");
3580          return -1;
3581      }
3582  
3583      ret = listen(ctrl_sock.sock, MAX_DATA_CONN);
3584      if (ret < 0) {
3585          ALOGE("lmkd control socket listen failed (errno=%d)", errno);
3586          return -1;
3587      }
3588  
3589      epev.events = EPOLLIN;
3590      ctrl_sock.handler_info.handler = ctrl_connect_handler;
3591      epev.data.ptr = (void *)&(ctrl_sock.handler_info);
3592      if (epoll_ctl(epollfd, EPOLL_CTL_ADD, ctrl_sock.sock, &epev) == -1) {
3593          ALOGE("epoll_ctl for lmkd control socket failed (errno=%d)", errno);
3594          return -1;
3595      }
3596      maxevents++;

if (!init_monitors()) {
3618              return -1;
3619          }
3620          /* let the others know it does support reporting kills */
3621          property_set("sys.lmk.reportkills", "1");
3622      }

..........

}
 

  /* default to old in-kernel interface if no memory pressure events */
 static bool use_inkernel_interface = true;
 static bool has_inkernel_module;


  #define INKERNEL_MINFREE_PATH "/sys/module/lowmemorykiller/parameters/minfree"
  #define INKERNEL_ADJ_PATH "/sys/module/lowmemorykiller/parameters/adj"

By default the old in-kernel lmk mechanism is used: the kernel lowmemorykiller module kills apps under low memory, with its watermarks and adj thresholds exposed at INKERNEL_MINFREE_PATH and INKERNEL_ADJ_PATH. The newer lmk mechanism instead kills apps based on memory-pressure (PSI) events; in that case the kernel module's adj and minfree files do not exist, so has_inkernel_module and use_inkernel_interface are both false and the userspace lmk is used. Userspace lmk can receive memory-pressure events in two ways: vmpressure or PSI.

psi 

3359  static bool init_monitors() {
3360      /* Try to use psi monitor first if kernel has it */
3361      use_psi_monitors = property_get_bool("ro.lmk.use_psi", true) &&
3362          init_psi_monitors();
3363      /* Fall back to vmpressure */
3364      if (!use_psi_monitors &&
3365          (!init_mp_common(VMPRESS_LEVEL_LOW) ||
3366          !init_mp_common(VMPRESS_LEVEL_MEDIUM) ||
3367          !init_mp_common(VMPRESS_LEVEL_CRITICAL))) {
3368          ALOGE("Kernel does not support memory pressure events or in-kernel low memory killer");
3369          return false;
3370      }
3371      if (use_psi_monitors) {
3372          ALOGI("Using psi monitors for memory pressure detection");
3373      } else {
3374          ALOGI("Using vmpressure for memory pressure detection");
3375      }
3376      return true;
3377  }
3378  


3239  static bool init_psi_monitors() {
3240      /*
3241       * When PSI is used on low-ram devices or on high-end devices without memfree levels
3242       * use new kill strategy based on zone watermarks, free swap and thrashing stats
3243       */
3244      bool use_new_strategy =
3245          property_get_bool("ro.lmk.use_new_strategy", low_ram_device || !use_minfree_levels);
3246  
3247      /* In default PSI mode override stall amounts using system properties */
3248      if (use_new_strategy) {
3249          /* Do not use low pressure level */
3250          psi_thresholds[VMPRESS_LEVEL_LOW].threshold_ms = 0;
3251          psi_thresholds[VMPRESS_LEVEL_MEDIUM].threshold_ms = psi_partial_stall_ms;
3252          psi_thresholds[VMPRESS_LEVEL_CRITICAL].threshold_ms = psi_complete_stall_ms;
3253      }
3254  
3255      if (!init_mp_psi(VMPRESS_LEVEL_LOW, use_new_strategy)) {
3256          return false;
3257      }
3258      if (!init_mp_psi(VMPRESS_LEVEL_MEDIUM, use_new_strategy)) {
3259          destroy_mp_psi(VMPRESS_LEVEL_LOW);
3260          return false;
3261      }
3262      if (!init_mp_psi(VMPRESS_LEVEL_CRITICAL, use_new_strategy)) {
3263          destroy_mp_psi(VMPRESS_LEVEL_MEDIUM);
3264          destroy_mp_psi(VMPRESS_LEVEL_LOW);
3265          return false;
3266      }
3267      return true;
3268  }

208  static struct psi_threshold psi_thresholds[VMPRESS_LEVEL_COUNT] = {
209      { PSI_SOME, 70 },    /* 70ms out of 1sec for partial stall */
210      { PSI_SOME, 100 },   /* 100ms out of 1sec for partial stall */
211      { PSI_FULL, 70 },    /* 70ms out of 1sec for complete stall */
212  };

A partial stall means that during the window one or more tasks stalled waiting for memory while others kept running;

a complete stall means all tasks stalled waiting for memory at the same time.

PSI can be tracked system-wide or per-cgroup.


 

3195  static bool init_mp_psi(enum vmpressure_level level, bool use_new_strategy) {
3196      int fd;
3197  
3198      /* Do not register a handler if threshold_ms is not set */
3199      if (!psi_thresholds[level].threshold_ms) {
3200          return true;
3201      }
3202  

#### Write the PSI parameters to /proc/pressure/memory
3203      fd = init_psi_monitor(psi_thresholds[level].stall_type,
3204          psi_thresholds[level].threshold_ms * US_PER_MS,
3205          PSI_WINDOW_SIZE_MS * US_PER_MS);
3206  
3207      if (fd < 0) {
3208          return false;
3209      }
3210  

#### Register a handler for the events the kernel sends once those thresholds are met
3211      vmpressure_hinfo[level].handler = use_new_strategy ? mp_event_psi : mp_event_common;
3212      vmpressure_hinfo[level].data = level;
3213      if (register_psi_monitor(epollfd, fd, &vmpressure_hinfo[level]) < 0) {
3214          destroy_psi_monitor(fd);
3215          return false;
3216      }
3217      maxevents++;
3218      mpevfd[level] = fd;
3219  
3220      return true;
3221  }

init_psi_monitor writes the PSI trigger parameters to /proc/pressure/memory and returns the file descriptor, whose events are then monitored:

#define PSI_MON_FILE_MEMORY "/proc/pressure/memory"
int init_psi_monitor(enum psi_stall_type stall_type,
39               int threshold_us, int window_us) {
40      int fd;
41      int res;
42      char buf[256];
43  
44      fd = TEMP_FAILURE_RETRY(open(PSI_MON_FILE_MEMORY, O_WRONLY | O_CLOEXEC));
45      if (fd < 0) {
46          ALOGE("No kernel psi monitor support (errno=%d)", errno);
47          return -1;
48      }
49  
50      switch (stall_type) {
51      case (PSI_SOME):
52      case (PSI_FULL):
53          res = snprintf(buf, sizeof(buf), "%s %d %d",
54              stall_type_name[stall_type], threshold_us, window_us);
55          break;
56      default:
57          ALOGE("Invalid psi stall type: %d", stall_type);
58          errno = EINVAL;
59          goto err;
60      }
61  
62      if (res >= (ssize_t)sizeof(buf)) {
63          ALOGE("%s line overflow for psi stall type '%s'",
64              PSI_MON_FILE_MEMORY, stall_type_name[stall_type]);
65          errno = EINVAL;
66          goto err;
67      }
68  
69      res = TEMP_FAILURE_RETRY(write(fd, buf, strlen(buf) + 1));
70      if (res < 0) {
71          ALOGE("%s write failed for psi stall type '%s'; errno=%d",
72              PSI_MON_FILE_MEMORY, stall_type_name[stall_type], errno);
73          goto err;
74      }
75  
76      return fd;
77  
78  err:
79      close(fd);
80      return -1;
81  }

References:

https://www.jianshu.com/p/e01063abe31c
PSI - Pressure Stall Information — The Linux Kernel documentation

vmpressure

3359  static bool init_monitors() {
3360      /* Try to use psi monitor first if kernel has it */
3361      use_psi_monitors = property_get_bool("ro.lmk.use_psi", true) &&
3362          init_psi_monitors();
3363      /* Fall back to vmpressure */
3364      if (!use_psi_monitors &&
3365          (!init_mp_common(VMPRESS_LEVEL_LOW) ||
3366          !init_mp_common(VMPRESS_LEVEL_MEDIUM) ||
3367          !init_mp_common(VMPRESS_LEVEL_CRITICAL))) {
3368          ALOGE("Kernel does not support memory pressure events or in-kernel low memory killer");
3369          return false;
3370      }

.......

}




3270  static bool init_mp_common(enum vmpressure_level level) {
3271      int mpfd;
3272      int evfd;
3273      int evctlfd;
3274      char buf[256];
3275      struct epoll_event epev;
3276      int ret;
3277      int level_idx = (int)level;
3278      const char *levelstr = level_name[level_idx];
3279  
3280      /* gid containing AID_SYSTEM required */
3281      mpfd = open(MEMCG_SYSFS_PATH "memory.pressure_level", O_RDONLY | O_CLOEXEC);
3282      if (mpfd < 0) {
3283          ALOGI("No kernel memory.pressure_level support (errno=%d)", errno);
3284          goto err_open_mpfd;
3285      }
3286  
3287      evctlfd = open(MEMCG_SYSFS_PATH "cgroup.event_control", O_WRONLY | O_CLOEXEC);
3288      if (evctlfd < 0) {
3289          ALOGI("No kernel memory cgroup event control (errno=%d)", errno);
3290          goto err_open_evctlfd;
3291      }
3292  
3293      evfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
3294      if (evfd < 0) {
3295          ALOGE("eventfd failed for level %s; errno=%d", levelstr, errno);
3296          goto err_eventfd;
3297      }
3298  
3299      ret = snprintf(buf, sizeof(buf), "%d %d %s", evfd, mpfd, levelstr);
3300      if (ret >= (ssize_t)sizeof(buf)) {
3301          ALOGE("cgroup.event_control line overflow for level %s", levelstr);
3302          goto err;
3303      }
3304  
3305      ret = TEMP_FAILURE_RETRY(write(evctlfd, buf, strlen(buf) + 1));
3306      if (ret == -1) {
3307          ALOGE("cgroup.event_control write failed for level %s; errno=%d",
3308                levelstr, errno);
3309          goto err;
3310      }
3311  
3312      epev.events = EPOLLIN;
3313      /* use data to store event level */
3314      vmpressure_hinfo[level_idx].data = level_idx;
3315      vmpressure_hinfo[level_idx].handler = mp_event_common;
3316      epev.data.ptr = (void *)&vmpressure_hinfo[level_idx];
3317      ret = epoll_ctl(epollfd, EPOLL_CTL_ADD, evfd, &epev);
3318      if (ret == -1) {
3319          ALOGE("epoll_ctl for level %s failed; errno=%d", levelstr, errno);
3320          goto err;
3321      }
3322      maxevents++;
3323      mpevfd[level] = evfd;
3324      close(evctlfd);
3325      return true;
3326  

}

/dev/memcg/memory.pressure_level

/dev/memcg/cgroup.event_control

Writing "evfd mpfd levelstr" to the /dev/memcg/cgroup.event_control node registers the listener:

evfd tells the kernel whom to notify when the event fires, mpfd identifies what is being monitored (here memory.pressure_level), and levelstr is the pressure level to watch: low, medium, or critical.

When an event fires, mp_event_common is invoked:

static void mp_event_common(int data, uint32_t events, struct polling_params *poll_params) {
2960      unsigned long long evcount;
2961      int64_t mem_usage, memsw_usage;
2962      int64_t mem_pressure;
2963      union meminfo mi;
2964      struct zoneinfo zi;
2965      struct timespec curr_tm;
2966      static unsigned long kill_skip_count = 0;
2967      enum vmpressure_level level = (enum vmpressure_level)data;
2968      long other_free = 0, other_file = 0;
2969      int min_score_adj;
2970      int minfree = 0;
2971      static struct reread_data mem_usage_file_data = {
2972          .filename = MEMCG_MEMORY_USAGE,
2973          .fd = -1,
2974      };
2975      static struct reread_data memsw_usage_file_data = {
2976          .filename = MEMCG_MEMORYSW_USAGE,
2977          .fd = -1,

#### ro.lmk.use_minfree_levels decides whether to use the legacy lmkd minfree watermarks and their matching adj levels to pick kill targets
 if (use_minfree_levels) {
3053          int i;
3054  
3055          other_free = mi.field.nr_free_pages - zi.totalreserve_pages;
3056          if (mi.field.nr_file_pages > (mi.field.shmem + mi.field.unevictable + mi.field.swap_cached)) {
3057              other_file = (mi.field.nr_file_pages - mi.field.shmem -
3058                            mi.field.unevictable - mi.field.swap_cached);
3059          } else {
3060              other_file = 0;
3061          }
3062  
3063          min_score_adj = OOM_SCORE_ADJ_MAX + 1;
3064          for (i = 0; i < lowmem_targets_size; i++) {
3065              minfree = lowmem_minfree[i];
3066              if (other_free < minfree && other_file < minfree) {
3067                  min_score_adj = lowmem_adj[i];
3068                  break;
3069              }
3070          }
3071  
3072          if (min_score_adj == OOM_SCORE_ADJ_MAX + 1) {
3073              if (debug_process_killing) {
3074                  ALOGI("Ignore %s memory pressure event "
3075                        "(free memory=%ldkB, cache=%ldkB, limit=%ldkB)",
3076                        level_name[level], other_free * page_k, other_file * page_k,
3077                        (long)lowmem_minfree[lowmem_targets_size - 1] * page_k);
3078              }
3079              return;
3080          }
3081  
3082          goto do_kill;
3083      }


    if (level == VMPRESS_LEVEL_LOW) {
3086          record_low_pressure_levels(&mi);
3087      }
3088  
3089      if (level_oomadj[level] > OOM_SCORE_ADJ_MAX) {
3090          /* Do not monitor this pressure level */
3091          return;
3092      }
3093  
3094      if ((mem_usage = get_memory_usage(&mem_usage_file_data)) < 0) {
3095          goto do_kill;
3096      }
3097      if ((memsw_usage = get_memory_usage(&memsw_usage_file_data)) < 0) {
3098          goto do_kill;
3099      }
3100  
3101      // Calculate percent for swappinness.
3102      mem_pressure = (mem_usage * 100) / memsw_usage;
3103  
3104      if (enable_pressure_upgrade && level != VMPRESS_LEVEL_CRITICAL) {
3105          // We are swapping too much.
3106          if (mem_pressure < upgrade_pressure) {
3107              level = upgrade_level(level);
3108              if (debug_process_killing) {
3109                  ALOGI("Event upgraded to %s", level_name[level]);
3110              }
3111          }
3112      }
3113  
3114      // If we still have enough swap space available, check if we want to
3115      // ignore/downgrade pressure events.
3116      if (mi.field.free_swap >=
3117          mi.field.total_swap * swap_free_low_percentage / 100) {
3118          // If the pressure is larger than downgrade_pressure lmk will not
3119          // kill any process, since enough memory is available.
3120          if (mem_pressure > downgrade_pressure) {
3121              if (debug_process_killing) {
3122                  ALOGI("Ignore %s memory pressure", level_name[level]);
3123              }
3124              return;
3125          } else if (level == VMPRESS_LEVEL_CRITICAL && mem_pressure > upgrade_pressure) {
3126              if (debug_process_killing) {
3127                  ALOGI("Downgrade critical memory pressure");
3128              }
3129              // Downgrade event, since enough memory available.
3130              level = downgrade_level(level);
3131          }
3132      }
3133  
3134  do_kill:
3135      if (low_ram_device) {
3136          /* For Go devices kill only one task */
3137          if (find_and_kill_process(level_oomadj[level], NULL, &mi, &wi, &curr_tm) == 0) {
3138              if (debug_process_killing) {
3139                  ALOGI("Nothing to kill");
3140              }
3141          }
3142      } else {
3143          int pages_freed;
3144          static struct timespec last_report_tm;
3145          static unsigned long report_skip_count = 0;
3146  
3147          if (!use_minfree_levels) {
3148              /* Free up enough memory to downgrate the memory pressure to low level */
3149              if (mi.field.nr_free_pages >= low_pressure_mem.max_nr_free_pages) {
3150                  if (debug_process_killing) {
3151                      ALOGI("Ignoring pressure since more memory is "
3152                          "available (%" PRId64 ") than watermark (%" PRId64 ")",
3153                          mi.field.nr_free_pages, low_pressure_mem.max_nr_free_pages);
3154                  }
3155                  return;
3156              }
3157              min_score_adj = level_oomadj[level];
3158          }
3159  
3160          pages_freed = find_and_kill_process(min_score_adj, NULL, &mi, &wi, &curr_tm);
3161  
3162          if (pages_freed == 0) {
3163              /* Rate limit kill reports when nothing was reclaimed */
3164              if (get_time_diff_ms(&last_report_tm, &curr_tm) < FAIL_REPORT_RLIMIT_MS) {
3165                  report_skip_count++;
3166                  return;
3167              }
3168          }
3169  
3170          /* Log whenever we kill or when report rate limit allows */
3171          if (use_minfree_levels) {
3172              ALOGI("Reclaimed %ldkB, cache(%ldkB) and free(%" PRId64 "kB)-reserved(%" PRId64 "kB) "
3173                  "below min(%ldkB) for oom_score_adj %d",
3174                  pages_freed * page_k,
3175                  other_file * page_k, mi.field.nr_free_pages * page_k,
3176                  zi.totalreserve_pages * page_k,
3177                  minfree * page_k, min_score_adj);
3178          } else {
3179              ALOGI("Reclaimed %ldkB at oom_score_adj %d", pages_freed * page_k, min_score_adj);
3180          }
3181  
3182          if (report_skip_count > 0) {
3183              ALOGI("Suppressed %lu failed kill reports", report_skip_count);
3184              report_skip_count = 0;
3185          }
3186  
3187          last_report_tm = curr_tm;
3188      }
3189      if (is_waiting_for_kill()) {
3190          /* pause polling if we are waiting for process death notification */
3191          poll_params->update = POLLING_PAUSE;
3192      }
3193  }

Event notification

PSI

PSI wraps the relevant memory operations with psi_memstall_enter and psi_memstall_leave to collect stall statistics. These operations include:

kernel/msm-4.14/mm/vmscan.c

in try_to_free_mem_cgroup_pages, when do_try_to_free_pages is called to reclaim memory

when kswapd calls balance_pgdat to reclaim memory

kernel/msm-4.14/mm/compaction.c

when kcompactd performs memory compaction

kernel/msm-4.14/mm/page_alloc.c

when __alloc_pages_direct_compact performs direct compaction

when __perform_reclaim performs direct reclaim

kernel/msm-4.14/mm/filemap.c

when wait_on_page_locked waits for a file page to become ready

Once a registered trigger's conditions are met, events start being reported.

vmpressure

vmpressure events are reported along the following paths:

kernel/msm-4.14/mm/vmscan.c:

// reported when a physical page allocation finds the watermarks unsatisfied and triggers node reclaim

__alloc_pages ->__alloc_pages_nodemask -> get_page_from_freelist

-> node_reclaim -> __node_reclaim -> shrink_node -> vmpressure

// reported during direct reclaim on the allocation slow path

__alloc_pages -> __alloc_pages_nodemask -> __alloc_pages_slowpath

-> __alloc_pages_direct_reclaim -> __perform_reclaim -> try_to_free_pages

-> do_try_to_free_pages -> shrink_zones -> shrink_node -> vmpressure

// reported while reclaiming cgroup memory

try_to_free_mem_cgroup_pages -> do_try_to_free_pages -> shrink_zones

-> shrink_node -> vmpressure

// reported while kswapd, woken from the allocation slow path, reclaims memory

kswapd -> balance_pgdat -> kswapd_shrink_node ->  shrink_node -> vmpressure

The event report itself happens inside vmpressure(), which uses a window of vmpressure_win = SWAP_CLUSTER_MAX * 16 scanned pages.

Kernel PSI implementation


kernel-4.19/kernel/sched/psi.c 

1287  static int __init psi_proc_init(void)
1288  {
1289      proc_mkdir("pressure", NULL);
1290      proc_create("pressure/io", 0, NULL, &psi_io_fops);
1291      proc_create("pressure/memory", 0, NULL, &psi_memory_fops);
1292      proc_create("pressure/cpu", 0, NULL, &psi_cpu_fops);
1293      return 0;
1294  }
1295  module_init(psi_proc_init);


Reference:
https://cloud.tencent.com/developer/article/2003334

Kernel vmpressure

In the mainline Linux kernel, vmpressure is tied to CONFIG_MEMCG and is intended for user-space consumers.

Reference:

"Understanding Linux VM pressure" — 星尤野's blog on CSDN
