1. System calls related to NUMA balancing
do_mbind
Binds the user-specified policy to a VMA so that later physical memory allocations comply with it, e.g. by allocating memory only from certain nodes. Then, depending on the flags (MPOL_MF_MOVE or MPOL_MF_MOVE_ALL), it calls queue_pages_range and migrate_pages to migrate the VMA's pages that were allocated from nodes the policy does not allow.
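The queue-then-migrate step can be modeled in user space. This is a hypothetical illustration, not kernel code: queue_misplaced and migrate_queued are invented names, a node mask is a plain unsigned long instead of nodemask_t, and the destination is simply the lowest-numbered allowed node, whereas the kernel allocates the destination page according to the policy itself. Only the MPOL_MF_MOVE case is modeled.

```c
#include <stddef.h>

/* Hypothetical model: page_nid[i] is the node backing page i of the range,
 * and bit n of "allowed" is set when node n is permitted by the policy.
 * Assumes node IDs are in [0, 63] so they fit in an unsigned long mask. */

/* Collect indices of pages whose node is outside the mask, roughly what
 * queue_pages_range gathers before migrate_pages runs. */
static size_t queue_misplaced(const int *page_nid, size_t npages,
                              unsigned long allowed, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < npages; i++) {
        if (!(allowed & (1UL << page_nid[i])))
            out[count++] = i;
    }
    return count;
}

/* "Migrate" the queued pages to the lowest-numbered allowed node.
 * Returns the target node, or -1 if the mask is empty. */
static int migrate_queued(int *page_nid, const size_t *queued, size_t n,
                          unsigned long allowed)
{
    int target = -1;
    for (int nid = 0; nid < (int)(8 * sizeof(allowed)); nid++) {
        if (allowed & (1UL << nid)) {
            target = nid;
            break;
        }
    }
    if (target < 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        page_nid[queued[i]] = target;
    return target;
}
```

For example, with only node 0 allowed, pages backed by nodes 1 and 2 are queued and end up on node 0.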
set_mempolicy
Sets the memory policy without moving existing pages immediately. Pages that don't obey the policy are moved after the next task_numa_work run has unmapped them.
migrate_pages
2. task_tick_numa: setting up NUMA hinting faults with task_numa_work
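task_numa_work is queued on the task's work list by task_tick_numa, which is called from the scheduler tick (task_tick_fair, in timer-interrupt context). It then runs in task context when the task is about to return to user space, right before pending signals are handled. It calls change_prot_numa on a limited window of the task's address space per run (sized by numa_balancing_scan_size_mb, 256 MB by default), resuming where the previous run stopped, to clear the PTEs' _PAGE_PRESENT bit and set _PAGE_PROTNONE. The next access to a page in that window therefore generates a page fault.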
Useful in cases such as mbind not migrating existing pages synchronously, cpuset changes, set_mempolicy calls, move_pages, and migrate_pages. It also triggers task_numa_fault, so NUMA fault statistics are collected without any initiation from user space.
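A toy model of the scan window, assuming one boolean per page stands in for a PTE marked PROT_NONE and scan_offset plays the role of mm->numa_scan_offset. struct mm_model, numa_scan_pass and touch_page are hypothetical names; the real task_numa_work walks VMAs, skips ones it cannot migrate, and sizes its window in megabytes rather than pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one address space. protnone[i] stands in for a
 * PTE with _PAGE_PRESENT cleared and _PAGE_PROTNONE set. */
struct mm_model {
    bool *protnone;
    size_t npages;
    size_t scan_offset;   /* where the next pass starts */
};

/* One task_numa_work-style pass: mark at most "window" pages starting at
 * scan_offset. If the pass reaches the end of the address space, the next
 * pass restarts from page 0. Returns the number of pages marked. */
static size_t numa_scan_pass(struct mm_model *mm, size_t window)
{
    size_t start = mm->scan_offset;
    size_t end = start + window;
    if (end > mm->npages)
        end = mm->npages;
    for (size_t i = start; i < end; i++)
        mm->protnone[i] = true;   /* next access to page i will fault */
    mm->scan_offset = (end == mm->npages) ? 0 : end;
    return end - start;
}

/* Access page i. A marked page "faults": the handler restores the mapping
 * and a NUMA hinting fault is reported (returns true). */
static bool touch_page(struct mm_model *mm, size_t i)
{
    if (!mm->protnone[i])
        return false;
    mm->protnone[i] = false;
    return true;
}
```

With 10 pages and a window of 4, three passes mark pages 0-3, 4-7 and 8-9, after which the offset wraps back to 0.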
3. do_numa_page
When a page fault hits a PTE with _PAGE_PROTNONE set, do_numa_page is called to migrate the page to a node the policy prefers. It also tries to move the task to a CPU on its preferred node, or to swap it with a task running there, if capacity is available and the load stays balanced on both CPUs.
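The move-or-swap choice can be sketched as a toy decision rule mirroring the description above. This is a heavy simplification under assumed inputs: loads are plain integers, max_imbalance is an assumed fixed tolerance, and struct cpu_model, choose_numa_action and abs_diff are hypothetical names. The kernel's task_numa_compare also scores fault-based locality gains, which this sketch reduces to a single dst_task_prefers_src flag.

```c
/* Hypothetical, simplified model of "move the task or swap it". */
enum numa_action { NUMA_NONE, NUMA_MOVE, NUMA_SWAP };

struct cpu_model {
    long load;       /* current load on the CPU */
    long capacity;   /* load the CPU can carry */
};

static long abs_diff(long a, long b) { return a > b ? a - b : b - a; }

/* task_load: load of the faulting task (already counted in src.load).
 * dst_task_load: load of the task running on the destination CPU (0 if idle).
 * dst_task_prefers_src: whether that task would gain locality on the source.
 * max_imbalance: assumed tolerance for the load difference after the change. */
static enum numa_action choose_numa_action(struct cpu_model src,
                                           struct cpu_model dst,
                                           long task_load,
                                           long dst_task_load,
                                           int dst_task_prefers_src,
                                           long max_imbalance)
{
    /* Option 1: move the task if dst has room and loads stay balanced. */
    if (dst.load + task_load <= dst.capacity &&
        abs_diff(src.load - task_load, dst.load + task_load) <= max_imbalance)
        return NUMA_MOVE;

    /* Option 2: swap with the task on dst if it benefits too and the
     * exchange keeps both CPUs within capacity and balanced. */
    if (dst_task_load > 0 && dst_task_prefers_src) {
        long new_src = src.load - task_load + dst_task_load;
        long new_dst = dst.load - dst_task_load + task_load;
        if (new_src <= src.capacity && new_dst <= dst.capacity &&
            abs_diff(new_src, new_dst) <= max_imbalance)
            return NUMA_SWAP;
    }
    return NUMA_NONE;
}
```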
do_numa_page
  numa_migrate_prep → check the mempolicy to see whether the page needs to be migrated
    mpol_misplaced
  migrate_misplaced_page → migrate the page if numa_migrate_prep says so
  task_numa_fault → calculate max_faults to find the preferred node ID to migrate to
    task_numa_placement
    numa_migrate_preferred
      task_numa_migrate: how does its max_faults calculation differ from the one in task_numa_placement?
        task_numa_find_cpu
          task_numa_compare
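The max_faults selection in task_numa_placement can be sketched as follows, assuming a single fault counter per node. Based on a reading of task_numa_placement, it halves the existing count (computed as old - old/2) before adding the faults buffered since the last pass, then picks the node with the most faults. The sketch ignores the private/shared split, CPU-side counts and numa_group handling, and numa_placement is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical model of per-node fault accounting.
 * faults[nid]: decayed fault history for node nid.
 * buffer[nid]: faults recorded (by task_numa_fault) since the last pass.
 * Returns the node with the most faults, or -1 if every count is zero. */
static int numa_placement(unsigned long *faults, unsigned long *buffer,
                          size_t nr_nodes)
{
    unsigned long max_faults = 0;
    int max_nid = -1;

    for (size_t nid = 0; nid < nr_nodes; nid++) {
        /* Decay the old history by half, then fold in the new faults. */
        faults[nid] -= faults[nid] / 2;
        faults[nid] += buffer[nid];
        buffer[nid] = 0;

        if (faults[nid] > max_faults) {
            max_faults = faults[nid];
            max_nid = (int)nid;
        }
    }
    return max_nid;
}
```

Because older faults are halved every pass, a node that was hot long ago loses out to one that is faulting now: counts {3, 5} followed by 4 new faults on node 0 become {6, 3}, moving the preferred node from 1 to 0.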
Note: can_migrate_task also takes NUMA locality into account (via migrate_degrades_locality) when load balancing CPUs within a scheduling domain.
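A simplified model of that locality check, loosely following migrate_degrades_locality: it returns 1 if the move degrades locality, 0 if it does not, and -1 if locality has no opinion, in which case can_migrate_task falls back to its usual cache-hotness check. degrades_locality is a hypothetical name, and the sketch ignores numa groups, node distances, idle CPUs and the runqueue preference counts the kernel also looks at.

```c
/* Hypothetical, simplified locality check for a load-balancer move.
 * faults[nid] is the task's fault count on node nid.
 * Returns 1 (degrades), 0 (does not degrade) or -1 (no opinion). */
static int degrades_locality(int src_nid, int dst_nid, int preferred_nid,
                             const unsigned long *faults)
{
    if (src_nid == dst_nid)
        return -1;   /* same node: locality unaffected */
    if (src_nid == preferred_nid)
        return 1;    /* leaving the preferred node */
    if (dst_nid == preferred_nid)
        return 0;    /* moving toward the preferred node */
    /* Otherwise compare how often the task faults on each node. */
    return faults[dst_nid] < faults[src_nid];
}
```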