The Completely Fair Scheduler (CFS) is a task scheduler merged into the 2.6.23 release of the Linux kernel. It handles CPU resource allocation for executing processes and aims to maximize overall CPU utilization while also maximizing interactive performance.
CFS normally operates on individual tasks. It maintains fairness in providing processor time by running every task at equal speed, as if in parallel. Tasks may not be assigned identical execution time, but they will have the same completion ratio. It may also be desirable to group tasks and provide fair CPU time to each task group.
Consider a simple case: if there are n tasks with the same weight on a CPU that can run only one task at a time, each task gets 1/n of the runtime.
These tasks are queued in a red-black tree keyed by virtual runtime, denoted vruntime, which is updated as
vruntime += delta * NICE_0_LOAD / curr->se.load
if the weight of the current process differs from NICE_0_LOAD (that is, its nice value is not 0); otherwise the ratio is 1 and vruntime simply advances by delta. Here delta is the actual runtime of the task in a scheduling period. NICE_0_LOAD is the load of a process when nice=0. The value of nice affects the priority of the process. The term curr->se.load is the weight of the current process. vruntime is therefore proportional to delta (as delta increases, vruntime increases) and inversely proportional to the weight of the process. To calculate the value of delta, we use the following expression:
delta = curr->sum_exec_runtime - curr->prev_sum_exec_runtime
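The update described above can be sketched in C as follows. This is a minimal illustration, not kernel code: the struct sketch_entity type, the sketch_ function names, and the value 1024 for the nice-0 weight are assumptions made for the example, and the real kernel uses fixed-point helpers rather than a plain division.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed weight of a nice-0 task; the text only names NICE_0_LOAD. */
#define SKETCH_NICE_0_LOAD 1024u

/* Hypothetical, simplified stand-in for a per-task scheduling entity. */
struct sketch_entity {
    uint64_t load;                   /* weight of the task */
    uint64_t vruntime;               /* virtual runtime, in ns */
    uint64_t sum_exec_runtime;       /* total real runtime so far, in ns */
    uint64_t prev_sum_exec_runtime;  /* total real runtime when last picked */
};

/* delta = curr->sum_exec_runtime - curr->prev_sum_exec_runtime */
uint64_t sketch_delta(const struct sketch_entity *curr)
{
    assert(curr->sum_exec_runtime >= curr->prev_sum_exec_runtime);
    return curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
}

/* Advance vruntime by delta scaled by NICE_0_LOAD / load. */
void sketch_update_vruntime(struct sketch_entity *curr, uint64_t delta)
{
    assert(curr->load > 0);
    if (curr->load != SKETCH_NICE_0_LOAD)
        delta = delta * SKETCH_NICE_0_LOAD / curr->load;
    curr->vruntime += delta;
}
```

Under these assumptions, a task with twice the nice-0 weight that runs for 4 ms of real time sees its vruntime advance by only 2 ms, so it stays further left in the tree and is picked more often.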
The value ideal_runtime denotes the ideal runtime for the current process in a scheduling period, during which every task is scheduled and run once:
ideal_runtime = sum_runtime * curr->se.load / cfs_rq.weight
where sum_runtime is the scheduling period. The parameter sum_runtime depends on the number of running tasks:
if the number of running tasks < 5, sum_runtime = 20 ms
else, sum_runtime = 4 ms * (number of running tasks)
Note that the minimum granularity is 4 ms. The parameter cfs_rq.weight is the total weight of all the tasks. delta is the total runtime minus the previous total runtime. Because CFS works at this granularity, delta is updated every 4 ms. If delta exceeds the ideal runtime, the task has used up its share of this period and needs to be rescheduled.
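The period and ideal-runtime calculation can be sketched as below. The function and macro names are hypothetical. The 4 ms granularity and the threshold of five tasks come from the text; the 20 ms period for fewer than five tasks is an assumption, chosen so that both branches agree at five tasks.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MIN_GRANULARITY_NS 4000000ull   /* 4 ms, as stated in the text */
#define SKETCH_LATENCY_NS         20000000ull  /* assumed period: 5 * 4 ms */
#define SKETCH_NR_LATENCY         5u           /* task-count threshold from the text */

/* Scheduling period: fixed for few tasks, one granule per task otherwise. */
uint64_t sketch_sum_runtime(unsigned int nr_running)
{
    if (nr_running < SKETCH_NR_LATENCY)
        return SKETCH_LATENCY_NS;
    return SKETCH_MIN_GRANULARITY_NS * nr_running;
}

/* Share of the period proportional to the task's weight. */
uint64_t sketch_ideal_runtime(unsigned int nr_running,
                              uint64_t task_weight, uint64_t total_weight)
{
    assert(total_weight > 0 && task_weight <= total_weight);
    return sketch_sum_runtime(nr_running) * task_weight / total_weight;
}

/* The task must be rescheduled once it has run longer than its share. */
int sketch_needs_resched(uint64_t delta, uint64_t ideal_runtime)
{
    return delta > ideal_runtime;
}
```

For example, with four equally weighted tasks each one is entitled to a quarter of the 20 ms period, that is 5 ms, and a task that has already run for 6 ms in the period is due for rescheduling.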
For the initialization of a new process, the function wake_up_new_task(struct task_struct *p, unsigned long clone_flags) changes the parameters of the CFS. It wakes up the task p and puts it on the run queue. If the task p is not newly created or is not the current process, the function activates p with activate_task(rq, p, 0). Otherwise it lets the scheduling class perform the new-task startup with p->sched_class->task_new(rq, p). The function task_new_fair shares the fair runtime between the tasks: it updates the information of the current process in update_curr() and the virtual runtime in place_entity() so that the task can be inserted into the red-black tree. Note that assigning the new task a virtual runtime based on min_vruntime gives it a chance to run without allowing it to monopolize the CPU. Afterwards, the function check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync) checks whether the newly woken task should preempt the current task by comparing their virtual runtimes.
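The two ideas in this step, placing a new task at min_vruntime and deciding preemption by comparing virtual runtimes, can be sketched as follows. The types and function names are hypothetical, and the sketch deliberately ignores refinements the kernel applies, such as a wakeup granularity that damps over-eager preemption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal task: only the field this sketch needs. */
struct sketch_task {
    uint64_t vruntime;  /* virtual runtime, in ns */
};

/* Start a new task no earlier than min_vruntime, so it cannot claim
 * the CPU time "owed" to a task that has never run before. */
void sketch_place_new_task(struct sketch_task *p, uint64_t min_vruntime)
{
    assert(p != NULL);
    if (p->vruntime < min_vruntime)
        p->vruntime = min_vruntime;
}

/* Return 1 if the woken task has the smaller virtual runtime. */
int sketch_should_preempt(const struct sketch_task *curr,
                          const struct sketch_task *woken)
{
    assert(curr != NULL && woken != NULL);
    /* Signed difference keeps the comparison valid if counters wrap. */
    return (int64_t)(woken->vruntime - curr->vruntime) < 0;
}
```

A freshly created task whose vruntime is 0 is moved up to the queue's min_vruntime; it then preempts a current task only if that task has accumulated more virtual runtime than the minimum.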
The function void __sched schedule(void) controls the scheduling algorithm of the CFS. It checks the status of the tasks. Only running tasks, those with state=0, are put into the red-black tree. Before a new process is scheduled in place of the previous one, all changes in the CFS regarding the previous process must be applied, such as updating its virtual runtime and re-inserting it into the red-black tree if the process is still in the running state. This is done by calling prev->sched_class->put_prev_task(rq, prev). The next process is then selected and assigned as the current process curr with next = pick_next_task(rq, prev).
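The selection step can be illustrated with a small sketch. To keep it short, a linear scan over an array stands in for taking the leftmost node of the red-black tree; the struct sketch_proc type and the sketch_pick_next function are hypothetical and are not the kernel's pick_next_task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TASK_RUNNING 0  /* state value the text gives for running tasks */

/* Hypothetical minimal process record. */
struct sketch_proc {
    int pid;
    long state;         /* 0 means running/runnable */
    uint64_t vruntime;  /* key used for ordering */
};

/* Return the runnable task with the smallest vruntime, or NULL if none.
 * A real implementation reads this from the tree in O(log n) or better. */
struct sketch_proc *sketch_pick_next(struct sketch_proc *tasks, size_t n)
{
    struct sketch_proc *next = NULL;

    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state != SKETCH_TASK_RUNNING)
            continue;  /* non-running tasks are not in the tree */
        if (next == NULL || tasks[i].vruntime < next->vruntime)
            next = &tasks[i];
    }
    return next;
}
```

Once the chosen task has run and its vruntime has grown past that of its neighbours, the next call selects a different task, which is how the ordering by vruntime produces fairness over time.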
The paragraphs above briefly summarize the basic running steps of the CFS. The actual functions must take many other factors into account, such as the environment, the type of CPU, and task groups.