Linux Scheduler Performance Analysis - 1

This article was first published at http://oliveryang.net. Any reuse of the content must include the original link.

1. Scheduling perf profiling goals

An OS scheduler implementation has 3 key features:

  • Time Sharing

    Let multiple tasks share the CPU time fairly and efficiently.

  • Preemption

    Important or latency-sensitive tasks can be scheduled as quickly as possible.

  • Load balance

    Allow multiple tasks to share multiple CPUs across the whole system fairly and efficiently.

If a system runs into a CPU scheduling performance problem, at least one of the above features is likely to be broken. Our scheduling profiling goal is to understand how the scheduler behaves from these 3 perspectives under a given workload or benchmark.

2. The major symptoms of scheduling perf issues

The symptoms of scheduling perf issues can also be classified along the same 3 perspectives:

  • High or low CPU utilization

  • High task scheduling latency

  • Imbalanced CPU utilization or scheduling latency
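The first and third symptoms can be observed from two snapshots of /proc/stat. The following is a minimal sketch, not the author's tooling: it assumes the standard per-CPU line layout of /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal), and the helper names (parse_cpu_times, utilization, imbalance, sample_live) are hypothetical.

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: (busy_ticks, total_ticks)} from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep per-CPU lines ("cpu0 ...") and skip the aggregate "cpu" line.
        if not fields or fields[0] == "cpu" or not fields[0].startswith("cpu"):
            continue
        # Columns: user, nice, system, idle, iowait, irq, softirq, steal.
        # Guest time is already accounted in user/nice, so it is not added again.
        ticks = [int(v) for v in fields[1:9]]
        idle = sum(ticks[3:5])  # idle + iowait
        total = sum(ticks)
        times[fields[0]] = (total - idle, total)
    return times


def utilization(before, after):
    """Return {cpu_name: busy fraction in [0, 1]} between two snapshots."""
    util = {}
    for cpu, (busy1, total1) in after.items():
        if cpu not in before:
            continue  # CPU came online between the snapshots
        busy0, total0 = before[cpu]
        elapsed = total1 - total0
        if elapsed > 0:
            util[cpu] = (busy1 - busy0) / elapsed
    return util


def imbalance(util):
    """Spread between the busiest and the idlest CPU."""
    return max(util.values()) - min(util.values())


def sample_live(interval=1.0):
    """Sample /proc/stat twice on a Linux host and return per-CPU utilization."""
    with open("/proc/stat") as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_times(f.read())
    return utilization(before, after)
```

A large spread from imbalance(sample_live()) under a workload that should spread evenly is only a hint of a load balance problem. CPU affinity settings or a single-threaded application produce the same picture.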

Please note that the above symptoms are not always caused by a kernel scheduler bug. For this reason, the most important step is to define the performance problem against a clear baseline. With a clear baseline, we can rule out the different possible causes that share similar symptoms far more efficiently.
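One way to make a baseline concrete is to record the same metrics for a known-good run and for the problem run, then report only the metrics that moved beyond an agreed tolerance. This is a hedged sketch: the metric names and the 10% default tolerance are illustrative assumptions, not values from this article.

```python
def compare_to_baseline(baseline, current, tolerance=0.10):
    """Return {metric: relative_change} for metrics that deviate from the
    baseline by more than `tolerance` (0.10 means 10%)."""
    deviations = {}
    for name, base in baseline.items():
        # Skip metrics that were not measured again or cannot be normalized.
        if name not in current or base == 0:
            continue
        change = (current[name] - base) / base
        if abs(change) > tolerance:
            deviations[name] = change
    return deviations


# Hypothetical metric names and numbers, for illustration only.
baseline = {"cpu_util": 0.60, "avg_runq_wait_us": 50.0}
current = {"cpu_util": 0.62, "avg_runq_wait_us": 200.0}
print(compare_to_baseline(baseline, current))
```

Here only the run-queue wait would be reported, which points the investigation at scheduling latency rather than at CPU utilization.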

3. The scheduling perf issues triage process

Issues at the hardware, kernel, or application level can cause similar symptoms. For example, I once handled a high CPU utilization bug that was caused by a wrong MTRR (Memory Type Range Register) setting.

In another case, a workload imbalance across scheduling domains was caused by a buggy ACPI SRAT table. In these examples, the issues could be identified fairly easily from the CPI (cycles per instruction) number reported by Linux perf or the NUMAtop tool. However, if the problem comes from the kernel or the application, finding the root cause can be very difficult when we do not have enough knowledge of the specific component.
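CPI is simply the cycle count divided by the instruction count over the same interval. As a rough sketch, the counts can be taken from the CSV output of a command such as `perf stat -x, -e cycles,instructions -- <command>` (perf stat writes its report to stderr). The parser below assumes the counter value is the first CSV field and the event name is the third; that layout can differ when extra options such as per-CPU mode are used, and the function names are hypothetical.

```python
def parse_perf_stat_csv(text):
    """Return {event_name: count} from `perf stat -x,` style CSV text.

    Assumed layout per line: value,unit,event,run_time,percentage,...
    """
    counts = {}
    for line in text.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        # Skip entries such as "<not counted>" or "<not supported>".
        if not value.isdigit():
            continue
        # Drop event modifiers such as ":u" or ":k".
        counts[event.split(":")[0]] = int(value)
    return counts


def cycles_per_instruction(counts):
    """CPI = cycles / instructions, measured over the same interval."""
    if counts.get("instructions", 0) <= 0:
        raise ValueError("instruction count missing or zero")
    return counts["cycles"] / counts["instructions"]


# Made-up sample output, for illustration only.
sample = (
    "2000000,,cycles,1000000,100.00,,\n"
    "1000000,,instructions,1000000,100.00,0.50,insn per cycle\n"
)
print(cycles_per_instruction(parse_perf_stat_csv(sample)))
```

The absolute CPI value means little on its own. It becomes useful when compared with the CPI of the same workload on a known-good system.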

Scheduling perf issues are almost always reported from a specific type of workload or benchmark test. The most efficient order in which to triage a scheduling performance bug is therefore from top to bottom:

application -> kernel -> hypervisor -> hardware

Before an issue moves from one layer to the next, we must have a technical justification that includes the following information:

  • A clear problem definition with a clear performance baseline
  • Why we think the problem is not in the current layer
  • The performance tracing data or logs that support the analysis

Reposted from: https://www.cnblogs.com/ainima/p/6330789.html
