Performance Analysis Methodology (article source: http://write.blog.csdn.net/postedit)

蔡大东

Performance Analysis Methodology

A performance analysis methodology is a procedure that you can follow to analyze system or application performance. Methodologies generally provide a starting point and then guide you toward the root cause or causes. Different methodologies suit different classes of issues, and you may try more than one before accomplishing your goal.

These methodologies can help you solve issues quickly, and solve a wider range of issues. Analysis without a methodology can become a fishing expedition, where tools and metrics are examined ad hoc, until the issue is found – if it is at all.

Key Methodologies:

The USE Method: for finding resource bottlenecks
The TSA Method: for analyzing application time
Off-CPU Analysis: for analyzing any type of thread wait latency
Active Benchmarking: for accurate and successful benchmarking

This is my main page of performance analysis methodologies, and contains links and summaries of different methods.

Summaries

I first summarized various methodologies for my USENIX LISA 2012 talk "Performance Analysis Methodology" (PDF, slideshare, youtube, USENIX), then later documented them in my Systems Performance book. The following is my most up-to-date summary list, with methodologies enumerated.

These begin with anti-methods, which are included for comparison and are not meant to be followed. You can print these all out as a cheatsheet/reminder.

Blame-Someone-Else Anti-Method

1. Find a system or environment component you are not responsible for
2. Hypothesize that the issue is with that component
3. Redirect the issue to the responsible team
4. When proven wrong, go to 1

Streetlight Anti-Method

1. Pick observability tools that are:
   - familiar
   - found on the Internet
   - found at random
2. Run tools
3. Look for obvious issues

Drunk Man Anti-Method

Change things at random until the problem goes away

Random Change Anti-Method

1. Measure a performance baseline
2. Pick a random attribute to change (e.g., a tunable)
3. Change it in one direction
4. Measure performance
5. Change it in the other direction
6. Measure performance
7. Were the step 4 or 6 results better than the baseline? If so, keep the change; if not, revert
8. Go to step 1

Passive Benchmarking Anti-Method

Pick a benchmark tool
Run it with a variety of options
Make a slide deck of the results
Hand the slides to management

Ad Hoc Checklist Method

1..N. Run A, if B, do C

Problem Statement Method

What makes you think there is a performance problem?
Has this system ever performed well?
What has changed recently? (Software? Hardware? Load?)
Can the performance degradation be expressed in terms of latency or run time?
Does the problem affect other people or applications (or is it just you)?
What is the environment? What software and hardware is used? Versions? Configuration?

Scientific Method

Question
Hypothesis
Prediction
Test
Analysis

Workload Characterization Method

Who is causing the load? PID, UID, IP addr, ...
Why is the load called? code path
What is the load? IOPS, tput, type
How is the load changing over time?
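
As a minimal sketch of how these questions might be answered from raw data, the following assumes a hypothetical in-memory event log of `(timestamp, pid, operation, bytes)` tuples; the field layout and the `characterize` helper are illustrative assumptions, not part of any real tool. It covers who, what, and change over time. The "why" question (code path) would need stack traces, which this log does not carry.

```python
from collections import Counter

# Hypothetical event log: (timestamp_sec, pid, operation, bytes).
# The layout is an assumption made for illustration only.
events = [
    (0, 101, "read", 4096),
    (0, 101, "read", 4096),
    (1, 202, "write", 8192),
    (1, 101, "read", 4096),
    (2, 202, "write", 8192),
    (2, 202, "write", 8192),
]


def characterize(events):
    """Summarize who causes the load, what it is, and how it changes over time."""
    who = Counter(pid for _, pid, _, _ in events)        # events per PID
    what = Counter(op for _, _, op, _ in events)         # events per operation type
    per_second = Counter(ts for ts, _, _, _ in events)   # events per second
    total_bytes = sum(size for _, _, _, size in events)  # overall bytes moved
    return {
        "who": who,
        "what": what,
        "per_second": per_second,
        "bytes": total_bytes,
    }


summary = characterize(events)
print("who:", summary["who"].most_common())
print("what:", summary["what"].most_common())
print("per second:", sorted(summary["per_second"].items()))
print("bytes:", summary["bytes"])
```

In practice the events would come from tracing or logs rather than a literal list, but the grouping step stays the same.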

Drill-Down Analysis Method

1. Start at highest level
2. Examine next-level details
3. Pick most interesting breakdown
4. If problem unsolved, go to 2
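
The loop above can be sketched as a walk down a nested breakdown. This is only an illustration: the `(name, value, children)` tree and its numbers are hypothetical, and "most interesting" is assumed here to mean "largest value", which is a simplification of the judgment an analyst would apply.

```python
def drill_down(node, path=()):
    """Follow the largest child at each level until a leaf is reached.

    node is a (name, value, children) tuple. Picking the largest child is
    an assumed stand-in for "most interesting breakdown".
    """
    name, value, children = node
    path = path + ((name, value),)
    if not children:
        return path
    biggest = max(children, key=lambda child: child[1])
    return drill_down(biggest, path)


# Hypothetical latency breakdown in milliseconds.
tree = ("request", 100, [
    ("app", 20, []),
    ("database", 70, [
        ("query parse", 5, []),
        ("disk I/O", 60, []),
        ("locks", 5, []),
    ]),
    ("network", 10, []),
])

path = drill_down(tree)
print(" > ".join(f"{name} ({value} ms)" for name, value in path))
# prints: request (100 ms) > database (70 ms) > disk I/O (60 ms)
```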

By-Layer Method

Measure latency from:

Dynamic languages
Executable
Libraries
Syscalls
Kernel: FS, network
Device drivers

Latency Analysis Method

Measure operation time (latency)
Divide into logical synchronous components
Continue division until latency origin is identified
Quantify: estimate speedup if problem fixed
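
For the final step, a rough way to estimate speedup is to compare total latency before and after removing some or all of one component's time. The helper below is a minimal sketch; the name `estimated_speedup` and its parameters are my own assumptions, and it treats the components as purely sequential.

```python
def estimated_speedup(total_ms, component_ms, fraction_removed=1.0):
    """Estimate whole-operation speedup if one component gets faster.

    total_ms:         measured latency of the whole operation
    component_ms:     latency attributed to the component under study
    fraction_removed: share of that component's time a fix would remove
                      (1.0 means the component's time disappears entirely)
    Assumes components run sequentially, so their times simply add up.
    """
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    if not 0 <= component_ms <= total_ms:
        raise ValueError("component_ms must be between 0 and total_ms")
    if not 0 <= fraction_removed <= 1:
        raise ValueError("fraction_removed must be between 0 and 1")
    remaining_ms = total_ms - component_ms * fraction_removed
    if remaining_ms == 0:
        return float("inf")
    return total_ms / remaining_ms


# Hypothetical numbers: a 100 ms operation where 50 ms is spent in one component.
print(estimated_speedup(100, 50))       # component eliminated: 2.0
print(estimated_speedup(100, 50, 0.5))  # component halved
```

An estimate like this helps decide whether a fix is worth pursuing before any work is done on it.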

Tools Method

List available performance tools (optionally add more)
For each tool, list its useful metrics
For each metric, list possible interpretations
Run selected tools and interpret selected metrics

USE Method

For every resource, check:

Utilization
Saturation
Errors
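
The checklist can be sketched as a loop over resources. Everything specific here is an assumption for illustration: the metric values are made up, the `use_check` helper is hypothetical, and the 70% utilization threshold is an arbitrary example rather than a recommended value.

```python
# Hypothetical per-resource metrics. In practice these would come from
# system tools; the values here are invented for illustration.
resources = {
    "cpu": {"utilization": 0.95, "saturation": 4, "errors": 0},
    "memory": {"utilization": 0.60, "saturation": 0, "errors": 0},
    "disk": {"utilization": 0.30, "saturation": 0, "errors": 2},
}


def use_check(resources, util_threshold=0.7):
    """Return (resource, metric, value) for every check that looks suspicious.

    util_threshold is an assumed example threshold, not a universal rule.
    """
    findings = []
    for name, metrics in resources.items():
        if metrics["utilization"] >= util_threshold:
            findings.append((name, "utilization", metrics["utilization"]))
        if metrics["saturation"] > 0:   # any queued or waiting work
            findings.append((name, "saturation", metrics["saturation"]))
        if metrics["errors"] > 0:       # any error count at all
            findings.append((name, "errors", metrics["errors"]))
    return findings


findings = use_check(resources)
for resource, metric, value in findings:
    print(f"{resource}: {metric} = {value}")
```

Iterating over every resource, rather than only the ones a favorite tool reports, is the point of the method.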

Stack Profile Method

Profile thread stack traces, on- and off-CPU
Coalesce
Study stacks bottom-up

Off-CPU Analysis

Profile scheduler per-thread off-CPU time with stack traces
Coalesce times with like stacks
Study stacks from longest to shortest time
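
The coalescing and ordering steps can be sketched as follows. The samples are hypothetical `(stack, off_cpu_ms)` pairs with invented frame names; collecting them from a real scheduler tracer is outside the scope of this sketch.

```python
from collections import defaultdict


def coalesce(samples):
    """Sum off-CPU time per identical stack and sort by total time, descending."""
    totals = defaultdict(float)
    for stack, off_cpu_ms in samples:
        totals[tuple(stack)] += off_cpu_ms  # tuples are hashable, lists are not
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples: (stack frames from root to leaf, off-CPU milliseconds).
samples = [
    (["main", "read_file", "sys_read"], 30.0),
    (["main", "lock_acquire", "futex_wait"], 120.0),
    (["main", "read_file", "sys_read"], 45.0),
    (["main", "lock_acquire", "futex_wait"], 80.0),
    (["main", "sleep"], 10.0),
]

ranked = coalesce(samples)
for stack, total_ms in ranked:
    print(f"{total_ms:8.1f} ms  {';'.join(stack)}")
```

The stack with the largest total is where to begin reading.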

TSA Method

1. For each thread of interest, measure time in operating system thread states, e.g.:
   - Executing
   - Runnable
   - Swapping
   - Sleeping
   - Lock
   - Idle
2. Investigate states from most to least frequent, using appropriate tools
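
As a minimal sketch of the bookkeeping, the following totals time per state for one thread and ranks the states. The `(state, milliseconds)` intervals are hypothetical, and ranking by total time is my assumed reading of "most to least frequent".

```python
from collections import defaultdict


def time_by_state(intervals):
    """Total time per thread state, ranked from most to least time.

    Returns a list of (state, total_ms, fraction_of_all_time) tuples.
    """
    totals = defaultdict(float)
    for state, ms in intervals:
        totals[state] += ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(state, ms, ms / grand_total) for state, ms in ranked]


# Hypothetical intervals for one thread: (state, milliseconds).
intervals = [
    ("Executing", 200),
    ("Sleeping", 500),
    ("Lock", 200),
    ("Executing", 100),
]

breakdown = time_by_state(intervals)
for state, ms, fraction in breakdown:
    print(f"{state:10s} {ms:7.1f} ms  {fraction:6.1%}")
```

Each state then points to a different follow-up: for example, time in Executing suggests CPU profiling, while time in Lock suggests lock analysis.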