perf Performance Tuning

波波敲代码

Category: Development Management

Article link: https://blog.csdn.net/u012129163/article/details/91972851

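Performance Tuning

Performance tuning means analyzing and optimizing, at the processor or operating-system level, the hardware and software events that can affect a program's performance. It mainly involves the following:

1. Evaluating hardware resource usage, for example the number of accesses and misses at each cache level, pipeline stall cycles, and front-side bus accesses.

2. Evaluating how the operating system uses resources: the number of system calls, context switches, and task migrations.

3. Algorithm optimization (space and time complexity) and code optimization (improving execution speed, reducing memory footprint).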

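Perf Analysis

perf works by sampling: each time a chosen event triggers a sampling point, it records where the program is executing, which yields the distribution of the program's run time. It mainly targets the following three kinds of events:

1. Hardware events are produced by the PMU hardware, for example cache hits. Sample these events when you need to understand how a program uses hardware features.

2. Software events are produced by kernel software, for example process switches and tick counts.

3. Tracepoint events are triggered by static tracepoints in the kernel. These tracepoints expose details of kernel behavior while the program runs, for example the number of allocations made by the slab allocator.

Run perf list to see the available events: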

List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event] (processor cycle event)
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
 
  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]
  minor-faults                                       [Software event]
  major-faults                                       [Software event]
  context-switches OR cs                             [Software event]
  cpu-migrations OR migrations                       [Software event]
  alignment-faults                                   [Software event]
  emulation-faults                                   [Software event]
 
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-prefetches                               [Hardware cache event]
  L1-icache-prefetch-misses                          [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-prefetches                                     [Hardware cache event]
  LLC-prefetch-misses                                [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-prefetches                                    [Hardware cache event]
  dTLB-prefetch-misses                               [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
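
A listing this long is easier to read once summarized. The following is a minimal sketch, not part of the original article, that counts events per category by reading the text inside the square brackets on each line. The function name count_event_types is a hypothetical helper; it assumes the `name [category]` layout shown above.

```shell
# Count events per category in `perf list`-style text read from stdin.
# The category is whatever sits inside the first [...] on a line;
# lines without brackets (such as the header) are skipped.
count_event_types() {
  awk 'match($0, /\[[^]]*\]/) {
         t = substr($0, RSTART + 1, RLENGTH - 2)
         n[t]++
       }
       END { for (t in n) printf "%d %s\n", n[t], t }' | sort -k2
}

# Try it on a few lines copied from the listing above.
count_event_types <<'EOF'
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  cpu-clock                                          [Software event]
  L1-dcache-loads                                    [Hardware cache event]
EOF
```

On a machine with perf installed, the same helper can presumably be fed directly with `perf list | count_event_types`.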

Using perf

perf COMMAND [-e event ...] PROGRAM

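COMMAND is one of top, stat, record, report, and so on. The -e option names the events perf should monitor. To monitor several events, repeat -e once per event, or pass a comma-separated list to a single -e.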
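
To illustrate the repeated -e form, here is a small sketch that assembles a perf command line from a list of events. It only prints the command and runs nothing. The function name perf_cmdline and the use of `--` to separate events from the program are assumptions made for this example, not perf features.

```shell
# Print a perf command line that repeats -e once per event.
# Usage: perf_cmdline SUBCOMMAND EVENT... -- PROGRAM [ARGS...]
perf_cmdline() {
  subcmd=$1
  shift
  args=""
  # Everything before the literal "--" is treated as an event name.
  while [ $# -gt 0 ] && [ "$1" != "--" ]; do
    args="$args -e $1"
    shift
  done
  # Drop the separator, if present; the rest is the program to run.
  if [ "${1:-}" = "--" ]; then
    shift
  fi
  echo "perf $subcmd$args $*"
}

perf_cmdline stat cycles instructions cache-misses -- ./test
```

The printed line can be copied into a terminal once the event names have been checked against perf list.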

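perf stat: analyzing a program's overall performance

For a program named test, run the following command. Note that a bare test is looked up via PATH, so use ./test to be sure a binary in the current directory is the one measured.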

root@jiuling-MS-7885-Invalid-entry-length-16-Fixed-up-to-11:/home/zhoutengbo/core/MakefileTest# perf stat test

 Performance counter stats for 'test':

          0.476929      task-clock (msec)         #    0.011 CPUs utilized          
                 1      context-switches          #    0.002 M/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                55      page-faults               #    0.115 M/sec                  
         1,701,478      cycles                    #    3.568 GHz                    
           800,978      instructions              #    0.47  insn per cycle         
           158,665      branches                  #  332.681 M/sec                  
             7,568      branch-misses             #    4.77% of all branches        

       0.042206544 seconds time elapsed

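task-clock: the CPU time the task actually consumed, in milliseconds. The derived "CPUs utilized" value is task-clock divided by the elapsed wall-clock time. For a single-threaded task it lies between 0 and 1, and the closer it is to 1, the more CPU-bound the task is.

context-switches: the number of context switches.

cpu-migrations: the number of times the task moved from one CPU to another.

page-faults: the number of page faults, raised when the data the program accesses is not in memory.

cycles: the number of processor clock cycles the task consumed. A single machine instruction may take several cycles.

instructions: the number of processor instructions executed while the task ran. Dividing by cycles gives IPC (instructions per cycle).

IPC (instructions / cycles) is an important metric for judging how well the processor and the application work together, since many instructions need several cycles to complete.

In general, the higher the IPC the better: it indicates that the program makes fuller use of the processor's capabilities.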
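
The values in the right-hand column of the perf stat output can be recomputed from the raw counters, which makes the definitions above concrete. The sketch below plugs in the numbers from the sample run; the function name derived_metrics and its argument order are hypothetical.

```shell
# Recompute the derived column of `perf stat` from raw counter values.
# Arguments: task-clock (ms), elapsed time (s), cycles, instructions,
#            branches, branch-misses.
derived_metrics() {
  awk -v tc="$1" -v el="$2" -v cy="$3" -v ins="$4" -v br="$5" -v bm="$6" 'BEGIN {
    printf "CPUs utilized: %.3f\n", (tc / 1000) / el     # CPU time / wall time
    printf "IPC: %.2f\n", ins / cy                       # instructions per cycle
    printf "branch-miss rate: %.2f%%\n", 100 * bm / br   # misses / branches
    printf "clock: %.3f GHz\n", cy / (tc * 1e6)          # cycles per CPU-second
  }'
}

# Counter values copied from the sample output above.
derived_metrics 0.476929 0.042206544 1701478 800978 158665 7568
```

With these inputs the script reproduces the figures perf printed: 0.011 CPUs utilized, 0.47 instructions per cycle, 4.77% branch misses and 3.568 GHz.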

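perf top: displaying system/process statistics in real time

Commonly used options:

-e: select the performance event

-p: PID of the process to analyze

-t: TID of the thread to analyze

-r N: repeat the analysis N times. Note that this is the meaning of -r in perf stat; in perf top, -r sets a real-time scheduling priority.

-d: detailed analysis using more performance events. This too is the perf stat meaning; in perf top, -d sets the delay in seconds between screen refreshes.

To show a specific process's usage in real time, for example: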

root@zhoutengbo:/home/zhoutengbo/# perf top -e cpu-clock -p 3217

Samples: 10K of event 'cpu-clock', Event count (approx.): 279296414                                                                                                                                      
Overhead  Shared Object       Symbol                                                                                                                                                                     
   4.82%  [kernel]            [k] __lock_text_start
   1.80%  [kernel]            [k] finish_task_switch
   1.72%  [kernel]            [k] do_syscall_64
   1.70%  perf-3217.map       [.] 0x00007f83ea30ad67
   1.45%  libpthread-2.23.so  [.] pthread_cond_timedwait@@GLIBC_2.3.2
   0.78%  libjvm.so           [.] YoungList::rs_length_sampling_next
   0.77%  libc-2.23.so        [.] __GI___writev
   0.72%  libc-2.23.so        [.] epoll_ctl
   0.70%  perf-3217.map       [.] 0x00007f83ea30ad6b
   0.70%  libc-2.23.so        [.] 0x0000000000107a13
   0.68%  libjvm.so           [.] Monitor::ILock
   0.63%  libpthread-2.23.so  [.] __lll_unlock_wake
   0.62%  [kernel]            [k] nf_conntrack_in
   0.58%  libpthread-2.23.so  [.] pthread_cond_wait@@GLIBC_2.3.2
   0.54%  libjvm.so           [.] G1CollectorPolicy::predict_bytes_to_copy
   0.52%  perf-3217.map       [.] 0x00007f83ea30b2f9
   0.51%  perf-3217.map       [.] 0x00007f83ea30b279
   0.49%  [kernel]            [k] tcp_in_window
   0.48%  [vdso]              [.] __vdso_clock_gettime
   0.46%  libjvm.so           [.] OtherRegionsTable::occupied
   0.46%  perf-3217.map       [.] 0x00007f83ea30adc5
   0.43%  [kernel]            [k] copy_user_enhanced_fast_string
   0.42%  libc-2.23.so        [.] __clock_get

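perf lists the sampled CPU hot spots and each one's share of the samples. To see in more detail where the program spends its time, add -g to collect call graphs; this works best when the program was compiled with debug information (the compiler's -g option).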
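
When the listing is spread over many symbols, as in the output above, it can help to total the overhead per shared object. The following sketch does that with awk. It assumes the three-column perf top layout shown above (Overhead, Shared Object, Symbol), and overhead_by_object is a hypothetical helper name.

```shell
# Sum the Overhead column per shared object.
# Reads perf top-style lines from stdin; lines whose first field is not
# a percentage (headers, blank lines) are ignored.
overhead_by_object() {
  awk '$1 ~ /^[0-9.]+%$/ {
         pct = $1
         sub(/%/, "", pct)      # strip the trailing percent sign
         sum[$2] += pct         # field 2 is the shared object
       }
       END { for (o in sum) printf "%.2f%% %s\n", sum[o], o }' | sort -rn
}

# A few lines copied from the output above.
overhead_by_object <<'EOF'
   4.82%  [kernel]            [k] __lock_text_start
   1.80%  [kernel]            [k] finish_task_switch
   1.70%  perf-3217.map       [.] 0x00007f83ea30ad67
   0.78%  libjvm.so           [.] YoungList::rs_length_sampling_next
   0.68%  libjvm.so           [.] Monitor::ILock
EOF
```

For perf report output, which adds a Command column, the field number would have to be adjusted.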

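perf record/report: recording performance events of the system/process over a period of time

The record command records a program's performance events over a period of time and writes them to perf.data in the current directory. The report command reads perf.data; use -i to point it at a data file in another location.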

root@zhoutengbo:/home/zhoutengbo/# perf record -e cpu-clock ./test 
123
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]

root@zhoutengbo:/home/zhoutengbo/# perf report

Samples: 2  of event 'cpu-clock', Event count (approx.): 500000                                                                                                                                          
Overhead  Command  Shared Object      Symbol                                                                                                                                                             
  50.00%  test     [kernel.kallsyms]  [k] filemap_map_pages
  50.00%  test     [kernel.kallsyms]  [k] kmem_cache_alloc
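
The two steps can be wrapped in one helper that writes to a named data file with -o and then reads it back with -i. This is only a sketch: profile_cmd, the DRY_RUN variable and the path /tmp/test.perf.data are hypothetical names chosen for illustration. With DRY_RUN=1 it prints the perf commands instead of running them, so it can be tried without perf installed.

```shell
# Record a command's cpu-clock samples into a named file, then report on it.
# Usage: profile_cmd DATA_FILE PROGRAM [ARGS...]
profile_cmd() {
  data=$1
  shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Show what would be run without touching perf.
    echo "perf record -e cpu-clock -o $data $*"
    echo "perf report -i $data"
  else
    # Only open the report if recording succeeded.
    perf record -e cpu-clock -o "$data" "$@" && perf report -i "$data"
  fi
}

DRY_RUN=1 profile_cmd /tmp/test.perf.data ./test
```

Keeping each run in its own data file avoids overwriting the default perf.data when several recordings are compared.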