Debugging and tuning

Linux debugging and performance tuning


LTT (Linux Trace Toolkit), for a summary


date;ps;date


The time command can be used to measure the actual elapsed running time of a program.


gettimeofday() returns the current time as seconds and microseconds
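As a minimal sketch of how gettimeofday() can be used for timing, the helpers below take one reading before and one after a piece of work and convert the difference to microseconds. The names elapsed_usec and time_call_usec are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: difference between two readings, in microseconds. */
long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* Hypothetical helper: wall-clock time of one call to fn(), in microseconds. */
long long time_call_usec(void (*fn)(void))
{
    struct timeval start, end;

    assert(fn != NULL);
    gettimeofday(&start, NULL);   /* the timezone argument is not needed */
    fn();
    gettimeofday(&end, NULL);
    return elapsed_usec(&start, &end);
}
```

A caller would pass the function to be measured, for example time_call_usec(work) for some void work(void), and print the result with %lld. The value is wall-clock time, so it also includes time during which other processes were running.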


A profile shows how much time is spent in each subroutine or function.


gprof

▪Before programs can be profiled using gprof, they must be compiled with the -pg gcc option.

$ gcc -pg -o sample1 sample1.c

▪The sample1 program prints the prime numbers up to 50,000.
▪When the sample1 program is run, the gmon.out file is created
▪$ gprof -b ./sample1
–The -b option eliminates the text output that explains the data output provided by gprof
▪You can use the output from gprof to increase this program's performance by changing the code to perform faster

#include <stdlib.h>
#include <stdio.h>

int prime(int num);

int main()
{
    int i;
    int colcnt = 0;

    /* print the primes up to 50,000, nine per row */
    for (i = 2; i <= 50000; i++)
        if (prime(i)) {
            colcnt++;
            if (colcnt % 9 == 0) {
                printf("%5d\n", i);
                colcnt = 0;
            } else
                printf("%5d", i);
        }
    putchar('\n');
    return 0;
}

int prime(int num)
{
    /* check to see if the number is a prime? */
    int i;

    for (i = 2; i < num; i++)
        if (num % i == 0)
            return 0;
    return 1;
}
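If the gprof output shows that most of the time is spent in prime(), one possible change is to stop the trial division at the square root of num and to skip even divisors. The sketch below is only an illustration of that idea; the name prime_fast is hypothetical and is not part of the original sample1 program.

```c
#include <assert.h>

/* Hypothetical replacement for prime(): gives the same answer for
 * num >= 2, but only tries odd divisors up to sqrt(num). */
int prime_fast(int num)
{
    int i;

    assert(num >= 0);
    if (num < 2)
        return 0;
    if (num % 2 == 0)
        return num == 2;
    /* i <= num / i is the same test as i * i <= num, without overflow */
    for (i = 3; i <= num / i; i += 2)
        if (num % i == 0)
            return 0;
    return 1;
}
```

Rebuilding with -pg and running gprof again would show whether the change actually reduces the time attributed to the function.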

▪Next we can use the gcov program to look at the actual number of times each line of the program was executed (See Chapter 2)
▪Build the sample1 program with two additional options:

$ gcc -pg -fprofile-arcs -ftest-coverage -o sample1 sample1.c

▪Running sample1 and creating gcov output
▪$ ./sample1
▪$ gcov sample1.c
▪Running gcov on the source code produces the file sample1.c.gcov. It shows the actual number of times each line of the program was executed


oprofile tools

▪A performance counter is the part of a microprocessor that measures and gathers performance-relevant events on the microprocessor.
▪The number and type of available events differ significantly between existing microprocessors.
▪These counters impose no overhead on the system
▪One performance area of concern is cache misses. The following section describes different types of coding areas that can cause cache misses


▪Cache misses are costly; try to minimize them by following these suggestions:
▪Keep frequently accessed data together.
▪Store and access frequently used data in flat, sequential data structures and avoid pointer indirection.
▪Access data sequentially.
▪If the program accesses data sequentially, each cache miss brings in n words that will all be used. If the program accesses only every nth word, each miss brings in unneeded data, degrading performance.
▪Avoid simultaneously traversing several large buffers of data
▪There can be cache conflicts between the buffers. Instead, pack the contents sequentially into one buffer whenever possible. If you are using vertex arrays, try to use interleaved arrays.
▪Some framebuffers have cache-like behaviors as well.
▪It is a good idea to group geometry so that the drawing is done to one part of the screen at a time.
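The advice to access data sequentially can be sketched with two ways of summing the same flat array, treated as a rows x cols matrix. Both functions are hypothetical illustrations and return the same value; they differ only in the order in which memory is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Walks the array in address order: consecutive iterations touch
 * neighbouring elements, so every word of a fetched cache block is used. */
long sum_sequential(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    size_t r, c;

    assert(m != NULL);
    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}

/* Visits the same elements column by column: consecutive iterations are
 * cols elements apart, so on a large array each access may land in a
 * different cache block. */
long sum_strided(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    size_t r, c;

    assert(m != NULL);
    for (c = 0; c < cols; c++)
        for (r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}
```

On arrays much larger than the cache, the strided version would be expected to show more cache-miss events in a profiler, although the exact difference depends on the processor.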


Padding

▪Some compilers (or compiler options) automatically pad structures.
▪Referencing a data structure that spans two cache blocks may incur two misses, even if the structure itself is smaller than the block size.
▪Padding structures to a multiple of the block size and aligning them on a block boundary can eliminate these "misalignment" misses
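A minimal sketch of padding, assuming a C11 compiler and a 64-byte cache block (the block size is an assumption; check the target processor). The struct names and the spans_two_blocks helper are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_BLOCK 64   /* assumption: 64-byte cache blocks */

/* Unpadded: may start anywhere the alignment of long allows, so an
 * instance can straddle a block boundary. */
struct counter_plain {
    long hits;
    long misses;
};

/* alignas on the first member raises the alignment of the whole struct
 * to CACHE_BLOCK; the compiler then pads sizeof up to a multiple of it. */
struct counter_padded {
    alignas(CACHE_BLOCK) long hits;
    long misses;
};

/* Returns 1 if an object of size bytes at byte address addr crosses a
 * cache block boundary, 0 otherwise. */
int spans_two_blocks(size_t addr, size_t size)
{
    assert(size > 0);
    return addr / CACHE_BLOCK != (addr + size - 1) / CACHE_BLOCK;
}
```

The trade-off is memory: every counter_padded occupies a full block, so padding makes most sense for a small number of frequently accessed structures.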


Aligning

▪Alignment is a little more difficult, since the structure's address must be a multiple of the cache block size.
▪Aligning statically declared structures generally requires compiler support. Some versions of malloc() return cache block aligned memory
▪The programmer can align dynamically allocated structures using simple pointer arithmetic
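The pointer arithmetic mentioned above can be sketched as follows: allocate slightly more than needed, then round the address up to the next block boundary. The helper name alloc_block_aligned and the 64-byte block size are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_BLOCK 64   /* assumption: block size, must be a power of two */

/* Hypothetical helper: returns a pointer to size usable bytes aligned on
 * a CACHE_BLOCK boundary, or NULL on failure. The pointer to pass to
 * free() is stored in *raw; the aligned pointer must not be freed. */
void *alloc_block_aligned(size_t size, void **raw)
{
    uintptr_t addr;

    assert(raw != NULL);
    *raw = malloc(size + CACHE_BLOCK - 1);   /* room for the worst-case shift */
    if (*raw == NULL)
        return NULL;
    /* round up to the next multiple of CACHE_BLOCK */
    addr = ((uintptr_t)*raw + CACHE_BLOCK - 1) & ~(uintptr_t)(CACHE_BLOCK - 1);
    return (void *)addr;
}
```

Where available, posix_memalign() or C11 aligned_alloc() do the same job without the separate raw pointer.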



Packing

▪Packing is the opposite of padding
▪By packing an array into the smallest space possible, the programmer increases locality, which can reduce both conflict and capacity misses.
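One way to pack an array, sketched here under the assumption that each element is a yes/no flag, is to store one bit per flag instead of one int per flag. The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bytes needed to hold n one-bit flags. */
size_t packed_bytes(size_t n)
{
    return (n + CHAR_BIT - 1) / CHAR_BIT;
}

/* Sets flag i in the packed array. */
void flag_set(unsigned char *bits, size_t i)
{
    assert(bits != NULL);
    bits[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

/* Returns 1 if flag i is set, 0 otherwise. */
int flag_get(const unsigned char *bits, size_t i)
{
    assert(bits != NULL);
    return (bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

With 8-bit bytes and a 4-byte int, the packed form is 32 times smaller than an int array, so far more flags fit in each cache block. The price is a few extra shift and mask instructions per access.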


Loop Grouping

▪Numeric programs often consist of several operations on the same data, coded as multiple loops over the same arrays
▪Combining these loops may increase the program's temporal locality and frequently reduces the number of capacity misses
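A small sketch of loop grouping: the first function makes two passes over the array, the second does the same work in one pass. Both function names are hypothetical, and the two produce identical results.

```c
#include <assert.h>
#include <stddef.h>

/* Two loops over the same array: if the array does not fit in the cache,
 * every element is fetched from memory twice. */
void scale_then_offset(int *a, size_t n, int k, int b)
{
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        a[i] *= k;
    for (i = 0; i < n; i++)
        a[i] += b;
}

/* The loops combined: each element is loaded once and both operations
 * are applied while it is still in the cache. */
void scale_and_offset(int *a, size_t n, int k, int b)
{
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        a[i] = a[i] * k + b;
}
```

Combining is only valid when the second loop does not depend on the first loop having finished for all elements, as is the case here.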

ophelp

▪Intel Architecture Optimization Reference Manual - http://developer.intel.com/design/pentiumii/manuals/245127.htm
▪Intel Architecture Software Developer's Manuals IA-32 - http://www.intel.com/design/pentium4/manuals/index_new.htm
▪Processor Reference Manual for Software Development and Optimization - ftp://download.intel.com/design/Itanium2/manuals/25111003.pdf




Code optimization





requires: summary of all debug tools

cache size? 16K?



