Debug and tuning

zhangqingsup

Categories: Linux OS, linux kernel. Tags: performance, structure, compiler, cache, subroutine, optimization

Article link: https://blog.csdn.net/zhangqingsup/article/details/6863783

Linux debugging and performance tuning

LTT (Linux Trace Toolkit): for a summary view of system events

date; ps; date (bracketing a command with date gives a rough timing, to the nearest second)

The time command measures how long a command actually takes to run (real, user, and system time).

gettimeofday() returns the current time in seconds and microseconds.

Profiling shows how much time is spent in each subroutine or function.

gprof

▪Before programs can be profiled using gprof, they must be compiled with the -pg gcc option:

$ gcc -pg -o sample1 sample1.c

▪The sample1 program prints the prime numbers up to 50,000.

▪When the sample1 program is run, the gmon.out file is created.

▪$ gprof -b ./sample1

–The -b option suppresses the explanatory text that gprof otherwise prints alongside its data.

▪You can use the output from gprof to improve this program's performance by changing the code to run faster.

#include <stdlib.h>
#include <stdio.h>

int prime(int num);

/* Print the primes up to 50,000, nine per line. */
int main(void)
{
    int i;
    int colcnt = 0;

    for (i = 2; i <= 50000; i++) {
        if (prime(i)) {
            colcnt++;
            if (colcnt % 9 == 0) {
                printf("%5d\n", i);   /* ninth number ends the line */
                colcnt = 0;
            } else {
                printf("%5d", i);
            }
        }
    }
    putchar('\n');
    return 0;
}

/* Deliberately naive: try every divisor from 2 to num - 1. */
int prime(int num)
{
    int i;

    /* check to see if the number is a prime */
    for (i = 2; i < num; i++)
        if (num % i == 0)
            return 0;
    return 1;
}

▪Next we can use the gcov program to see the actual number of times each line of the program was executed (see Chapter 2).

▪Build the sample1 program with two additional options:

$ gcc -pg -fprofile-arcs -ftest-coverage -o sample1 sample1.c

▪Run sample1 and create the gcov output:

$ ./sample1

$ gcov sample1.c

▪Running gcov on the source code produces the file sample1.c.gcov, which shows the actual number of times each line of the program was executed.

OProfile tools

▪A performance counter is the part of a microprocessor that measures and gathers performance-relevant events on the microprocessor.

▪The number and type of available events differ significantly between existing microprocessors.

▪The counters themselves impose no overhead on the system while counting (reading or sampling them does have a small cost).

▪One performance area of concern is cache misses. The following points describe coding patterns that can cause cache misses.

▪Cache misses are costly; try to minimize them by following these suggestions:

▪Keep frequently accessed data together.

–Store and access frequently used data in flat, sequential data structures and avoid pointer indirection.

▪Access data sequentially.

–Each cache miss brings in a whole cache line of n words. If the program accesses data sequentially, it uses all n words; if it accesses only every nth word, most of what each miss brings in is unneeded data, degrading performance.

▪Avoid simultaneously traversing several large buffers of data.

–There can be cache conflicts between the buffers. Instead, pack the contents sequentially into one buffer whenever possible. If you are using vertex arrays, try to use interleaved arrays.

▪Some framebuffers have cache-like behaviors as well.

–It is a good idea to group geometry so that drawing is done to one part of the screen at a time.

Padding

▪Some compilers (or compiler options) automatically pad structures.

▪Referencing a data structure that spans two cache blocks may incur two misses, even if the structure itself is smaller than the block size.

▪Padding structures to a multiple of the block size and aligning them on a block boundary can eliminate these "misalignment" misses.

Aligning

▪Alignment is a little more difficult, since the structure's address must be a multiple of the cache block size.

▪Aligning statically declared structures generally requires compiler support. Some versions of malloc() return cache-block-aligned memory.

▪The programmer can align dynamically allocated structures using simple pointer arithmetic.

Packing

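▪Packing is the opposite of padding: it squeezes data into less space (for example, by removing unused bytes inside a structure) so that more of it fits in each cache block.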