[Repost] GDBINT gdb internal Notes …

Overview of GDB's structure (overall structure)
1) Components of GDB:
user interface
symbol handling (the symbol side)
object file readers, debugging info interpreters, symbol table management, source language expression parsing, type and value printing.
target system handling (the target side)
execution control, stack frame analysis, and physical target manipulation.
The line between the target side and the symbol side is not always sharp, but keeping the distinction in mind helps a great deal in understanding GDB.

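2) GDB Configurations
Host / host support / host dependent: the host is the system GDB runs on. The information needed to make GDB run on that host (#include files, macro definitions, ...) is called host support.
Target / target support / target dependent: the stack layout, instruction set, registers, ... of the target machine or target process.
Native / native support / native dependent: the host and the target are the same system, and the support needed in that case is called native dependent. On Unix, for example, this covers child-process support, debugging a process through ptrace or procfs, and how to fetch the target's registers and memory in that setup.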

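3) Directory structure and file naming conventions
*read.c : reads object files and symbol tables
*-thread.c : thread debugging support
inf*.c : code that deals with the inferior program (GDB's tongue-in-cheek name for the program being debugged)
*-tdep.c : target-dependent code
*-nat.c : native support code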


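Algorithms
The algorithms GDB uses are not very complicated. The hard part is that it is easy to get lost in specific details and special cases (much the same situation an OS faces).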

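Frame
To support the call frame model of the DWARF standard, GDB redefined its own frame structure. GDB's frames track calling and called functions, which is what walking the call stack in a backtrace amounts to. A GDB frame is more than a call frame: each level of frame contains (or can be used to obtain) a snapshot of the CPU state at that level (fix me).

sentinel frame: the frame for the current instruction, at the innermost end of the call stack. Its level is -1 and its type is SENTINEL_FRAME, while the frame of the currently executing function (fix me) has level 0.
unwind operation: the term comes from the DWARF standard. Unwinding a frame recovers the state of its caller, that is, it moves to the next older frame. frame_register_unwind, for example, returns the value a register had in the caller.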
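The level numbering can be pictured with a small sketch. The names used here (toy_frame, toy_unwind, toy_backtrace_depth) are hypothetical and are not GDB's own data structures or functions. The sketch only illustrates that the sentinel sits at level -1 and that unwinding it once yields frame 0.

```c
#include <stddef.h>

/* Hypothetical toy frame; this is not GDB's struct frame_info. */
enum toy_frame_type { TOY_NORMAL_FRAME, TOY_SENTINEL_FRAME };

struct toy_frame {
    int level;                 /* -1 for the sentinel, 0 for the innermost function */
    enum toy_frame_type type;
    unsigned long pc;          /* PC recorded for this frame */
    struct toy_frame *prev;    /* next older (caller) frame, NULL at the outermost */
};

/* "Unwind" one step: return the caller's frame, or NULL if there is none. */
struct toy_frame *toy_unwind(struct toy_frame *f)
{
    return f ? f->prev : NULL;
}

/* Count the real frames reachable from the sentinel (the sentinel itself
   is not counted). */
int toy_backtrace_depth(struct toy_frame *sentinel)
{
    int depth = 0;
    for (struct toy_frame *f = toy_unwind(sentinel); f != NULL; f = toy_unwind(f))
        depth++;
    return depth;
}
```

A backtrace in this model is just repeated unwinding until no older frame exists.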

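Prologue Analysis
CFI: DWARF call frame information. Current versions of GCC generate this call frame information.
Prologue analysis is used to find the size of a frame and the base address of the older frame. CFI makes this easier, but CFI is not always available, and prologue analysis as a technique predates CFI. This kind of backtrace also lets GDB modify arguments or the values of some variables. Because callee-saved registers exist, the frame pointer changes from frame to frame, and the values of some variables may be scattered irregularly among the younger frames. (request comments)
The basic principle of prologue analysis is to examine the actual machine instructions and from them work out the frame size and the register values saved in the stack frame. prologue-value.h and prologue-value.c provide a framework for prologue analysis: starting at the function's entry instruction, analyze up to the current PC, and then:
1) Check whether the value of sp is known. If it is, the frame size is known.
2) Check where the previous frame pointer was saved.
For details, see GDBINT and the related code (request comments).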
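The two checks above can be sketched on a toy instruction set. Everything here (the toy_op opcodes, the 8-byte stack slot, the function name toy_analyze_prologue) is an assumption made for illustration. It is not the prologue-value.c API, and a real analyzer decodes actual machine code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction set. */
enum toy_op { TOY_PUSH_FP, TOY_MOV_FP_SP, TOY_SUB_SP, TOY_OTHER };

struct toy_insn { enum toy_op op; int imm; };

struct toy_prologue_info {
    int frame_size;        /* bytes SP has moved down from its value at entry */
    bool fp_saved;         /* did the prologue push the caller's frame pointer? */
    int fp_save_offset;    /* where it was stored, relative to SP at entry */
};

/* Scan from the function entry up to (not including) index pc_index. */
struct toy_prologue_info toy_analyze_prologue(const struct toy_insn *code,
                                              size_t pc_index)
{
    struct toy_prologue_info info = { 0, false, 0 };
    int sp = 0;  /* SP offset relative to entry; known to be 0 at entry */

    for (size_t i = 0; i < pc_index; i++) {
        switch (code[i].op) {
        case TOY_PUSH_FP:
            sp -= 8;                       /* assumed 8-byte stack slot */
            info.fp_saved = true;
            info.fp_save_offset = sp;
            break;
        case TOY_SUB_SP:
            sp -= code[i].imm;             /* room for local variables */
            break;
        case TOY_MOV_FP_SP:
        case TOY_OTHER:
            break;                         /* no effect on SP */
        }
    }
    info.frame_size = -sp;
    return info;
}
```

Because the scan stops at the current PC, the result is correct even when execution has stopped in the middle of the prologue.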

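Breakpoint Handling
Hardware Breakpoint: requires CPU support. Execution breaks out (through an interrupt or some other mechanism) when it reaches the specified PC.
Software Breakpoint: GDB replaces the instruction at the specified address with a special instruction (on x86, for example, int3, or in principle something like a divide by zero). When the resulting exception occurs, GDB gains control. Once the user issues a continue command, GDB puts the original instruction back.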
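The swap-and-restore idea behind a software breakpoint can be sketched on a plain byte buffer standing in for inferior memory. The struct and function names (toy_breakpoint, toy_insert_breakpoint, toy_remove_breakpoint) are hypothetical. Only the opcode 0xCC for x86 int3 is a real-world fact.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_INT3 0xCC   /* x86 one-byte breakpoint opcode */

/* Hypothetical record of one breakpoint. */
struct toy_breakpoint {
    size_t addr;        /* offset into the simulated memory */
    uint8_t saved;      /* original byte that the trap replaced */
    int inserted;
};

/* Save the original byte and write the trap. Returns 0 on success. */
int toy_insert_breakpoint(uint8_t *mem, size_t mem_len, struct toy_breakpoint *bp)
{
    if (bp->addr >= mem_len || bp->inserted)
        return 1;
    bp->saved = mem[bp->addr];
    mem[bp->addr] = TOY_INT3;
    bp->inserted = 1;
    return 0;
}

/* Put the original byte back. Returns 0 on success. */
int toy_remove_breakpoint(uint8_t *mem, size_t mem_len, struct toy_breakpoint *bp)
{
    if (bp->addr >= mem_len || !bp->inserted)
        return 1;
    mem[bp->addr] = bp->saved;
    bp->inserted = 0;
    return 0;
}
```

In a real debugger the write goes through the target's memory interface rather than a local array.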

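The macro that defines the software breakpoint instruction: BREAKPOINT
Most breakpoint handling lives in `breakpoint.c' and `infrun.c'.
Interface functions:

target_remove_breakpoint (bp_tgt)
target_insert_breakpoint (bp_tgt)
Remove or insert a software breakpoint.

target_remove_hw_breakpoint (bp_tgt)
target_insert_hw_breakpoint (bp_tgt)
Remove or insert a hardware breakpoint.

Longjmp Support
GDB can break at the destination address when the program performs a longjmp (see "maint info breakpoint"). This requires implementing gdbarch_get_longjmp_target. Since jmp_buf is system-specific, it should be defined in tm-target.h. See tm-sun4os4.h and sparc-tdep.c.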

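Watchpoints
A watchpoint is a breakpoint that triggers on data access. GDB always tries to use hardware-supported watchpoints, but not every system has watchpoint support, the hardware resources may be insufficient, or the region to watch may be too large...
Software watchpoints are very slow: GDB single-steps the program and checks the target address after every step. For a write watchpoint, GDB simply compares the value at the watched address. For a read watchpoint, the target must provide target_stopped_data_address, which returns the address the debugged program was accessing when it stopped.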
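A software write watchpoint can be sketched as a single-step loop with a value comparison after each step. The "program" here is a hypothetical list of stores into an int array, and all names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical one-instruction "program step": store value at mem[addr]. */
struct toy_store { size_t addr; int value; };

/* Single-step through the program. After every step, compare the watched
   cell with its previous value. Returns the index of the step that
   changed it, or -1 if the value never changed. */
int toy_run_with_write_watch(int *mem, const struct toy_store *prog,
                             size_t n_steps, size_t watch_addr)
{
    int old = mem[watch_addr];
    for (size_t i = 0; i < n_steps; i++) {
        mem[prog[i].addr] = prog[i].value;   /* execute one step */
        if (mem[watch_addr] != old)          /* value comparison */
            return (int)i;
    }
    return -1;
}
```

Note that a store which writes the same value back goes unnoticed in this scheme, because only a change in value is detected. The cost is one check per executed step, which is why this approach is slow.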
The following macros and functions support hardware watchpoints:
TARGET_HAS_HARDWARE_WATCHPOINTS
If defined, the target supports hardware watchpoints.
TARGET_CAN_USE_HARDWARE_WATCHPOINT (type, count, other)
Return the number of hardware watchpoints of type type that are possible to be set. 
The value is positive if count watchpoints of this type can be set, zero if setting watchpoints of this type 
is not supported, and negative if count is more than the maximum number of watchpoints of type type that can 
be set. other is non-zero if other types of watchpoints are currently enabled (there are architectures which
cannot set watchpoints of different types at the same time).

TARGET_REGION_OK_FOR_HW_WATCHPOINT (addr, len)
Return non-zero if hardware watchpoints can be used to watch a region whose address is addr and whose length in bytes is len.

target_insert_watchpoint (addr, len, type)
target_remove_watchpoint (addr, len, type)
Insert or remove a hardware watchpoint starting at addr, for len bytes. type is the watchpoint type, one of the possible values of the enumerated data type target_hw_bp_type, defined by `breakpoint.h' as follows:
enum target_hw_bp_type
{
hw_write = 0,
hw_read = 1,
hw_access = 2,
hw_execute = 3
};

 

These two macros should return 0 for success, non-zero for failure.

target_stopped_data_address (addr_p)
If the inferior has some watchpoint that triggered, place the address associated with the watchpoint at the location pointed to by addr_p and return non-zero. Otherwise, return zero. Note that this primitive is used by GDB only on targets that support data-read or data-access type watchpoints, so targets that have support only for data-write watchpoints need not implement these primitives.

HAVE_STEPPABLE_WATCHPOINT
If defined to a non-zero value, it is not necessary to disable a watchpoint to step over it.

int gdbarch_have_nonsteppable_watchpoint (gdbarch)
If it returns a non-zero value, GDB should disable a watchpoint to step the inferior over it.

HAVE_CONTINUABLE_WATCHPOINT
If defined to a non-zero value, it is possible to continue the inferior after a watchpoint has been hit.

CANNOT_STEP_HW_WATCHPOINTS
If this is defined to a non-zero value, GDB will remove all watchpoints before stepping the inferior.

STOPPED_BY_WATCHPOINT (wait_status)
Return non-zero if stopped by a watchpoint. wait_status is of the type struct target_waitstatus, defined by `target.h'. Normally, this macro is defined to invoke the function pointed to by the to_stopped_by_watchpoint member of the structure (of the type target_ops, defined on `target.h') that describes the target-specific operations; to_stopped_by_watchpoint ignores the wait_status argument.

GDB does not require the non-zero value returned by STOPPED_BY_WATCHPOINT to be 100% correct, so if a target cannot  determine for sure whether the inferior stopped due to a watchpoint, it could return non-zero "just in case".
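The three-way return convention of TARGET_CAN_USE_HARDWARE_WATCHPOINT described earlier (positive, zero, negative) can be sketched as an ordinary function. The limits and names below (TOY_MAX_HW_WATCHPOINTS, TOY_SUPPORTS_READ_WATCH, toy_can_use_hw_watchpoint) are invented for an imaginary target and are not taken from any real GDB port.

```c
/* Hypothetical target limits, for illustration only. */
#define TOY_MAX_HW_WATCHPOINTS 4
#define TOY_SUPPORTS_READ_WATCH 0

enum toy_wp_type { TOY_WP_WRITE, TOY_WP_READ, TOY_WP_ACCESS };

/* Positive: count watchpoints of this type can be set.
   Zero:     this type of watchpoint is not supported.
   Negative: count exceeds what the hardware can hold. */
int toy_can_use_hw_watchpoint(enum toy_wp_type type, int count, int other)
{
    (void)other;  /* assumption: this toy target can mix watchpoint types */
    if (type == TOY_WP_READ && !TOY_SUPPORTS_READ_WATCH)
        return 0;
    if (count > TOY_MAX_HW_WATCHPOINTS)
        return -1;
    return 1;
}
```

A caller would fall back to a software watchpoint whenever the result is not positive.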

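x86 Watchpoints: see the original English GDBINT.

Checkpoints
A checkpoint is a copy of a program's running state from which execution can later be restarted. Possible implementations include forking a child process or saving a core file. Either way, everything about the program's state must be saved: registers, memory, ...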
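The save-everything requirement can be sketched with a toy state made of a few registers and a small memory image. This is only an in-process copy under invented names (toy_state, toy_checkpoint_save, toy_checkpoint_restore). It does not show the fork-based or core-file-based mechanisms mentioned above.

```c
#include <string.h>

#define TOY_NREGS 4
#define TOY_MEMSZ 16

/* Hypothetical inferior state: a few registers plus a small memory image. */
struct toy_state {
    unsigned long regs[TOY_NREGS];
    unsigned char mem[TOY_MEMSZ];
};

/* Take a checkpoint: copy the whole state into a separate snapshot. */
void toy_checkpoint_save(const struct toy_state *live, struct toy_state *snap)
{
    memcpy(snap, live, sizeof *snap);
}

/* Restart from a checkpoint: overwrite the live state with the snapshot. */
void toy_checkpoint_restore(struct toy_state *live, const struct toy_state *snap)
{
    memcpy(live, snap, sizeof *live);
}
```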
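Observing changes in GDB internals (I could not work out what this section is about)

..... I skipped the rest of it, since it is not really necessary to read. I also skipped libgdb: GDBINT gives little detail about it. It is meant as GDB's standard library, used for building graphical user interfaces and the like.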

 

Symbol Handling

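This is a key module. Symbols include functions, variables, and types.

Symbol Reading
symfile.c contains the code that opens a symbol file (see the GDB command symbol-file; the symbols usually live in the program being debugged). GDB also uses BFD to read the symbol table: see find_sym_fns.
Symbol-reading modules register themselves with GDB through add_symtab_fns, whose argument is a struct sym_fns: the name of the symbol format, the length of the prefix, and four function pointers.
Each symbol-reading module provides the following four interface functions (see GDBINT or the code for details; I am not yet clear on all of this, request comments).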

 


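xyz_symfile_init (struct sym_fns *sf)

Called by symbol_file_add when a symbol table needs to be read. The argument is a newly allocated sym_fns whose bfd field is the BFD of the new symbol file.

xyz_new_init ()

Called by symbol_file_add when the current symbols are being discarded.

xyz_symfile_read (struct sym_fns *sf, CORE_ADDR addr, int mainline)

Called by symbol_file_add to read the actual symbol tables: psymtabs or symtabs. sf is the same sym_fns that was passed to the initialization function.

xyz_psymtab_to_symtab (struct partial_symtab *pst)

Expands the partial symbol table pst into a full symbol table when the full symbols are needed.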
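The registration pattern (each reader hands GDB a table of function pointers keyed by format name) can be sketched as follows. The struct below is a deliberately simplified, hypothetical stand-in. GDB's real struct sym_fns has different fields and signatures, and the toy_ functions are not GDB functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified reader descriptor: a format name plus callbacks. */
struct toy_sym_fns {
    const char *name;                       /* symbol format name, e.g. "elf" */
    void (*new_init)(void);                 /* discard old symbols */
    int  (*symfile_read)(const char *path); /* read the symbol tables */
    struct toy_sym_fns *next;               /* link in the registry */
};

static struct toy_sym_fns *toy_readers = NULL;

/* Register a reader by pushing it onto a global list. */
void toy_add_symtab_fns(struct toy_sym_fns *sf)
{
    sf->next = toy_readers;
    toy_readers = sf;
}

/* Find the reader whose name matches the file's format, or NULL. */
struct toy_sym_fns *toy_find_sym_fns(const char *format)
{
    for (struct toy_sym_fns *sf = toy_readers; sf != NULL; sf = sf->next)
        if (strcmp(sf->name, format) == 0)
            return sf;
    return NULL;
}
```

The point of the indirection is that the generic symbol-loading code never needs to know which object formats exist. It only looks up a reader and calls through its pointers.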

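Partial Symbol Tables
GDB has three kinds of symbol tables:

  • Full symbol tables (symtabs): contain the main information about symbols and addresses.

  • Partial symbol tables (psymtabs): contain enough information to know when to read the corresponding full symbol table.

  • Minimal symbol tables (msymtabs): contain symbols that do not come from debugging information.

The purpose of a psymtab is to convey a program's symbol information quickly: external symbols, types, static symbols and types, and enum values declared at file scope. A psymtab also contains address ranges.
psymtabs are used in two ways:
1) By instruction address: the address falls in the address range of some psymtab, which tells GDB which full symbol table to read. Examples: find_pc_function, find_pc_line, and the other find_pc_... functions.
2) By name: lookup_symbol uses the psymtabs to find the full symbol for a name.
A psymtab does not contain type information for its symbols. See GDBINT for details.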
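The lookup by address and the lazy expansion can be sketched with a toy table. The struct fields and function names here are hypothetical simplifications rather than GDB's struct partial_symtab, and "reading in" is reduced to setting a flag.

```c
#include <stddef.h>

/* Hypothetical partial symbol table entry: a file name and the address
   range its code covers. Full symbols are read on demand. */
struct toy_psymtab {
    const char *filename;
    unsigned long textlow;    /* first address covered */
    unsigned long texthigh;   /* one past the last address covered */
    int readin;               /* non-zero once full symbols were loaded */
};

/* Find the psymtab whose range contains pc, or NULL. */
struct toy_psymtab *toy_find_pc_psymtab(struct toy_psymtab *tabs, size_t n,
                                        unsigned long pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= tabs[i].textlow && pc < tabs[i].texthigh)
            return &tabs[i];
    return NULL;
}

/* Expand lazily: mark the table as read in the first time it is needed. */
struct toy_psymtab *toy_lookup_and_expand(struct toy_psymtab *tabs, size_t n,
                                          unsigned long pc)
{
    struct toy_psymtab *pst = toy_find_pc_psymtab(tabs, n, pc);
    if (pst != NULL && !pst->readin)
        pst->readin = 1;   /* a real reader would build the full symtab here */
    return pst;
}
```

Only the table that actually covers the address pays the cost of a full read, and the others stay partial.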

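Types
Fundamental Types (e.g., FT_VOID, FT_BOOLEAN).

The types GDB uses internally.

Type Codes (e.g., TYPE_CODE_PTR, TYPE_CODE_ARRAY).

Each is either a primitive type or a derived type. Typically several fundamental types FT_* map to one TYPE_CODE_* and are told apart by attributes such as bit length and signedness.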
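The many-to-one mapping from fundamental types to type codes can be sketched in miniature. The enums, the struct, and the byte sizes below (2 and 4) are assumptions for illustration. They mirror GDB's naming style but are not its real definitions.

```c
/* Hypothetical miniature of the idea; not GDB's real enums or struct type. */
enum toy_type_code { TOY_TYPE_CODE_INT, TOY_TYPE_CODE_FLT };
enum toy_fund_type { TOY_FT_SHORT, TOY_FT_INTEGER,
                     TOY_FT_UNSIGNED_INTEGER, TOY_FT_FLOAT };

struct toy_type {
    enum toy_type_code code;
    int length;        /* size in bytes (assumed values) */
    int is_unsigned;
};

/* Several fundamental types share one type code and differ only in
   length and signedness. */
struct toy_type toy_make_fundamental(enum toy_fund_type ft)
{
    struct toy_type t = { TOY_TYPE_CODE_INT, 4, 0 };
    switch (ft) {
    case TOY_FT_SHORT:            t.length = 2; break;
    case TOY_FT_INTEGER:          break;
    case TOY_FT_UNSIGNED_INTEGER: t.is_unsigned = 1; break;
    case TOY_FT_FLOAT:            t.code = TOY_TYPE_CODE_FLT; break;
    }
    return t;
}
```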

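Builtin Types (e.g., builtin_type_void, builtin_type_char).

These exist for historical reasons and correspond to the fundamental types. (The GDB maintainers actually intend to get rid of these internal types: builtin_type_int (gdbtypes.c) is essentially the same as a TYPE_CODE_INT type (c-lang.c) created for FT_INTEGER. The difference is that a builtin_type is not associated with any objfile, whereas `c-lang.c' creates many TYPE_CODE_INT types, each associated with a particular objfile.)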



Object File Formats
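a.out : the original Unix object file format. It has almost no symbol table to speak of. Reader: dbxread.c.

COFF : System V Release 3 (SVR3) Unix. Its symbol table has shortcomings (for example, included header files cannot be resolved). Reader: coffread.c.

ECOFF : an extended version of COFF, used on Mips and Alpha workstations. Reader: mipsread.c.

XCOFF : IBM RS/6000 running AIX.

PE : used by Windows 95 and NT; essentially COFF.

ELF : System V Release 4 (SVR4) Unix. ELF is similar to COFF but fixes many of its shortcomings. Reader: elfread.c.

SOM : HP's format (not to be confused with IBM's SOM, which is a cross-language ABI). Reader: `somread.c'.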

 

Debugging File Formats

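Debugging information formats, which are independent of the object file format.

Stabs

stabs began as information in a.out files, but COFF, ELF, and other object file formats can carry it too. dbxread.c holds the basic stabs handling and wrapping; stabsread.c does the real work.

COFF : COFF files also contain their own debugging information, which is not used much and extends poorly.

Mips debug (Third Eye) : the special debugging information carried by ECOFF. Reader: mdebugread.c.

DWARF 2 : the successor to DWARF 1, but incompatible with it. Reader: dwarf2read.c.

SOM : similar to COFF.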


Adding a New Symbol Reader to GDB

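If you use an existing object file format, this is much simpler. Otherwise you first need to add support for the new object file format to BFD. GDB uses the specific BFD interface through a set of swapping functions (request comment). For some targets (such as COFF) an extra wrapper layer may be needed because platforms can differ; these interfaces should be described in bfd/libxyz.h.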

 

Memory Management for Symbol Files

 

 

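The symbol information for a symbol file is stored in its objfile_obstack (request comment), and that memory is freed automatically when the objfile is unloaded. For this reason, one objfile should not hold references to symbols that belong to another objfile. Some user-related data (request comment) and types are also stored on this obstack, but they are copied to global memory when the objfile is unloaded, so they are not lost.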
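The allocate-many, free-all-at-once behaviour can be sketched with a small bump allocator. This is a hypothetical stand-in (toy_arena and its functions are invented names), not the GNU obstack API that GDB actually uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump allocator standing in for an objfile's obstack. */
struct toy_arena {
    unsigned char *base;
    size_t used;
    size_t cap;
};

/* Returns 0 on success, non-zero if the backing memory cannot be obtained. */
int toy_arena_init(struct toy_arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base ? 0 : 1;
}

/* Allocate n bytes from the arena. There is no per-object free. */
void *toy_arena_alloc(struct toy_arena *a, size_t n)
{
    size_t aligned = (n + 7u) & ~(size_t)7u;   /* round up to 8 bytes */
    if (aligned < n || aligned > a->cap - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* "Unload the objfile": everything allocated from the arena goes at once. */
void toy_arena_free_all(struct toy_arena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}
```

This also shows why cross-objfile references are dangerous: a pointer into one arena dangles as soon as that arena is released.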


 


 

Language Support

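I do not want to go into this, and fortunately GDBINT says little about it as well, giving only a list of steps:
1. Create the expression parser: lang-exp.y. The parser is usually generated with a YACC-style parser generator.
2. Add any evaluation routines, if necessary
3. Update some existing code
4. Add a place of call
5. Use macros to trim code
6. Edit `Makefile.in'
(Only the steps are listed here; see GDBINT for the details. This is not something I care much about, and I do not understand it either.)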



Host Definition
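Adding a New Host
This should now be done with autoconf (request comment). Older hosts use the following configuration files.

`gdb/config/arch/xyz.mh'
Contains the host and native configuration. Host configuration is now handled by Autoconf. The host information consists of definitions such as XM_FILE=xm-xyz.h, and possibly CC, SYSV_DEFINE, XM_CFLAGS, XM_ADD_FILES, XM_CLIBS, XM_CDEPS; see `Makefile.in'.

`gdb/config/arch/xm-xyz.h'
This file used to hold the definitions and information needed to run GDB on machine xyz; this is now done through Autoconf. New host and native configurations do not need this file.

Host Conditionals

Once GDB has been configured, a good many macros need to be defined. Some of them are listed here:

GDBINIT_FILENAME  : the name of the GDB initialization file, normally .gdbinit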

NO_STD_REGS :   This macro is deprecated.

SIGWINCH_HANDLER    If your host defines SIGWINCH, you can define this to be the name of a function to be called if SIGWINCH is received.

SIGWINCH_HANDLER_BODY  Define this to expand into code that will define the function named by the expansion of SIGWINCH_HANDLER.

ALIGN_STACK_ON_STARTUP  Define this if your system is of a sort that will crash in tgetent if the stack happens not to be longword-aligned when main is called. This is a rare situation, but is known to occur on several different types of systems.

CRLF_SOURCE_FILES  Define this if host files use \r\n rather than \n as a line terminator. This will cause source file listings to omit \r characters when printing and it will allow \r\n line endings of files which are "sourced" by gdb. It must be possible to open files in binary mode using O_BINARY or, for fopen, "rb".

DEFAULT_PROMPT  The default value of the prompt string (normally "(gdb) ").

DEV_TTY  The name of the generic TTY device, defaults to "/dev/tty".

FOPEN_RB  Define this if binary files are opened the same way as text files.

HAVE_MMAP  In some cases, use the system call mmap for reading symbol tables. For some machines this allows for sharing and quick updates.

HAVE_TERMIO  Define this if the host system has termio.h.

INT_MAX  INT_MIN LONG_MAX UINT_MAX ULONG_MAX  Values for host-side constants.

ISATTY  Substitute for isatty, if not available.

LONGEST  This is the longest integer type available on the host. If not defined, it will default to long long or long, depending on CC_HAS_LONG_LONG.

CC_HAS_LONG_LONG  Define this if the host C compiler supports long long. This is set by the configure script.

PRINTF_HAS_LONG_LONG  Define this if the host can handle printing of long long integers via the printf format conversion specifier ll. This is set by the configure script.

HAVE_LONG_DOUBLE  Define this if the host C compiler supports long double. This is set by the configure script.

PRINTF_HAS_LONG_DOUBLE  Define this if the host can handle printing of long double floating-point numbers via the printf format conversion specifier Lg. This is set by the configure script.

SCANF_HAS_LONG_DOUBLE  Define this if the host can handle the parsing of long double floating-point numbers via the scanf format conversion specifier Lg. This is set by the configure script.

LSEEK_NOT_LINEAR  Define this if lseek (n) does not necessarily move to byte number n in the file. This is only used when reading source files. It is normally faster to define CRLF_SOURCE_FILES when possible.

L_SET  This macro is used as the argument to lseek (or, most commonly, bfd_seek). FIXME, should be replaced by SEEK_SET instead, which is the POSIX equivalent.

NORETURN  If defined, this should be one or more tokens, such as volatile, that can be used in both the declaration and definition of functions to indicate that they never return. The default is already set correctly if compiling with GCC. This will almost never need to be defined.

ATTR_NORETURN  If defined, this should be one or more tokens, such as __attribute__ ((noreturn)), that can be used in the declarations of functions to indicate that they never return. The default is already set correctly if compiling with GCC. This will almost never need to be defined.

SEEK_CUR  SEEK_SET  Define these to appropriate value for the system lseek, if not already defined.

STOP_SIGNAL  This is the signal for stopping GDB. Defaults to SIGTSTP. (Only redefined for the Convex.)

 

USG  Means that System V (prior to SVR4) include files are in use. (FIXME: This symbol is abused in `infrun.c', `regex.c', and `utils.c' for other things, at the moment.)

 

lint  Define this to help placate lint in some situations.

 

volatile  Define this to override the defaults of __volatile__ or /**/.

