Asterisk Debug

Call Log

By default, Asterisk writes a complete call detail record (CDR) for every call to a CSV file, for example:
> cd /var/log/asterisk/cdr-csv
> tail -f Master.csv
"","5032273698","9714042975","sip","""DORGAN M"" <5032273698>","SIP/147.135.0.129-08100358","","AGI","/usr/local/mipl/agnese|http://www.nextbus.com/nextbus3.mipl","2004-12-09 13:14:41","2004-12-09 13:14:44","2004-12-09 13:14:53",12,9,"ANSWERED","DOCUMENTATION"
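If you want to pick individual fields out of a record like this, a short script can help. The following is only a sketch: the column names in CDR_FIELDS are an assumption based on the usual cdr_csv column order and do not come from this page, so check them against your own Master.csv.

```python
import csv
import io

# ASSUMPTION: column order of the default cdr_csv layout. Verify it
# against your own Master.csv before relying on these names.
CDR_FIELDS = [
    "accountcode", "src", "dst", "dcontext", "clid", "channel",
    "dstchannel", "lastapp", "lastdata", "start", "answer", "end",
    "duration", "billsec", "disposition", "amaflags",
]

def parse_cdr_line(line):
    """Parse one Master.csv record into a dict keyed by CDR_FIELDS."""
    row = next(csv.reader(io.StringIO(line)))
    return dict(zip(CDR_FIELDS, row))

# The sample record shown in this section, written as one line.
SAMPLE = (
    '"","5032273698","9714042975","sip","""DORGAN M"" <5032273698>",'
    '"SIP/147.135.0.129-08100358","","AGI",'
    '"/usr/local/mipl/agnese|http://www.nextbus.com/nextbus3.mipl",'
    '"2004-12-09 13:14:41","2004-12-09 13:14:44","2004-12-09 13:14:53",'
    '12,9,"ANSWERED","DOCUMENTATION"'
)

if __name__ == "__main__":
    cdr = parse_cdr_line(SAMPLE)
    print(cdr["src"], "->", cdr["dst"], cdr["disposition"], cdr["billsec"])
```

The csv module takes care of the doubled quotes inside the caller ID field, which is the part that tends to break naive splitting on commas.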

Message Log

If you are having trouble catching intermittent problems on your system, consider adding more information to the Asterisk message log. In logger.conf you will see something like:
messages => notice,warning,error
Consider changing this to:
messages => notice,warning,error,debug,verbose
Do this only for short periods, because it can really eat disk space. Note that you will need to restart Asterisk or type LOGGER ROTATE at the CLI for this change to take effect (reload doesn't do it).

Backtracing a core dump file in /tmp

  1. start Asterisk with safe_asterisk
  2. enter "gdb asterisk core.xxxx"
  3. enter "bt" while in gdb (or do a "bt full")
  4. enter "thread apply all bt"
Naturally, you will need to have gdb installed on your system.
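The gdb steps can also be run non-interactively, which is handy when you want the output in a file. The sketch below assumes gdb's -batch and -ex options are available in your gdb version; the function names and the output file are made up for illustration.

```python
import subprocess

def gdb_backtrace_command(binary, core):
    """Build a gdb command line that prints the backtraces and exits."""
    return [
        "gdb", "-batch",
        "-ex", "bt full",
        "-ex", "thread apply all bt",
        binary, core,
    ]

def collect_backtrace(binary, core, out_path):
    """Run gdb non-interactively and save everything it prints to out_path."""
    cmd = gdb_backtrace_command(binary, core)
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
        fh.write(result.stderr)
    return result.returncode
```

A call would look like collect_backtrace("asterisk", "core.xxxx", "bt.txt"), with core.xxxx replaced by the name of your actual core file.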

CONSOLE=no

Are you running safe_asterisk? If so, try editing the safe_asterisk script and changing CONSOLE=yes to CONSOLE=no.


Debugging a running asterisk

List all the asterisk threads with  ps axum -C asterisk  to find the thread that takes the most CPU. Now connect with gdb:

gdb /usr/sbin/asterisk pid

then run "bt" and post the last few lines of the output to the mailing list.



ulimit

If Asterisk is crashing (that is, the process exits unexpectedly), issue the command

ulimit -c unlimited

in the shell that starts Asterisk, before starting it. This should allow Asterisk to drop a core file if it can.
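To check what limit a process started from the current shell would inherit, you can query it directly. This is a small sketch for Unix-like systems using Python's resource module; the helper names are made up, and it reports the limit of the Python process itself, not of a running Asterisk.

```python
import resource

def core_limit():
    """Return the (soft, hard) core file size limits of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)

def describe(limit):
    """Render a limit as a byte count, or 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

if __name__ == "__main__":
    soft, hard = core_limit()
    print("core soft limit:", describe(soft))
    print("core hard limit:", describe(hard))
    if soft == 0:
        print("core dumps are disabled; run 'ulimit -c unlimited' first")
```

Run it from the same shell you use to start Asterisk; a soft limit of 0 means no core file will be written.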




HowTo Debug a DeadLock in Asterisk


1)  In the Asterisk Makefile, uncomment the flags you need on the DEBUG_THREADS line

# Optional debugging parameters
DEBUG_THREADS = #-DDEBUG_THREADS #-DDO_CRASH

  • NB: the DO_CRASH flag forces a core dump on certain conditions that indicate a possible deadlock and that otherwise would only generate a verbose warning message. You probably don't want to do this on a production box, but for testing it is a useful option because the core shows what was happening between the threads. Grep the sources for the CRASH macro to see under what conditions this might occur.

2)  Turn on verbose logging
CVS as of Feb 1 2004 allows verbose messages to be logged: in logger.conf, add verbose to the messages file entry.

This lets you log ast_verbose messages to the log files so we can
  • see what the bt threads are doing in time sequence order
  • re-create the situation that led to the core or deadlock

3)  When you deadlock, don't restart the box or restart Asterisk
Instead, take the 5 minutes while everyone is freaking out to attach gdb to the running Asterisk process:
gdb /usr/sbin/asterisk <pid of main asterisk process>
You can get the Asterisk process identification number (PID) from asterisk -r ("-> currently running on blah (pid =9075)"). Note: if the box is truly hosed and blocked on all I/O, this will fail too; in that case use  ps ax , or look for the lowest pid after doing  ps ax -C asterisk .

4)  After gdb loads, do
info thread
thread apply all bt
At the very least, save that bt output to a file and post it to bugs.digium.com.

5)  Identify deadlocked threads by this pattern
Note the "__pthread_wait_for_restart_signal" frame. It means the thread is in a wait loop, waiting for the mutex lock.

Thread 23 (Thread 3576854 (LWP 2910)):
#0  0x400c787e in sigsuspend () from /lib/libc.so.6
#1  0x40022879 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2  0x40024a36 in __pthread_alt_lock () from /lib/libpthread.so.0
#3  0x40020fd2 in pthread_mutex_lock () from /lib/libpthread.so.0

Note that apparently not all systems implement "__pthread_wait_for_restart_signal", so you may want to scan for at least "pthread_mutex_lock". You will usually find more than one of these patterns, because once a thread is deadlocked on a mutex lock, other threads that want the same lock pile up quickly.
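When the bt output is long, scanning it by eye gets tedious. The sketch below lists every thread that has a pthread_mutex_lock frame in a saved "thread apply all bt" dump. It assumes the gdb output format quoted in this step (frame lines starting with #0, #1, ...); the function name is made up, and the second thread in the sample is invented purely to show a non-matching case.

```python
import re

# "Thread 23 (Thread 3576854 (LWP 2910))" -> sequence number, thread id
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread (\S+) \(LWP (\d+)\)")
# "#3  0x40020fd2 in pthread_mutex_lock () ..." -> frame number, function
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")

def threads_waiting_on_mutex(bt_text):
    """Return [(sequence, thread_id, frame)] for threads in pthread_mutex_lock."""
    waiting = []
    current = None
    for line in bt_text.splitlines():
        line = line.strip()
        m = THREAD_RE.match(line)
        if m:
            current = (int(m.group(1)), m.group(2))
            continue
        f = FRAME_RE.match(line)
        if f and current and f.group(2) == "pthread_mutex_lock":
            waiting.append((current[0], current[1], int(f.group(1))))
    return waiting

BT_SAMPLE = """\
Thread 23 (Thread 3576854 (LWP 2910)):
#0  0x400c787e in sigsuspend () from /lib/libc.so.6
#1  0x40022879 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2  0x40024a36 in __pthread_alt_lock () from /lib/libpthread.so.0
#3  0x40020fd2 in pthread_mutex_lock () from /lib/libpthread.so.0
Thread 22 (Thread 3560469 (LWP 2905)):
#0  0x400c0000 in poll () from /lib/libc.so.6
"""  # Thread 22 is a made-up example of a thread that is not waiting

if __name__ == "__main__":
    for seq, tid, frame in threads_waiting_on_mutex(BT_SAMPLE):
        print("thread", seq, "id", tid, "is in pthread_mutex_lock at frame", frame)
```

The thread id is kept as a string because other gdb versions may print it in a different form than the one shown here.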

6)  Try to identify the first thread that is deadlocked
The sequence number of the bt threads is not relevant, because threads are re-used.

Look at the time stamps in your log files and try to correlate the THREAD number (e.g. "Thread 23 (Thread 3576854 (LWP 2910))") with the earliest entry in the log file that carries the same THREAD number (e.g. "VERBOSE[3576854]"). Note the FRAME number of the frame that called "pthread_mutex_lock()", which is the next higher frame number in the bt (frame numbers are the #0, #1, #2, ... at the start of each line under the bt THREAD header).

Log files are usually in  /var/log/asterisk/ . Check the files  messages  and  debug  in this directory.
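Finding the earliest log entry for a given thread can be scripted. This is a rough sketch: it relies only on the "VERBOSE[3576854]" style tag mentioned in this step, and the sample log lines (time stamps and message text) are invented for illustration, so your real log lines will look different.

```python
import re

def first_log_entry(log_lines, thread_id):
    """Return the first log line tagged with [thread_id], or None."""
    tag = re.compile(r"\[%s\]" % re.escape(str(thread_id)))
    for line in log_lines:
        if tag.search(line):
            return line.rstrip("\n")
    return None

# HYPOTHETICAL log lines; only the [thread id] tag format is taken from
# the text, everything else is made up.
LOG_SAMPLE = [
    "Feb  1 10:15:01 VERBOSE[3560469]: hypothetical entry A\n",
    "Feb  1 10:15:02 VERBOSE[3576854]: hypothetical entry B\n",
    "Feb  1 10:15:09 VERBOSE[3576854]: hypothetical entry C\n",
]

if __name__ == "__main__":
    print(first_log_entry(LOG_SAMPLE, 3576854))
```

On a real system you would pass an open file instead of the sample list, for example open("/var/log/asterisk/messages").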

7)  Find the position
Now that we have our potential guilty party, the first in line for the lock, do
thread <sequence number>    (the THREAD of interest)
frame <frame number>    (the frame that called pthread_mutex_lock ())
This should now put us in the Asterisk sources right where ast_mutex_lock() is called. Record the name of the lock it was trying to get, e.g. ast_mutex_lock(&agentlock).

8)  Check who has the lock
If thread debugging was properly turned on, we can now look inside the ast_mutex_t struct from  include/asterisk/lock.h , which looks like this

pthread_mutex_t mutex;
char *file;
int lineno;
char *func;
pthread_t thread;

Now that we have our lock, we can see who holds it and what we are waiting on. Run the following gdb commands:

p somelockIjustFound->thread
p somelockIjustFound->file
p somelockIjustFound->func
p somelockIjustFound->lineno

This is the guilty code that is holding the lock, and it is what we want to look at.

9)  Now comes the hard part ...:)
Why is the code in that thread, file, function and lineno not releasing our lock? We now have to scour the code, checking all places where that lock is set and released, looking for
  • places where there is a lock hierarchy and locks are set and released in a different order
(the same rule applies to locking rows in a SQL db: you have to lock and release in the same order everywhere)
  • places where a lock is held too long, e.g. in for and while loops or before calling longer-running functions
(the same rule applies to SQL db transactions: get in and out quickly)
  • places where a mutex is locked in a critical section where we might receive operating system signals
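The lock-ordering rule is easier to see in a small example. The sketch below is plain Python threading, not Asterisk code, and every name in it is made up. Two threads ask for the same two locks in opposite order; because both go through one helper that always acquires in a fixed global order, they cannot deadlock on each other.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

# A fixed, global order for all locks. Every code path must follow it.
LOCK_ORDER = {id(lock_a): 0, id(lock_b): 1}

def acquire_in_order(*locks):
    """Acquire the locks sorted by the global order; return them in that order."""
    ordered = sorted(locks, key=lambda l: LOCK_ORDER[id(l)])
    for l in ordered:
        l.acquire()
    return ordered

def release_all(ordered):
    """Release in reverse order (a convention; the acquire order is what matters)."""
    for l in reversed(ordered):
        l.release()

def worker(first, second, counter, rounds):
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            counter[0] += 1  # keep the critical section short
        finally:
            release_all(held)

def run_demo(rounds=1000):
    counter = [0]
    # The two threads name the locks in opposite order, which would risk
    # a deadlock if each thread took them in the order it names them.
    t1 = threading.Thread(target=worker, args=(lock_a, lock_b, counter, rounds))
    t2 = threading.Thread(target=worker, args=(lock_b, lock_a, counter, rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return counter[0]

if __name__ == "__main__":
    print("increments completed:", run_demo())
```

When reviewing the C sources, the equivalent check is whether every code path that takes the same pair of ast_mutex_lock() locks takes them in the same order.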



