本文假设你已经了解了如何配置windbg,并且从windbg的基础命令开始,帮助你熟悉使用这个强大的工具。
基本配置
- 从framework目录中拷贝sos.dll文件到你的windbg安装目录。注意你拷贝的framework版本必须和你要调试的目标程序所使用的版本一致,否则调试信息就不能正确显示出来。如果你同时工作在两个版本的framework的话,你可以将sos文件重命名为sos<version>.dll或者直接将它们放入不同的文件夹下面。
- 创建一个文件夹来存放下载来的symbol文件。推荐使用"C:/symbols"。
- 打开windbg,点击File->Symbol File Path打开symbol路径设置对话框。
- 输入你创建好的文件夹路径以及symbol文件所在的服务器路径。语法格式如下:
srv*[cache path]*[symbols path]
推荐的快速路径设置为:
srv*c:/symbols/public*http://msdl.microsoft.com/download/symbols
下面我们正式开始介绍重点。不过你得先准备一个保存好的dump文件,或者attach到一个进程中。
有用的命令
我准备了一个从IIS6-server 中保存下来的dump文件来示范一些有用的命令。
首先我们来加载SOS。我们可以用.load命令。这个命令的语法很简单:
.load [extension filename]
如果你要加载的sos文件没有改过名字,那么事情就很简单
.load sos
接下来你使用所有的SOS扩展命令了。扩展而来的命令往往会在前面加一个感叹号"!"。所以你可以用!help命令先来看看SOS扩展命令的介绍。
如果你的两组扩展刚好有相同的命令的话,你可以用扩展名来区分它们,举个例子:
!sos.help
现在我们来试试!help命令,它应该输出以下内容:
1: 0:000> !help
2: -------------------------------------------------------------------------------
3: SOS is a debugger extension DLL designed to aid in the debugging of managed
4: programs. Functions are listed by category, then roughly in order of
5: importance. Shortcut names for popular functions are listed in parenthesis.
6: Type "!help " for detailed info on that function.
7:
8: Object Inspection Examining code and stacks
9: ----------------------------- -----------------------------
10: DumpObj (do) Threads
11: DumpArray (da) CLRStack
12: DumpStackObjects (dso) IP2MD
13: DumpHeap U
14: DumpVC DumpStack
15: GCRoot EEStack
16: ObjSize GCInfo
17: FinalizeQueue EHInfo
18: PrintException (pe) COMState
19: TraverseHeap BPMD
20:
21: Examining CLR data structures Diagnostic Utilities
22: ----------------------------- -----------------------------
23: DumpDomain VerifyHeap
24: EEHeap DumpLog
25: Name2EE FindAppDomain
26: SyncBlk SaveModule
27: DumpMT GCHandles
28: DumpClass GCHandleLeaks
29: DumpMD VMMap
30: Token2EE VMStat
31: EEVersion ProcInfo
32: DumpModule StopOnException (soe)
33: ThreadPool MinidumpMode
34: DumpAssembly
35: DumpMethodSig Other
36: DumpRuntimeTypes -----------------------------
37: DumpSig FAQ
38: RCWCleanupList
39: DumpIL
要查看某个具体命令的帮助文档,请输入:!help [命令名称]
.time
这个并不是SOS命令,所以它不以"!"打头。运行.time命令可以输出关于时间的估计值。包括系统运行时,进程运行时和花在内核/用户态的时间。
1: 0:000> .time
2: Debug session time: Tue Oct 23 08:38:35.000 2007 (GMT+1)
3: System Uptime: 4 days 17:48:01.906
4: Process Uptime: 0 days 0:24:37.000
5: Kernel time: 0 days 0:04:23.000
6: User time: 0 days 0:03:28.000
正如信息显示的,这个系统已经运行了超过了4天。进程总共运行了24.5分钟,总计占用了8分钟的CPU时间。所以我们可以得到进程运行时有32.5%的时间在进行CPU运算。
!threadpool
!Threadpool命令可以准确的提供程序运行某一时刻的cpu使用率信息,待执行事务请求的个数,完成端口(Completion Port)线程和计时器个数。
1: 0:000> !threadpool
2: CPU utilization 100%
3: Worker Thread: Total: 5 Running: 4 Idle: 1 MaxLimit: 200 MinLimit: 2
4: Work Request in Queue: 16
5: Unknown Function: 6a2d945d Context: 023ede30
6: Unknown Function: 6a2d945d Context: 023ee1e8
7: AsyncTimerCallbackCompletion TimerInfo@11b53760
8: Unknown Function: 6a2d945d Context: 023ee3a8
9: Unknown Function: 6a2d945d Context: 023e3040
10: Unknown Function: 6a2d945d Context: 023ee178
11: Unknown Function: 6a2d945d Context: 023edfb0
12: AsyncTimerCallbackCompletion TimerInfo@11b36428
13: AsyncTimerCallbackCompletion TimerInfo@11b53868
14: Unknown Function: 6a2d945d Context: 023ee060
15: Unknown Function: 6a2d945d Context: 023ee290
16: Unknown Function: 6a2d945d Context: 023eded0
17: Unknown Function: 6a2d945d Context: 023edd88
18: Unknown Function: 6a2d945d Context: 023ede98
19: Unknown Function: 6a2d945d Context: 023ee258
20: Unknown Function: 6a2d945d Context: 023edfe8
21: --------------------------------------
22: Number of Timers: 9
23: --------------------------------------
24: Completion Port Thread:Total: 3 Free: 3 MaxFree: 4 CurrentLimit: 2 MaxLimit: 200 MinLimit: 2
这个例子中的进程CPU使用率为100%,下面介绍的命令将进一步分析CPU的分配情况。
!Runaway
这是个很有用的命令,它会列举每个线程的CPU使用率。我们经常会用它来分析程序不正常的高CPU占用率问题。
1: 0:000> !runaway
2: User Mode Time
3: Thread Time
4: 25:1a94 0 days 0:00:39.937
5: 16:1bc0 0 days 0:00:38.390
6: 50:1e8c 0 days 0:00:08.859
7: 52:1e40 0 days 0:00:08.687
8: 20:1c2c 0 days 0:00:08.234
9: 51:1340 0 days 0:00:08.171
10: 21:1bcc 0 days 0:00:06.953
11: 26:13ec 0 days 0:00:06.671
12: 44:131c 0 days 0:00:03.906
13: 22:d8c 0 days 0:00:03.375
14: 33:78c 0 days 0:00:02.656
15: 34:1a8c 0 days 0:00:00.906
16: 29:1f5c 0 days 0:00:00.828
17: 6:e28 0 days 0:00:00.625
18: 5:1c78 0 days 0:00:00.546
19: 23:14a4 0 days 0:00:00.484
20: 4:5ac 0 days 0:00:00.437
21: 45:5dc 0 days 0:00:00.421
22: 3:13b4 0 days 0:00:00.421
23: 47:19c8 0 days 0:00:00.375
24: 28:1b6c 0 days 0:00:00.250
25: 46:1dac 0 days 0:00:00.156
26: 7:1dd8 0 days 0:00:00.109
27: 48:cdc 0 days 0:00:00.093
28: 49:1eac 0 days 0:00:00.062
29: 15:1a64 0 days 0:00:00.062
30: 0:1804 0 days 0:00:00.046
31: 36:4a4 0 days 0:00:00.031
32: 11:1eb4 0 days 0:00:00.031
33: 1:10b4 0 days 0:00:00.031
34: 31:16ac 0 days 0:00:00.015
35: 14:4ac 0 days 0:00:00.015
36: 2:186c 0 days 0:00:00.015
37: 59:590 0 days 0:00:00.000
38: 58:294 0 days 0:00:00.000
39: 57:16d0 0 days 0:00:00.000
40: 56:1578 0 days 0:00:00.000
41: 55:1428 0 days 0:00:00.000
42: 54:16d8 0 days 0:00:00.000
43: 53:fd8 0 days 0:00:00.000
44: 43:1b8c 0 days 0:00:00.000
45: 42:1c24 0 days 0:00:00.000
46: 41:1e2c 0 days 0:00:00.000
47: 40:11b0 0 days 0:00:00.000
48: 39:edc 0 days 0:00:00.000
49: 38:1a08 0 days 0:00:00.000
50: 37:171c 0 days 0:00:00.000
51: 35:1254 0 days 0:00:00.000
52: 32:1f9c 0 days 0:00:00.000
53: 30:1ae8 0 days 0:00:00.000
54: 27:190c 0 days 0:00:00.000
55: 24:1d2c 0 days 0:00:00.000
56: 19:1e38 0 days 0:00:00.000
57: 18:ee4 0 days 0:00:00.000
58: 17:fb8 0 days 0:00:00.000
59: 13:1b54 0 days 0:00:00.000
60: 12:1a48 0 days 0:00:00.000
61: 10:f64 0 days 0:00:00.000
62: 9:1024 0 days 0:00:00.000
63: 8:1b78 0 days 0:00:00.000
仔细的人可能会发现所有列举的总时间事实上并不等于.time命令中显示的cpu 使用率。原因很简单,线程常常会被回收和重用。这就意味着一条线程占用的CPU总时间可能因页面请求而分割成好几部分。
!Threads
我们可以使用!Threads命令来获取线程的更多信息。它可以列举所程序中有的托管线程,当前线程所在的程序域(AppDomain)等信息。以下是这个命令的输出信息:
1: 0:000> !threads
2: ThreadCount: 48
3: UnstartedThread: 0
4: BackgroundThread: 29
5: PendingThread: 0
6: DeadThread: 19
7: Hosted Runtime: no
8: PreEmptive GC Alloc Lock
9: ID OSID ThreadOBJ State GC Context Domain Count APT Exception
10: 16 1 1bc0 001fccd0 1808220 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
11: 22 2 d8c 002016f0 b220 Enabled 00000000:00000000 0019daf0 0 MTA (Finalizer)
12: 14 4 4ac 00242e58 880a220 Enabled 00000000:00000000 0019daf0 0 MTA (Threadpool Completion Port)
13: 23 5 14a4 11b39f18 80a220 Enabled 00000000:00000000 0019daf0 0 MTA (Threadpool Completion Port)
14: 24 6 1d2c 11b41ad8 1220 Enabled 00000000:00000000 0019daf0 0 Ukn
15: 25 7 1a94 11b46c70 180b220 Enabled 27240c98:27241fd8 11b42540 1 MTA (Threadpool Worker)
16: 26 9 13ec 12ce2888 200b220 Enabled 2a9f1434:2a9f33c0 11b42540 0 MTA
17: 27 a 190c 12d85eb8 200b220 Enabled 00000000:00000000 11b42540 0 MTA
18: 29 b 1f5c 13df6a50 200b220 Enabled 2ab1da6c:2ab1f1c0 11b42540 0 MTA
19: 30 c 1ae8 12d44a58 b220 Enabled 00000000:00000000 11b42540 0 MTA
20: 31 d 16ac 12e2e008 200b220 Enabled 2a81348c:2a8153c0 11b42540 1 MTA
21: 5 e 1c78 12da2160 220 Enabled 00000000:00000000 0019daf0 0 Ukn
22: 33 8 78c 11b674c8 200b220 Enabled 2707b818:2707c1d8 11b42540 0 MTA
23: 34 12 1a8c 13f163c8 220 Enabled 00000000:00000000 0019daf0 0 Ukn
24: 36 13 4a4 13eef718 200b220 Enabled 2a7db4a4:2a7dd3c0 11b42540 0 MTA
25: 4 14 5ac 13ef2008 220 Enabled 00000000:00000000 0019daf0 0 Ukn
26: 42 10 1c24 13f0e950 880b220 Enabled 00000000:00000000 0019daf0 0 MTA (Threadpool Completion Port)
27: 6 11 e28 13f16008 220 Enabled 00000000:00000000 0019daf0 0 Ukn
28: 3 f 13b4 13eba008 220 Enabled 00000000:00000000 0019daf0 0 Ukn
29: 43 15 1b8c 140db008 880b220 Enabled 00000000:00000000 0019daf0 0 MTA (Threadpool Completion Port)
30: 44 17 131c 140ceb28 200b220 Enabled 272288c8:27229fd8 11b42540 0 MTA
31: 45 1d 5dc 140cd0a0 220 Enabled 00000000:00000000 0019daf0 0 Ukn
32: 47 20 19c8 1651a008 220 Enabled 00000000:00000000 0019daf0 0 Ukn
33: XXXX 24 0 16468880 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
34: 46 1f 1dac 1650ab48 220 Enabled 00000000:00000000 0019daf0 0 Ukn
35: XXXX 1a 0 140d5008 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
36: XXXX 16 0 140c5008 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
37: 50 3 1e8c 14064420 180b220 Enabled 27246f54:27247fd8 11b42540 1 MTA (Threadpool Worker)
38: XXXX 35 0 1406e800 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
39: 51 36 1340 140df008 180b220 Enabled 2adec9cc:2aded1c0 11b42540 1 MTA (Threadpool Worker)
40: XXXX 37 0 16566868 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
41: 48 38 cdc 16578840 220 Enabled 00000000:00000000 0019daf0 0 Ukn
42: XXXX 39 0 16566c28 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
43: XXXX 3b 0 1646b8b0 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
44: XXXX 3c 0 16674008 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
45: XXXX 3d 0 16676418 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
46: XXXX 3e 0 16676fb8 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
47: XXXX 3f 0 16674d48 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
48: XXXX 40 0 1667de10 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
49: XXXX 41 0 16680050 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
50: XXXX 42 0 166812e8 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
51: XXXX 43 0 16683e60 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
52: 52 44 1e40 165259e8 180b220 Enabled 2adf126c:2adf31c0 11b42540 1 MTA (Threadpool Worker)
53: XXXX 45 0 165b7c08 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
54: XXXX 46 0 165aa3d8 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
55: XXXX 47 0 165242c8 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
56: XXXX 48 0 165e9500 1801820 Enabled 00000000:00000000 0019daf0 0 Ukn (Threadpool Worker)
57: 49 3a 1eac 165676f0 220 Enabled 00000000:00000000 0019daf0 0 Ukn
注意这里ID被标注为XXXX的线程是正在等待被回收的线程。这里我们可以找到终结器(finalizer)的线程ID是22。如果这个线程中不健康活跃任务数量很大的话,很明显这个程序中的终结器出现了严重的问题。
切换线程
可以使用"~"命令来切换到某一线程。比如可以使用以下命令来切换到第50条线程:
1: 0:000> ~50s
!ClrStack
这是个很常用的命令,它可以列举当前线程的调用堆栈。如果你想查看更多的信息,你还可以在后面添加"-p"参数,这样你还可以看到堆栈中函数的参数值。
下面我们来查看下第50条线程中的调用堆栈:
1: 0:050> !clrstack
2: OS Thread Id: 0x1e8c (50)
3: ESP EIP
4: 17a9e750 7d61c828 [NDirectMethodFrameSlim: 17a9e750] System.DirectoryServices.Protocols.Wldap32.ldap_bind_s(IntPtr, System.String, System.DirectoryServices.Protocols.SEC_WINNT_AUTH_IDENTITY_EX, System.DirectoryServices.Protocols.BindMethod)
5: 17a9e768 14df70f9 System.DirectoryServices.Protocols.LdapConnection.BindHelper(System.Net.NetworkCredential, Boolean)
6: 17a9e794 14df6de0 System.DirectoryServices.Protocols.LdapConnection.Bind()
7: 17a9e79c 14df59e9 System.DirectoryServices.Protocols.LdapConnection.SendRequestHelper(System.DirectoryServices.Protocols.DirectoryRequest, Int32 ByRef)
8: 17a9e8b8 14df56e8 System.DirectoryServices.Protocols.LdapConnection.SendRequest(System.DirectoryServices.Protocols.DirectoryRequest, System.TimeSpan)
9: 17a9e8bc 14df5657 [InlinedCallFrame: 17a9e8bc]
我们可以从下而上的查看LdapConnection实例的调用过程SendRequest –> SendRequestHelper –> Bind…
我们再来看看带"-p"选项的结果:
1: 0:050> !clrstack -p
2: OS Thread Id: 0x1e8c (50)
3: ESP EIP
4: 17a9e750 7d61c828 [NDirectMethodFrameSlim: 17a9e750] System.DirectoryServices.Protocols.Wldap32.ldap_bind_s(IntPtr, System.String, System.DirectoryServices.Protocols.SEC_WINNT_AUTH_IDENTITY_EX, System.DirectoryServices.Protocols.BindMethod)
5: 17a9e768 14df70f9 System.DirectoryServices.Protocols.LdapConnection.BindHelper(System.Net.NetworkCredential, Boolean)
6: PARAMETERS:
7: this = 0x271fdfe0
8: newCredential =
9: needSetCredential =
10:
11: 17a9e794 14df6de0 System.DirectoryServices.Protocols.LdapConnection.Bind()
12: PARAMETERS:
13: this =
14:
15: 17a9e79c 14df59e9 System.DirectoryServices.Protocols.LdapConnection.SendRequestHelper(System.DirectoryServices.Protocols.DirectoryRequest, Int32 ByRef)
16: PARAMETERS:
17: this = 0x271fdfe0
18: request = 0x27246e38
19: messageID = 0x17a9e8ec
20:
21: 17a9e8b8 14df56e8 System.DirectoryServices.Protocols.LdapConnection.SendRequest(System.DirectoryServices.Protocols.DirectoryRequest, System.TimeSpan)
22: PARAMETERS:
23: this = 0x271fdfe0
24: request = 0x27246e38
25: requestTimeout =
26:
27: 17a9e8bc 14df5657 [InlinedCallFrame: 17a9e8bc]
现在我们可以有机会查看各个函数的参数,因为至少我们得到了参数对象的地址。下一个要出场的命令正好擅长这个活儿。
!DumpObject(!do)
这又是一个十分重要的命令。它会根据地址信息转储出整个对象。所以这个命令必须以一个地址为参数:
1: 0:050> !do 0x27246e38
2: Name: System.DirectoryServices.Protocols.SearchRequest
3: MethodTable: 14b394c4
4: EEClass: 14d97ce0
5: Size: 52(0x34) bytes
6: GC Generation: 0
7: (C:/WINDOWS/assembly/GAC_MSIL/System.DirectoryServices.Protocols/2.0.0.0__b03f5f7f11d50a3a/System.DirectoryServices.Protocols.dll)
8: Fields:
9: MT Field Offset Type VT Attr Value Name
10: 02c39310 4000102 4 System.String 0 instance 00000000 directoryRequestID
11: 14b398bc 4000103 8 ...ControlCollection 0 instance 27246e90 directoryControlCollection
12: 02c39310 4000111 c System.String 0 instance 27246d00 dn
13: 12579f5c 4000112 10 ....StringCollection 0 instance 27246eb4 directoryAttributes
14: 02c36ca0 4000113 14 System.Object 0 instance 27246ddc directoryFilter
15: 14b39344 4000114 18 System.Int32 1 instance 1 directoryScope
16: 14b393fc 4000115 1c System.Int32 1 instance 0 directoryRefAlias
17: 0fd3da00 4000116 20 System.Int32 1 instance 0 directorySizeLimit
18: 1202af88 4000117 28 System.TimeSpan 1 instance 27246e60 directoryTimeLimit
19: 120261c8 4000118 24 System.Boolean 1 instance 0 directoryTypesOnly
Okay, 厉害吧。这个地址所指向的对象正是System.DirectoryServices.Protocols.SearchRequest类型的。这还没完,你甚至还可以根据"Value"这一列的值继续追踪SearchRequest中成员字段的值。比如查看名为dn,类型为String的成员字段的对象信息:
1: 0:050> !do 27246d00
2: Name: System.String
3: MethodTable: 02c39310
4: EEClass: 0fb610ac
5: Size: 112(0x70) bytes
6: GC Generation: 0
7: (C:/WINDOWS/assembly/GAC_32/mscorlib/2.0.0.0__b77a5c561934e089/mscorlib.dll)
8: String: CN=Dummy,CN=Accounts,CN=useradm,DC=dummy,DC=net
9: Fields:
10: MT Field Offset Type VT Attr Value Name
11: 0fd3da00 4000096 4 System.Int32 1 instance 48 m_arrayLength
12: 0fd3da00 4000097 8 System.Int32 1 instance 47 m_stringLength
13: 0fb80010 4000098 c System.Char 1 instance 43 m_firstChar
14: 02c39310 4000099 10 System.String 0 shared static Empty
15: >> Domain:Value 0019daf0:03380310 11b42540:03380310 <<
16: 0fb86d44 400009a 14 System.Char[] 0 shared static WhitespaceChars
17: >> Domain:Value 0019daf0:03380324 11b42540:033855bc <<
我们很容易得到dn的真正内容为"CN=Dummy,CN=Accounts,CN=useradm,DC=dummy,DC=net"。如果你想获取更多信息,你可以继续使用!DumpObject这个命令来查看其它的那些成员字段。
下节内容预告:
我们将通过分析w3wp进程继续介绍!DumpObject的使用方法,当然还有更多强大的命令介绍。
请访问我的博客
舶来