Hi all,

I realize that you probably haven't all had a chance to look at the Crash lab yet, considering I just published it on Friday but here is the review for it.  I have to say that considering the amount of downloads for the buggy bits site I am a bit surprised with the lack of questions about the labs.  Perhaps they were too easy, or perhaps people just haven't had the time to go through them yet... either case, feel free to ask questions or comment on the labs, even questions that aren't directly related to the labs are welcome. My comments and answers are shown inline in red.

I will be publishing a Lab 2 which will be a memory lab later this week.

Reproduce the problem

1. Browse to the reviews page http://localhost/BuggyBits/Reviews.aspx, you should see a couple of bogus reviews for BuggyBits

2. Click on the Refresh button in the reviews page. This will crash the w3wp.exe process (or aspnet_wp.exe on IIS 5) 

    Note: If you have Visual Studio installed a Just-In-Time Debugger message may pop up (just click no for the purposes of this excercise).    
    However since this message box will sit there and wait for user input in order to shut down the app you may want to
disable JIT debugging if you have visual studio 
    installed on a test system.

Examine the eventlogs

1. Open the Application and System eventlogs, the information in the eventlogs will differ based on the OS and IIS version you are running. Among other events you may
    have a System Event looking something like this...

Event Type:	Warning
Event Source:	W3SVC
Event Category:	None
Event ID:	         1009
Date:		2008-02-08
Time:		10:12:06
User:		N/A
Computer:   	MYMACHINE
Description:
A process serving application pool 'DefaultAppPool' terminated unexpectedly. The process id was '4592'. 
The process exit code was '0xe0434f4d'.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Q: What events do you see?

A: On my machine (Win2k3, .NET 2.0 SP1) i get the above message in the system eventlog and the following two messages in the application eventlog.

From the first one we can see that there was a nullreference exception of some kind...

Event Type:	Error
Event Source:	.NET Runtime 2.0 Error Reporting
Event Category:	None
Event ID:	         5000
Date:		2008-02-11
Time:		11:13:18
User:		N/A
Computer:	         MYMACHINE
Description:
EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.3959, P3 45d6968e, P4 app_code.wbwztx_4, P5 0.0.0.0, P6 47a81aee, P7 13, P8 0, P9 system.nullreferenceexception, P10 NIL.

And the second one completely gives it away as a null reference exception that occurred in the finalizer/destructor for Review (Review.Finalize()). 
The good thing about this is that we don't really even need to debug this on W2k3, .NET 2.0 SP1, but on some combinations of IIS/OS/Framework you will not get
the 2nd error message.

Event Type:	Error
Event Source:	ASP.NET 2.0.50727.0
Event Category:	None
Event ID:	1334
Date:		2008-02-11
Time:		11:13:15
User:		N/A
Computer:	PRATHER
Description:
An unhandled exception occurred and the process was terminated.

Application ID: /LM/W3SVC/1/Root/BuggyBits

Process ID: 4944

Exception: System.NullReferenceException

Message: Object reference not set to an instance of an object.

StackTrace:    at Review.Finalize()

Q: What does the exit code 0xe0434f4d mean?

A: 0xe0434f4d is the native exception code for .NET exceptions, any .net exception will be translated into 0xe0434f4d, whether it is a SQLException, NullReferenceException or a FileNotFoundException.

Q: Can you tell from the eventlogs what it was that caused the crash? 

A: In this case as mentioned above, the 2nd application eventlog entry gives it away completely.  The process crashed because there was an unhandled exception on the finalizer thread when finalizing an object of type Review.

Get a memory dump

1. Browse to the reviews page http://localhost/BuggyBits/Reviews.aspx again, but don't click refresh

2. Open a command prompt and move to the debuggers directory and type in "adplus -crash -pn w3wp.exe" and hit enter

Q: A new window should appear on the toolbar, what is it? 

A: This is the debugger, if you maximize it you will see a cmd window with cdb running.  Cdb is the commandline equivalent for windbg (i.e. windbg without a UI).     

Q: What is the debugger waiting for? Hint: Check the help files for ADplus/crash mode in windbg

A: Here is a table showing what events/exceptions adplus monitors for by default, and what it will do if any of these occurr

Abbreviations L=Log entry in debuglog, T=Record Time, S=Log stack, M=Mini dump, F=Full dump, E=log in the eventlog.

You can change how these events are handled by providing your own cfg file to adplus.

CodeException/EventExplanation/Note1st Chance2nd Chance
avAccess ViolationAn attempt to access to uninitialized memory.
This is also thrown for .net NullReferenceExceptions
LTSMLTSFE
chInvalid HandleAn attempt to access data using an invalid handle. Eg. window, file or other native handle.LTSMLTSFE
iiIllegal InstructionAn attempt is made to execute an invalid instruction.LTSMLTSFE
dzInteger divideInteger divide by zero. Even in .net.LTSMLTSFE
c000008eFloating divideFloating point divide by zero.LTSMLTSFE
iovInteger overflowAn attempt to store more than 32/64 bits in an integer.LTSMLTSFE
lsqInvalid Lock SequenceAn attempt is made to execute an invalid lock sequence.LTSMLTSFE
sovStack overflowThe memory used for the stak (method pointers, parameters, local vars etc.) exceeds the amount allocated for the stack. Typically caused by recursion.LTSMLTSFE
ehC++ EH exceptionC++ EH ExceptionLTSLTSFE
*Unknown exceptionAny exception not specifically defined in this listLTSLTSFE
clr.NET ExceptionAny .net exception-LTSFE
bpeBreakpointA break point is hit or the debugger window is shut down. LTSME-
wkdWake DebuggerWake debugger.LTSME-
ldDll LoadA dll is loaded in the processLL
udDll UnloadA dll is unloaded from the processLL
eprExit ProcessThe process exits.LTSFE-
aphApplication HangThe Windows OS has determined that the process is no longer responding.LTSMLTSFE
cceCtrl+C Console AppCtrl+C or Ctrl+Break is passed to a console app.LTSMLTSFE
dmData MisalignedA unit of data is read from or written to an address that is not a multiple of the data size.LTSMLTSFE
gpGuard page violationAn attempt to access data within a guard page.LTSMLTSFE
ipIn Page IO ErrorI/O ErrorLTSMLTSFE
iscInvalid System CallInvalid System CallLTSMLTSFE
sboStack Buffer overflowStack buffer overflowLTSMLTSFE

3. Reproduce the issue by clicking on the refresh button in the reviews page.

Q: What files got created in the dump folder?  Note: The dump folder will be located under your debuggers directory with the name crash_mode and todays date and time 

A: The following files were created in my crash mode folder (under the debuggers folder)

ADPlus_report.txt
<DIR> CDBScripts
PID-4628__W3WP.EXE__Date_02-11-2008__Time_13-14-0808.log
Process_List.txt
PID-4628__W3WP.EXE__1st_chance_AccessViolation__mini_17a4_2008-02-11_13-14-19-358_1214.dmp
PID-4628__W3WP.EXE__1st_chance_Process_Shut_Down__full_17a4_2008-02-11_13-14-31-889_1214.dmp
PID-4628__W3WP.EXE__2nd_chance_NET_CLR__full_17a4_2008-02-11_13-14-20-093_1214.dmp

So we have the normal log files, CDBScripts, ADPlus_report.txt and the process list which I described in the review for lab 1

Apart from this we have a minidump for a 1st chance access violation (probably our NullReferenceException), and then a 2nd chance CLR exception (the same nullreference exception, but now 2nd chance since it wasn't handled), followed by a process shutdown dump.  You can see in which order they occurred by looking at the timestamps on them.

 

Open the dump in windbg

1. Open the dump file labeled 2nd Chance CLR Exception in windbg (file/open crash dump).  Note that this dump got created just before the 1st chance process shutdown.

Note: if you throw an exception (.net or other) you have a chance to handle it in a try/catch block.  The first time it is thrown it becomes a 1st chance exception and is non-fatal.  If you don't handle the exception it will become a 2nd chance exception (unhandled exception) and any 2nd chance exceptions will terminate the process.

2. Set up the symbol path and load sos (see the setup instructions for more info)

 

In a crash dump, the active thread is the one that caused the exceptions (since the dump is triggered on an exception).

Q: Which thread is active when you open the dump? Hint: check the command bar at the bottom of the windbg window.

A: The active thread is thread #20, which if I compare it to the !threads output is the Finalizer thread

 

20 2 116c 001ac540 b220 Enabled 02f0d28c:02f0f130 00199400 0 MTA (Finalizer) System.NullReferenceException (02f0cb3c)

In the !threads output we can also see that this thread threw a NullReferenceException and that the address of the Exception is 0x02f0cb3c

Examine the callstacks and the exception

1. Examine the native and managed callstacks. 

kb 2000
!clrstack

Q: What type of thread is it?

A: We can see from kb that this is in fact the finalizer thread but !clrstack does not give us a whole lot of information about what the managed stack looks like because we are deep in the exception handler code. 

0:020> kL 2000
ChildEBP RetAddr  
02a0f9d8 79f071ac kernel32!RaiseException+0x53
02a0fa38 7a10733d mscorwks!RaiseTheExceptionInternalOnly+0x2a8
02a0fa50 7a1073b1 mscorwks!RaiseTheException+0x4e
02a0fa78 7a1073e2 mscorwks!RealCOMPlusThrowWorker+0x71
02a0faa4 7a1073f0 mscorwks!RealCOMPlusThrow+0x30
02a0fab4 7a14f0d4 mscorwks!RealCOMPlusThrow+0xd
02a0fbe4 7a025a5b mscorwks!Thread::RaiseCrossContextException+0x41f
02a0fc98 79f01747 mscorwks!Thread::DoADCallBack+0x293
02a0fcb8 79f04a71 mscorwks!ManagedThreadBase_DispatchInner+0x47
02a0fcc4 79f04b01 mscorwks!ManagedThreadBase::FinalizerAppDomain+0x25
02a0fd20 79efe5b4 mscorwks!SVR::DoOneFinalization+0x141
02a0fda8 79efe70a mscorwks!SVR::FinalizeAllObjects+0x56
02a0fdc0 79ef3207 mscorwks!SVR::GCHeap::FinalizerThreadWorker+0xe7
02a0fdd4 79ef31a3 mscorwks!ManagedThreadBase_DispatchInner+0x4f
02a0fe68 79ef30c3 mscorwks!ManagedThreadBase_DispatchMiddle+0xb1
02a0fea4 79fb9643 mscorwks!ManagedThreadBase_DispatchOuter+0x6d
02a0fecc 79fb960d mscorwks!ManagedThreadBase_NoADTransition+0x32
02a0fedc 79fd0c91 mscorwks!ManagedThreadBase::FinalizerBase+0xd
02a0ff14 79f95a2e mscorwks!SVR::GCHeap::FinalizerThreadStart+0xbb
02a0ffb8 7d4dfe21 mscorwks!Thread::intermediateThreadProc+0x49
02a0ffec 00000000 kernel32!BaseThreadStart+0x34
0:020> !clrstack
OS Thread Id: 0x116c (20)
ESP       EIP     
02a0fa8c 7d4e2366 [GCFrame: 02a0fa8c] 
02a0faf0 7d4e2366 [GCFrame: 02a0faf0] 
02a0fcec 7d4e2366 [GCFrame: 02a0fcec] 

Q: What is this thread doing?

A: It is trying to call the finalizer of some object and in doing so it threw an exception that is now being rethrown as a 2nd chance exception.

2. Examine the exception thrown

!pe

Note: !pe/!PrintException will print out the current exception being thrown on this stack if no parameters are given

Q: What type of exception is it?

A: A NullReferenceException 

0:020> !pe
Exception object: 02f0cb3c
Exception type: System.NullReferenceException
Message: Object reference not set to an instance of an object.
InnerException: 
StackTrace (generated):

StackTraceString: 
HResult: 80004003

Note: In some cases, like this one where the exception has been rethrown, the original stacktrace may not be available in the exception.  In cases like this you may get more information if you find the original exception

3. Look at the objects on the stack to find the address of the original exception

!dso

Q: What is the address of the original exception

Hint: Look at your previous pe output to see the address of the rethrown exception.  Compare this to the addresses of the objects on the stack.  You should have multiple exceptions, a few with the rethrown exception address but one of the bottommost exceptions will be the original one (look for one with a different address).

A:  The only NullReferenceException with a different address than the one we listed before (0x2f0cb3c) is 0x6f1e5e4

0:020> !dso
OS Thread Id: 0x116c (20)
ESP/REG  Object   Name
02a0f9f4 02f0cb3c System.NullReferenceException
02a0f9f8 02f0cb3c System.NullReferenceException
02a0fa40 02f0cb3c System.NullReferenceException
02a0fa58 02f0cb3c System.NullReferenceException
02a0faac 02f0cb3c System.NullReferenceException
02a0fabc 02f0cb3c System.NullReferenceException
02a0fb58 06f1e5e4 System.NullReferenceException
02a0fb5c 02f0bd70 System.Byte[]
02a0fb78 02f0cb3c System.NullReferenceException
02a0fb7c 02f0bd70 System.Byte[]
02a0fb8c 02f0bd70 System.Byte[]
02a0fc58 06eb281c System.RuntimeType
02a0fcb0 02efd4b0 Review

4. Print out the original exception and look at the information and the callstack

!pe <original exception address> 

0:020> !pe 06f1e5e4 
Exception object: 06f1e5e4
Exception type: System.NullReferenceException
Message: Object reference not set to an instance of an object.
InnerException: 
StackTrace (generated):
    SP       IP       Function
    02A0F950 0FE90F24 App_Code_wbwztx_4!Review.Finalize()+0x14

StackTraceString: 
HResult: 80004003

Q: In what method is the exception thrown?

A: In the finalizer/destructor for the Review class

Q: What object is being finalized?

Note: you could actually have gotten this information by dumping out the _exceptionMethodString of the rethrown exception as well, but with !pe of the original exception you get the information in a cleaner way.

A: A Review object

Q: Normally exceptions thrown in ASP.NET are handled with the global exception handler and an error page is shown to the user.  Why did this not occurr here?  Why did it cause a crash?

A: The global exception handler for ASP.NET only handles exceptions thrown when processing a request.  In this case the exception is thrown on the finalizer which means that there is no user there to see the exception and therefore there is noone to show the error page to.

In 1.1. and if you use legacy exception handling in 2.0, exceptions on the finalizer are non-fatal, i.e. they will not crash the process, instead the finalizer would just stop finalizing the current object.  This is very dangerous since this means that you may not process code that releases native objects and thus you may end up with a memory leak that is very hard to troubleshoot or other resource leaks like connection leaks etc.  All unhandled exceptions on the finalizer thread should be investigated and taken seriously.

Examine the code for verification

1. Open Review.cs to find the destructor/finalizer for the Review class 

    ~Review()
    {
        if (quote.ToString() != string.Empty)
        {
            quote = null;
        }
    }

Q: which line or method could have caused the exception

A: In this case the code for the Review destructor is fairly short so it is pretty easy to pinpoint where the exception occurred.  The only instruction in here that could have caused a NullRef is the quote.ToString() and this would occurr if quote = null.

As an extra excercise you can also examine the disassembly of the function to try to pinpoint better where in the function the exception is caused

!u <IP shown in the exceptionstack>

0:020> !u 0FE90F24 
Normal JIT generated code
Review.Finalize()
Begin 0fe90f10, size 57
0fe90f10 55              push    ebp		
0fe90f11 8bec            mov     ebp,esp
0fe90f13 57              push    edi
0fe90f14 56              push    esi
0fe90f15 53              push    ebx
0fe90f16 83ec14          sub     esp,14h
0fe90f19 33c0            xor     eax,eax
0fe90f1b 8945e8          mov     dword ptr [ebp-18h],eax
0fe90f1e 8bf1            mov     esi,ecx
0fe90f20 8b4e04          mov     ecx,dword ptr [esi+4]		
0fe90f23 8b01            mov     eax,dword ptr [ecx]
0fe90f25 ff5028          call    dword ptr [eax+28h]
0fe90f28 8bc8            mov     ecx,eax
0fe90f2a 8b15ac46eb0a    mov     edx,dword ptr ds:[0AEB46ACh] ("")
0fe90f30 e8db2a4d69      call    mscorlib_ni+0x2a3a10 (79363a10) (System.String.Equals(System.String, System.String), mdToken: 06000143)
0fe90f35 85c0            test    eax,eax
0fe90f37 7505            jne     0fe90f3e			
0fe90f39 33d2            xor     edx,edx
0fe90f3b 895604          mov     dword ptr [esi+4],edx
0fe90f3e c745e400000000  mov     dword ptr [ebp-1Ch],0
0fe90f45 c745e8fc000000  mov     dword ptr [ebp-18h],0FCh
0fe90f4c 685e0fe90f      push    0FE90F5Eh
0fe90f51 eb00            jmp     0fe90f53
0fe90f53 58              pop     eax
0fe90f54 ffe0            jmp     eax
0fe90f56 8d65f4          lea     esp,[ebp-0Ch]
0fe90f59 5b              pop     ebx
0fe90f5a 5e              pop     esi
0fe90f5b 5f              pop     edi
0fe90f5c 5d              pop     ebp
0fe90f5d c3              ret
0fe90f5e c745e800000000  mov     dword ptr [ebp-18h],0
0fe90f65 ebef            jmp     0fe90f56

If we look at the disassembly below and compare it to the method above we find a few things.

1. The exception address 0x0FE90F24 falls on the bolded lines
2. Right before the bolded line we set EAX to ptr(ESI+0x4) and it is when we use EAX that we access violate/get the nullreference exception.  ESI is typically used to hold the this    
    pointer (in this case the pointer to our Review object) so the question is really what is at ESI+0x4

    Dumping out the Review object that we found with !dso we can see that at offset 0x4 we have the member variable quote and looking at the value we can see that it is null.

0:020> !do 02efd4b0 
Name: Review
MethodTable: 0fe11cc4
EEClass: 0fde5860
Size: 16(0x10) bytes
 (C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/Temporary ASP.NET Files/buggybits/b27906cc/f5f91899/App_Code.wbwztx_4.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fd8c4  4000008        4        System.String  0 instance 00000000 quote
790fd8c4  4000009        8        System.String  0 instance 00000000 source

3.  If that wasn't enough proof we could also have seen approximately where we were at in the function by looking at the instructions below the point where we threw the exception. We have an instruction to move "" into edx, i.e. an empty string (string.Empty), and right after that there is a string compare operation which matches up to if(quote.ToString() != string.Empty)

Btw, the reason that quote was null is because ClearReview() sets it to null... The proper way to resolve this issue would have been to not have a finalizer (since we don't really need one given that we don't have to clean up any native resources) or at the very least we should have called GC.SuppressFinalize() in the ClearReview() method.

 

Until next time

Tess