One cannot call oneself a Java geek if you haven't done JVM crash dump analysis. I mean, a C programmer would laugh at you if you tell them you don't know how to look at the stack dump. Well, I just had the pleasure of doing an analysis, so I'll show you how to do it, in case you don't know how :-) — This is on Windows, BTW.
Everything starts with thehs_err_pid*.logfile that JVM creates upon a crash. First you look at the cause of the death, which is at the very top of the crash dump file:
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x6d962f80, pid=3628, tid=628
...
siginfo: ExceptionCode=0xc0000005, reading address 0x00000000
So it's trying to read from a NULL pointer, resulted in an access violation. This is the native equivalent of NullPointerException.
Now that we know why it's dead, next we check where it's dead. For this, you look at the reconstructed stack trace:
<span class="tag" style="color: rgb(0, 0, 136); "><br</span><span class="pln" style="color: rgb(0, 0, 0); "> </span><span class="tag" style="color: rgb(0, 0, 136); ">/></span><span class="pln" style="color: rgb(0, 0, 0); ">Stack: [0x0ae60000,0x0aeb0000], sp=0x0aeae810, free space=314k</span><span class="tag" style="color: rgb(0, 0, 136); "><br</span><span class="pln" style="color: rgb(0, 0, 0); "> </span><span class="tag" style="color: rgb(0, 0, 136); ">/></span><span class="pln" style="color: rgb(0, 0, 0); ">Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)</span><span class="tag" style="color: rgb(0, 0, 136); "><br</span><span class="pln" style="color: rgb(0, 0, 0); "> </span><span class="tag" style="color: rgb(0, 0, 136); ">/></span><span class="pln" style="color: rgb(0, 0, 0); ">V [jvm.dll+0xd2f80]</span><span class="tag" style="color: rgb(0, 0, 136); "><br</span><span class="pln" style="color: rgb(0, 0, 0); "> </span><span class="tag" style="color: rgb(0, 0, 136); ">/></span><span class="pln" style="color: rgb(0, 0, 0); ">C [hudson-remoting38602.jnidispatch.dll+0x764c]</span><span class="tag" style="color: rgb(0, 0, 136); "><br</span><span class="pln" style="color: rgb(0, 0, 0); "> </span><span class="tag" style="color: rgb(0, 0, 136); ">/></span><span class="pln" style="color: rgb(0, 0, 0); ">j com.sun.jna.Native.initIDs()V+0</span><span class="tag" style="color: rgb(0, 0, 136); "><br</span><span class="pln" style="color: rgb(0, 0, 0); "> </span><span class="tag" style="color: rgb(0, 0, 136); ">/></span><span class="pln" style="color: rgb(0, 0, 0); ">j com.sun.jna.Native.</span><span class="tag" style="color: rgb(0, 0, 136); "><clinit></span><span class="pln" style="color: rgb(0, 0, 0); ">()V+70</span><span class="tag" style="color: rgb(0, 0, 136); "><br</span><span class="pln" style="color: rgb(0, 0, 0); "> </span><span class="tag" style="color: rgb(0, 0, 136); ">/></span><span class="pln" style="color: rgb(0, 0, 0); ">v ~StubRoutines::call_stub</span><span class="tag" style="color: rgb(0, 0, 136); "><br</span><span class="pln" style="color: rgb(0, 0, 0); "> </span><span class="tag" style="color: rgb(0, 0, 136); ">/></span>
Unlike Java, in the native land the stack trace reconstruction cannot be reliably done, so this may contain some wrong information (in fact it turns out it was missing one stack frame in this case.) But it's always good approximation to start with.
From this we can see that my native code (which iscom.sun.jna.Native.initIDs()— this is a native method) called into JNI and that killed it. So now the question is, which line in initIDs() called which JNI method with what parameters?The initIDs method is 20-30 lines long, so it's not obvious which part of this is offset 0x764c.
So now we need to disassemble the DLL to pin-point where the offset 0x764c is. The tool you use for this purpose isdumpbin, and you can install it through freeVisual C++ Express Edition.
First, check the loading address, as below. In this case, this DLL wants to be loaded to 0x10000000, so all the disassembly you'll see starts from 0x10000000. In other words,jnidispatch.dll+0x764cshould appear as 0x1000764c.
$ dumpbin /headers jnidispatch.dll | less
...
OPTIONAL HEADER VALUES
10B magic # (PE32)
2.56 linker version
A800 size of code
4A00 size of initialized data
800 size of uninitialized data
1060 entry point (10001060)
1000 base of code
C000 base of data
10000000 image base (10000000 to 1004BFFF)
Now disassemble the DLL:
$ dumpbin /exports /disam jnidispatch.dll > jnidispatch.asm
... and look for 1000764c:
10007633: 8B 45 EC mov eax,dword ptr [ebp-14h]
10007636: 8B 84 85 68 FF FF mov eax,dword ptr [ebp+eax*4+FFFFFF68h]
FF
1000763D: 89 44 24 04 mov dword ptr [esp+4],eax
10007641: 8B 45 08 mov eax,dword ptr [ebp+8]
10007644: 89 04 24 mov dword ptr [esp],eax
10007647: E8 50 EF FF FF call 1000659C
1000764C: 89 C2 mov edx,eax
1000764E: 8B 45 08 mov eax,dword ptr [ebp+8]
10007651: 8B 08 mov ecx,dword ptr [eax]
10007653: 89 54 24 0C mov dword ptr [esp+0Ch],edx
10007657: 8B 45 F0 mov eax,dword ptr [ebp-10h]
1000765A: 89 44 24 08 mov dword ptr [esp+8],eax
1000765E: 8B 45 F4 mov eax,dword ptr [ebp-0Ch]
10007661: 89 44 24 04 mov dword ptr [esp+4],eax
10007665: 8B 45 08 mov eax,dword ptr [ebp+8]
10007668: 89 04 24 mov dword ptr [esp],eax
1000766B: 8B 81 68 02 00 00 mov eax,dword ptr [ecx+00000268h]
10007671: FF D0 call eax
10007673: 83 EC 10 sub esp,10h
10007676: 8D 45 EC lea eax,[ebp-14h]
10007679: FF 00 inc dword ptr [eax]
1000767B: E9 31 FF FF FF jmp 100075B1
10007680: 8D 65 F8 lea esp,[ebp-8]
10007683: 5E pop esi
10007684: 5F pop edi
10007685: 5D pop ebp
10007686: C2 08 00 ret 8
OK, so it's calling into 0x1000659c, and now we know that the reconstructed stack frame in thehs_err_pid*.logfile missed this frame.
Also, we notice that the "ret 8" statement is very close, so this must be almost at the very end of theinitIDsfunction (for some reasons, Visual C++ compiler never seem to produce multiple ret statements for a function.)
So this must be the assembly for the following line in the C source code:
(*env)->SetStaticObjectField(env, cls, fid, newJavaPointer(env, types[i]));
}
}
In reality, you should spend some time looking at the native code to make sure the mapping between C code and assembly is correct. If you have the*.pdbfile produced by the linker for the DLL, this step is substantially simplified because of the debug information produced in the PDB file.dumpbinis capable of loading it and using it for disassembly if it's available side-by-side with the DLL.
Now, let's see what's in the function 0x1000659c. The corresponding C code is the following, so if our guess is right, it should be a fairly short function:
jobject
newJavaPointer(JNIEnv *env, void *p)
{
jobject obj = NULL;
if (p != NULL) {
obj = (*env)->NewObject(env, classPointer, MID_Pointer_init, A2L(p));
}
return obj;
}
And indeed, the disassembly is simple, and matches this C code exactly:
1000659C: 55 push ebp
1000659D: 89 E5 mov ebp,esp
1000659F: 83 EC 18 sub esp,18h
100065A2: C7 45 FC 00 00 00 mov dword ptr [ebp-4],0
00
100065A9: 83 7D 0C 00 cmp dword ptr [ebp+0Ch],0
100065AD: 74 35 je 100065E4
100065AF: 8B 45 08 mov eax,dword ptr [ebp+8]
100065B2: 8B 08 mov ecx,dword ptr [eax]
100065B4: 8B 45 0C mov eax,dword ptr [ebp+0Ch]
100065B7: BA 00 00 00 00 mov edx,0
100065BC: 89 44 24 0C mov dword ptr [esp+0Ch],eax
100065C0: 89 54 24 10 mov dword ptr [esp+10h],edx
100065C4: A1 40 F4 00 10 mov eax,dword ptr ds:[1000F440h]
100065C9: 89 44 24 08 mov dword ptr [esp+8],eax
100065CD: A1 30 F2 00 10 mov eax,dword ptr ds:[1000F230h]
100065D2: 89 44 24 04 mov dword ptr [esp+4],eax
100065D6: 8B 45 08 mov eax,dword ptr [ebp+8]
100065D9: 89 04 24 mov dword ptr [esp],eax
100065DC: 8B 41 70 mov eax,dword ptr [ecx+70h]
100065DF: FF D0 call eax
100065E1: 89 45 FC mov dword ptr [ebp-4],eax
100065E4: 8B 45 FC mov eax,dword ptr [ebp-4]
100065E7: C9 leave
100065E8: C3 ret
So at this point it looks like theNewObjectcall crashed. So now, what are the actual parameter values that we passed? Was one of them null?
To answer this question, we go back to thehs_err_pid*.logand look at the stack top:
Top of Stack: (sp=0x0aeae810)
0x0aeae810: 6d9634fa 0aeae8e4 1000c068 06be9978
0x0aeae820: 0adf4800 0ad80be8 0000021e 6da7ab04
0x0aeae830: 0adf4800 00000006 00000004 0aeae8e4
0x0aeae840: 1000c068 0aeae948 06be9978 0adf4800
0x0aeae850: 0ad80be8 0000021e 6da7ab04 100065e1
0x0aeae860: 0adf4914 00000000 00000000 1000e610
0x0aeae870: 00000000 00000000 0aeae948 1000764c
0x0aeae880: 0adf4914 1000e610 0aeae890 0ad80bf4
In C, parameters are passed through stack, so if we are lucky and the JVM didn't use that many stack before it dies, we can see our parameters in this list.
The key to find it is to look for a known value, which is the return address of the call. See, we know that thenewJavaPointermethod will return tojnidispatch.dll+0x764c. So if we can locate this in the stack, we can locate the parameters.
To do this, we first check the DLL table to see what address our DLL is actually loaded into. This information is also captured inhs_err_pid*.log:
Dynamic libraries:
0x00400000 - 0x00423000 C:\jdk1.6.0_10\bin\java.exe
0x7c900000 - 0x7c9b0000 C:\WINDOWS\system32\ntdll.dll
0x7c800000 - 0x7c8f4000 C:\WINDOWS\system32\kernel32.dll
0x77dd0000 - 0x77e6b000 C:\WINDOWS\system32\ADVAPI32.dll
0x77e70000 - 0x77f01000 C:\WINDOWS\system32\RPCRT4.dll
0x7c340000 - 0x7c396000 C:\jdk1.6.0_10\jre\bin\msvcr71.dll
0x6d890000 - 0x6dae6000 C:\jdk1.6.0_10\jre\bin\client\jvm.dll
0x77d40000 - 0x77dd0000 C:\WINDOWS\system32\USER32.dll
0x77f10000 - 0x77f58000 C:\WINDOWS\system32\GDI32.dll
0x76b40000 - 0x76b6d000 C:\WINDOWS\system32\WINMM.dll
0x6d340000 - 0x6d348000 C:\jdk1.6.0_10\jre\bin\hpi.dll
0x76bf0000 - 0x76bfb000 C:\WINDOWS\system32\PSAPI.DLL
0x6d840000 - 0x6d84c000 C:\jdk1.6.0_10\jre\bin\verify.dll
0x6d3e0000 - 0x6d3ff000 C:\jdk1.6.0_10\jre\bin\java.dll
0x6d880000 - 0x6d88f000 C:\jdk1.6.0_10\jre\bin\zip.dll
0x10000000 - 0x1004c000 C:\WINDOWS\Temp\hudson-remoting38602.jnidispatch.dll
0x77c10000 - 0x77c68000 C:\WINDOWS\system32\msvcrt.dll
So we can see that our DLL is loaded to 0x10000000, exactly where it wanted to be loaded to (but this is not always the case.) So,jnidispatch.dll+0x764c=0x1000764c. Do we see it in the top of the stack? Yes!
Top of Stack: (sp=0x0aeae810)
0x0aeae810: 6d9634fa 0aeae8e4 1000c068 06be9978
0x0aeae820: 0adf4800 0ad80be8 0000021e 6da7ab04
0x0aeae830: 0adf4800 00000006 00000004 0aeae8e4
0x0aeae840: 1000c068 0aeae948 06be9978 0adf4800
0x0aeae850: 0ad80be8 0000021e 6da7ab04 100065e1
0x0aeae860: 0adf4914 00000000 00000000 1000e610
0x0aeae870: 00000000 00000000 0aeae948 1000764c
0x0aeae880: 0adf4914 1000e610 0aeae890 0ad80bf4
From here, we can see the two parameters to thenewJavaPointer, which are env=0x0adf4914, and p=0x1000e610. Looks OK so far.
Then you mentally trace through the code. 0x0aeae948 must be the result ofpush ebp, thensub esp,18hmust have putespto 0x0aeae85c.
1000659C: 55 push ebp
1000659D: 89 E5 mov ebp,esp
1000659F: 83 EC 18 sub esp,18h
Then by comparing a bunch ofmovstatements and C code, you can see that the parameters are the following. The next item in the stack is100065e1which corresponds to the return address from theNewObjectcall, so we've got everything accounted for:
0xadf4914
| 0x00000000 0x00000000 0x000000001000e610
| | | |
v v v v
obj = (*env)->NewObject(env, classPointer, MID_Pointer_init, A2L(p));
And indeed, we are passing NULLs! Aha!
So, I hope you (and in particular me in 6 months) get the idea of how to do this. You saw thaths_err_id*.logfile is full of useful information.
And please, for the love of God, create the map file and the PDB file and keep them somewhere. The forensic analysis like this would be so much easier if you have them.
Hi,
THanks for you post. It's very useful. But I do have an issue regarding this process. I don't find ahs_err_pid*.log file in my folder. Actually, not all the times. It's generated some times and sometimes it's not. Is there a definite way to make sure that this log file is generated?
Regards,
Murali V
http://weblogs.java.net/blog/kohsuke/archive/2009/02/crash_course_on.html