Friday, July 22, 2011

Solaris Core dump analysis

A Solaris server running Glassfish at my company keeps crashing, and even the Oracle consultant had no clue after he analysed the core dump files.

I had no idea how to tackle the problem either. But the crashes kept happening, so I decided it was time for me to learn how to analyse core dump files.

Of course, Google is my best friend for a task like this, but some knowledge of reverse engineering helps a lot here. Without it, you might get nothing out of a core dump even when the correct information is right in front of you.

pflags

Run pflags on the core file and look for the signal that caused the crash. Here I found a SIGSEGV (segmentation violation).

pflags core.hostname04.703.26100.java > pflags.txt

Grepping the output, I found this entry:


/243:  flags = DETACH
       sigmask = 0xfffffeff,0x0000ffff  cursig = SIGSEGV


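The crash occurred in thread 243. The /243 prefix is the LWP ID, and cursig = SIGSEGV is the signal that LWP was handling when the core was written.
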
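On a busy application server the core can contain hundreds of LWPs, so finding the one with cursig set by eye gets tedious. Here is a minimal Python sketch of that search. It assumes the pflags layout shown above: each LWP section starts with a /N: line, and cursig = SIG... appears on one of its lines. find_signalled_lwps and main are hypothetical helper names, not part of any Solaris tool.

```python
import re
import sys

# Hypothetical helper; assumes the pflags output format shown in this post:
# an LWP section starts with "/243:  flags = ..." and may contain
# "cursig = SIGSEGV" on one of its lines.
LWP_HEADER = re.compile(r"^/(\d+):")
CURSIG = re.compile(r"cursig\s*=\s*(SIG[A-Z0-9]+)")


def find_signalled_lwps(text):
    """Return (lwp_id, signal) pairs for every LWP whose cursig is set."""
    results = []
    current_lwp = None
    for line in text.splitlines():
        header = LWP_HEADER.match(line)
        if header:
            current_lwp = int(header.group(1))
        match = CURSIG.search(line)
        # Ignore any cursig seen before the first LWP header (process level).
        if match and current_lwp is not None:
            results.append((current_lwp, match.group(1)))
    return results


def main(paths):
    # Each argument is a file holding saved pflags output, e.g. pflags.txt.
    for path in paths:
        with open(path) as f:
            for lwp, sig in find_signalled_lwps(f.read()):
                print(f"LWP {lwp}: {sig}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as python find_sig.py pflags.txt (the script name is arbitrary). For this core it should list LWP 243 with SIGSEGV.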
pstack

pstack core.hostname04.703.26100.java > pstack.txt

Then search the output for lwp# 243. I got:

-----------------  lwp# 243 / thread# 243  --------------------
ffffffff7dda9840 jni_GetByteArrayRegion (1111471c8, fffffff80b6155c8, 0, 108, fffffff80b6054f8, 111147000) + f8

The crash happened inside jni_GetByteArrayRegion. The first column, ffffffff7dda9840, is the program counter of the faulting instruction, which pstack places at offset 0xf8 into that function.
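The same kind of filtering works for pstack. The sketch below pulls out the frames for one LWP and splits a frame into its parts. It assumes the layout in the excerpt above: an lwp# separator line followed by frames of the form "pc function (args) + offset". frames_for_lwp and parse_frame are hypothetical names. Subtracting the offset from the pc gives the start address of the function.

```python
import re

# Hypothetical helpers; assume the pstack layout shown in this post.
LWP_SEPARATOR = re.compile(r"^-+\s+lwp#\s*(\d+)")
FRAME = re.compile(r"^\s*([0-9a-fA-F]+)\s+(\S+)\s*\(.*\)\s*\+\s*([0-9a-fA-F]+)\s*$")


def frames_for_lwp(text, lwp):
    """Return the frame lines printed under the separator for the given LWP."""
    frames = []
    inside = False
    for line in text.splitlines():
        sep = LWP_SEPARATOR.match(line)
        if sep:
            # A new separator starts a new LWP; keep lines only for ours.
            inside = int(sep.group(1)) == lwp
            continue
        if inside and line.strip():
            frames.append(line.strip())
    return frames


def parse_frame(line):
    """Split a frame into (pc, function, offset); None if it does not match."""
    m = FRAME.match(line)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2), int(m.group(3), 16)
```

For the frame above, parse_frame yields pc 0xffffffff7dda9840 and offset 0xf8, so jni_GetByteArrayRegion starts at 0xffffffff7dda9748. Frames it cannot parse, such as demangled C++ names containing spaces or frames without a "+ offset", come back as None.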

pmap

pmap core.hostname04.703.26100.java > pmap.txt

A portion of pmap.txt:

....
FFFFFFFF7DB12000         16K r----
FFFFFFFF7DC00000       8192K r-x--  /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so
FFFFFFFF7E400000       2048K r-x--  /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so
........


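The faulting address ffffffff7dda9840 falls inside the 8192K r-x-- mapping that starts at FFFFFFFF7DC00000. 8192K is 0x800000 bytes, so that mapping ends at FFFFFFFF7E400000. So the culprit is /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so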
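The lookup above is just an interval check: a mapping covers the addresses from its start up to start + size. Here is a minimal Python sketch of that check. It assumes pmap reports sizes in kilobytes with a K suffix, as in the excerpt. find_mapping is a hypothetical name.

```python
import re

# Hypothetical helper; assumes pmap lines of the form
# "<hex start>  <size>K <perms>  [<mapped file>]" as in this post.
MAPPING = re.compile(r"^\s*([0-9A-Fa-f]+)\s+(\d+)K\s+(\S+)\s*(.*)$")


def find_mapping(pmap_text, address):
    """Return (start, end, perms, name) of the mapping containing address, or None."""
    for line in pmap_text.splitlines():
        m = MAPPING.match(line)
        if m is None:
            continue  # skip headers, totals and elided "...." lines
        start = int(m.group(1), 16)
        end = start + int(m.group(2)) * 1024  # size is given in kilobytes
        if start <= address < end:
            return start, end, m.group(3), m.group(4).strip()
    return None
```

Called with the pc from pstack, 0xffffffff7dda9840, and the excerpt above, it returns the r-x-- libjvm.so mapping from FFFFFFFF7DC00000 to FFFFFFFF7E400000.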


I found an easier way to locate the executable that contains the offending function. I will write about it in a later post.


Since the library (/opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so) is part of the Java Runtime Environment, I have filed a case with Oracle, and they are currently investigating. I saved them some work, in fact :)




Have a nice day :)

fook sheng