A Solaris server running Glassfish keeps crashing at my company, and even the Oracle consultant had no clue after he analysed the core dump files.
I had no idea how to tackle the problem either. But it happened again and again, so I decided it was time for me to learn how to analyse core dump files.
Of course Google is my best friend for a task like this, but some knowledge of reverse engineering helps a lot here. Without it, you might get nothing out of the dump even when presented with the correct information.
pflags
Run pflags on the core file and look for the signal that caused the crash. Here I found a SIGSEGV signal.
pflags core.hostname04.703.26100.java > pflags.txt
Then I grepped the file for the signal and found:
/243: flags = DETACH
sigmask = 0xfffffeff,0x0000ffff cursig = SIGSEGV
The crash occurred in thread number 243.
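The grep step can also be scripted. Here is a minimal Python sketch, assuming pflags.txt has the layout shown above: a thread header line such as "/243: ..." followed by a line containing "cursig = ...". The function name lwps_with_signal is my own invention and not part of any Solaris tool.

```python
import re


def lwps_with_signal(pflags_text):
    """Return (thread_id, signal_name) pairs for threads that report a cursig.

    Assumes each thread starts with a header like "/243: flags = ..." and
    that its "cursig = NAME" entry appears on a following line.
    """
    results = []
    current = None
    for line in pflags_text.splitlines():
        header = re.match(r"\s*/(\d+):", line)
        if header:
            current = int(header.group(1))
            continue
        sig = re.search(r"cursig\s*=\s*(\w+)", line)
        if sig and current is not None:
            results.append((current, sig.group(1)))
    return results


# The two lines quoted from pflags.txt in this post.
sample = """\
/243: flags = DETACH
sigmask = 0xfffffeff,0x0000ffff cursig = SIGSEGV
"""

print(lwps_with_signal(sample))  # [(243, 'SIGSEGV')]
```

On a real dump you would read pflags.txt from disk and pass its contents to the function instead of the inline sample.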
pstack
pstack core.hostname04.703.26100.java > pstack.txt
Then look for thread 243 in the file. I got:
----------------- lwp# 243 / thread# 243 --------------------
ffffffff7dda9840 jni_GetByteArrayRegion (1111471c8, fffffff80b6155c8, 0, 108, fffffff80b6054f8, 111147000) + f8
The crash happened inside the function jni_GetByteArrayRegion. The faulting instruction is at memory address ffffffff7dda9840, which is f8 (hex) bytes into the function.
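To pull the numbers out of a pstack frame line without reading hex by eye, something like the following Python sketch could work. It assumes, as I read the pstack output, that the leading hex column is the program counter for the frame and that the trailing "+ f8" is the offset from the start of the function. The helper name parse_frame and the dictionary keys are hypothetical.

```python
import re

# address, function name, (arguments), + hex offset
FRAME = re.compile(
    r"^\s*([0-9a-fA-F]+)\s+(\S+)\s*\((.*)\)\s*\+\s*([0-9a-fA-F]+)\s*$"
)


def parse_frame(line):
    """Split one pstack frame line into its parts, or return None if it does not match."""
    m = FRAME.match(line)
    if m is None:
        return None
    pc = int(m.group(1), 16)
    offset = int(m.group(4), 16)
    return {
        "pc": pc,
        "function": m.group(2),
        "offset": offset,
        # Assumption: pc = function start + offset.
        "function_start": pc - offset,
    }


# The frame line quoted from pstack.txt in this post.
frame_line = (
    "ffffffff7dda9840 jni_GetByteArrayRegion (1111471c8, fffffff80b6155c8, "
    "0, 108, fffffff80b6054f8, 111147000) + f8"
)

frame = parse_frame(frame_line)
print(frame["function"], hex(frame["function_start"]))
# jni_GetByteArrayRegion 0xffffffff7dda9748
```

The value of frame["pc"] is the address to look up in the pmap output.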
pmap
pmap core.hostname04.703.26100.java > pmap.txt
A portion of pmap.txt:
....
FFFFFFFF7DB12000 16K r----
FFFFFFFF7DC00000 8192K r-x-- /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so
FFFFFFFF7E400000 2048K r-x-- /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so
........
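The crash address ffffffff7dda9840 falls inside the 8192K mapping that starts at FFFFFFFF7DC00000 and ends just before FFFFFFFF7E400000, so the culprit is /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so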
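Matching an address against the pmap listing is simple range arithmetic, and it can be scripted. Below is a rough Python sketch, assuming each pmap.txt line has the form "START SIZEK PERMS [PATH]" as in the excerpt above. The function name find_mapping is made up for illustration.

```python
import re

# start address (hex), size in kilobytes, permissions, optional path
MAPPING = re.compile(r"^\s*([0-9A-Fa-f]+)\s+(\d+)K\s+(\S+)\s*(.*)$")


def find_mapping(pmap_text, address):
    """Return (start, size_bytes, perms, path) of the mapping containing address, or None."""
    for line in pmap_text.splitlines():
        m = MAPPING.match(line)
        if not m:
            continue  # skip headers, "...." lines and anything else unexpected
        start = int(m.group(1), 16)
        size = int(m.group(2)) * 1024
        if start <= address < start + size:
            return start, size, m.group(3), m.group(4).strip()
    return None


# The pmap.txt lines quoted in this post.
pmap_sample = """\
FFFFFFFF7DB12000 16K r----
FFFFFFFF7DC00000 8192K r-x-- /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so
FFFFFFFF7E400000 2048K r-x-- /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so
"""

hit = find_mapping(pmap_sample, 0xFFFFFFFF7DDA9840)
print(hit[3])  # /opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so
```

Python integers do not overflow, so the 64-bit SPARC addresses can be compared directly.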
I found an easier way to locate the executable that contains the offending function. I will write about it in a later post.
Since the library (/opt/jdk1.6.0_24/jre/lib/sparcv9/server/libjvm.so) is part of the Java Runtime Environment, I have filed a case with Oracle and they are currently investigating. I saved them some work, in fact :)
Have a nice day :)
fook sheng
1 comment:
Hi Fook Sheng,
We are having a similar problem. Would you mind sharing how you resolved it, or a workaround?