Opened 11 years ago
Closed 10 years ago
#488 closed help (fixed)
Core files on MONSooN
Reported by: | kipling | Owned by: | lois |
---|---|---|---|
Component: | MONSooN | Keywords: | |
Cc: | | Platform: | |
UM Version: | 7.3 | | |
Description
Using version 7.1 on HECToR, I was able to get core files when the model crashed by adding fcm:um_br/dev/ros/VN7.1_generate_core (which just adds "ulimit -c unlimited" to qsexecute); these would permit post-mortem debugging of the crash.
However, using version 7.3 on MONSooN, this doesn't appear to work with the equivalent branch (fcm:um_br/dev/ros/VN7.3_generate_core); e.g. my job xfgla was crashing with SIGFPE but not producing a core file.
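For reference, the effect of the generate_core branches can be sketched roughly as below. This is only an illustration of the "ulimit -c unlimited" idea, not the actual contents of the branch; the variable names hard_limit and core_limit are made up for the example, and it raises the soft limit only as far as the hard limit allows so that it does not fail on systems where "unlimited" is not permitted.

```shell
#!/bin/sh
# Sketch only: raise the soft core-file size limit before the model runs.
# The real branch simply adds "ulimit -c unlimited" to qsexecute; this
# variant goes up to the hard limit, which is "unlimited" on many systems.

hard_limit=$(ulimit -H -c)      # maximum this process may request
ulimit -S -c "$hard_limit"      # raise the soft limit to that maximum
core_limit=$(ulimit -S -c)      # read back what is actually in force

echo "core file size limit: soft=$core_limit hard=$hard_limit"
```

If the reported soft limit is 0, no core file will be written regardless of how the signal is handled.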
A little digging suggests this relates to the SIGNAL_TRAP(0) call in UM_SHELL, which is only enabled for the IBM arch. Changing this to SIGNAL_TRAP(1) does lead to a core file being produced, but apparently from the wrong thread:
    $ dbx bin/xfgla.exe core
    Type 'help' for help.
    warning: The core file is not a fullcore. Some info may not be available.
    [using memory image in core]
    reading symbolic information ...
    Floating point exception in _event_sleep at 0x90000000036baa4
    0x90000000036baa4 (_event_sleep+0x108) e8410028 ld r2,0x28(r1)
    (dbx) where
    _event_sleep(??, ??, ??, ??, ??, ??) at 0x90000000036baa4
    _p_sigtimedwait(??, ??, ??) at 0x900000000370bc4
    pth_signal.sigwait(??, ??) at 0x900000000371cd4
    pm_async_thread(??) at 0x900000000d7e5c8
(while the .leave file appears to have a correct backtrace, in this case from STASH).
My understanding is that on AIX a "fullcore" file is required for cross-thread debugging; however the AIX documentation suggests these can only be enabled at a system (rather than per-user or per-process) level…
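A quick way to see the current system-wide setting might look like the sketch below. It assumes the standard AIX lsattr query against the sys0 device and its fullcore attribute; the variable name fullcore_status is just for illustration, and I have not confirmed how MONSooN is configured. On anything other than AIX it skips the query, since lsattr means something different elsewhere.

```shell
#!/bin/sh
# Sketch only: report whether full core dumps are enabled system-wide on AIX.
# Assumption: the setting lives in the "fullcore" attribute of device sys0.
# Changing it is an administrator task (typically done with chdev as root).

if [ "$(uname -s)" = "AIX" ]; then
    # Prints the attribute line, e.g. the name followed by true or false.
    fullcore_status=$(lsattr -E -l sys0 -a fullcore 2>&1)
else
    fullcore_status="unknown (not running on AIX)"
fi

echo "fullcore setting: $fullcore_status"
```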
(Removing the SIGNAL_TRAP call altogether leads to the SIGFPE being silently ignored.)
Is there a known way to get usable core files from UM7.3 on MONSooN, or should I take this up with their tech people?
Change History (2)
comment:1 Changed 11 years ago by lois
- Owner changed from um_support to lois
- Status changed from new to assigned
comment:2 Changed 10 years ago by lois
- Resolution set to fixed
- Status changed from assigned to closed
We could have a go at looking into this, Zak, but with some CMS people on leave or on courses next week, it may be quicker to ask the Met Office people whether they can find you a solution for MONSooN. If you don't get the core files you need, we will see what we can do.
Lois