Opened 5 years ago

Closed 5 years ago

#1436 closed help (answered)

Way to "switch on" stack traces in HadGEM3?

Reported by: swr04ojb
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 8.5

Description

Hello,

I was wondering whether there is anything in particular I could switch on to get a stack trace showing where the model is falling over. At the moment run xinhf is falling over, and the .leave file only tells me the address in memory (see below). It would be much more useful to have a stack trace back to the line that caused the error, even if this means the code takes longer to compile and run in the short term. Is this possible?

Rank 48 [Thu Jan  8 14:32:03 2015] [c5-0c1s8n0] application called MPI_Abort(comm=0xC4000002, 9) - process 48
_pmiu_daemon(SIGCHLD): [NID 01058] [c5-0c1s8n2] [Thu Jan  8 14:32:03 2015] PE RANK 76 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01057] [c5-0c1s8n1] [Thu Jan  8 14:32:03 2015] PE RANK 68 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01056] [c5-0c1s8n0] [Thu Jan  8 14:32:03 2015] PE RANK 48 exit signal Aborted
_pmiu_daemon(SIGCHLD): [NID 01052] [c5-0c1s7n0] [Thu Jan  8 14:32:03 2015] PE RANK 11 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01059] [c5-0c1s8n3] [Thu Jan  8 14:32:03 2015] PE RANK 94 exit signal Segmentation fault
[NID 01052] 2015-01-08 14:32:03 Apid 12460156: initiated application termination
qsatmos: waiting for qsserver to complete on pid 1685

There is no further useful output in the remainder of the file, so I can't tell where the error is coming from.
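Even without a stack trace, the `_pmiu_daemon` and `MPI_Abort` lines above say which ranks died and how. In this excerpt rank 48 called MPI_Abort with error code 9, while ranks 11, 68, 76 and 94 exited with segmentation faults. The Python sketch below groups such lines by exit signal so the ranks worth investigating stand out in a long .leave file. `summarise_leave` is a hypothetical helper, and its regular expressions assume exactly the Cray message layout shown in this excerpt; other MPI or system-software versions may word these lines differently. It only narrows down which ranks to look at and is no substitute for a real stack trace.

```python
import re
from collections import defaultdict

# Assumed patterns, based only on the Cray launcher lines in the excerpt above.
EXIT_RE = re.compile(r"PE RANK (\d+) exit signal (.+)$")
ABORT_RE = re.compile(r"Rank (\d+) .*application called MPI_Abort\(comm=\w+, (\d+)\)")

# Lines copied from the .leave excerpt in this ticket.
SAMPLE = [
    "Rank 48 [Thu Jan  8 14:32:03 2015] [c5-0c1s8n0] application called "
    "MPI_Abort(comm=0xC4000002, 9) - process 48",
    "_pmiu_daemon(SIGCHLD): [NID 01058] [c5-0c1s8n2] [Thu Jan  8 14:32:03 2015] "
    "PE RANK 76 exit signal Segmentation fault",
    "_pmiu_daemon(SIGCHLD): [NID 01057] [c5-0c1s8n1] [Thu Jan  8 14:32:03 2015] "
    "PE RANK 68 exit signal Segmentation fault",
    "_pmiu_daemon(SIGCHLD): [NID 01056] [c5-0c1s8n0] [Thu Jan  8 14:32:03 2015] "
    "PE RANK 48 exit signal Aborted",
    "_pmiu_daemon(SIGCHLD): [NID 01052] [c5-0c1s7n0] [Thu Jan  8 14:32:03 2015] "
    "PE RANK 11 exit signal Segmentation fault",
    "_pmiu_daemon(SIGCHLD): [NID 01059] [c5-0c1s8n3] [Thu Jan  8 14:32:03 2015] "
    "PE RANK 94 exit signal Segmentation fault",
    "[NID 01052] 2015-01-08 14:32:03 Apid 12460156: initiated application termination",
]


def summarise_leave(lines):
    """Return ({exit signal: sorted ranks}, {aborting rank: MPI_Abort error code})."""
    by_signal = defaultdict(list)
    aborts = {}
    for line in lines:
        line = line.rstrip()
        exit_match = EXIT_RE.search(line)
        if exit_match:
            # Record which signal terminated this rank.
            by_signal[exit_match.group(2)].append(int(exit_match.group(1)))
            continue
        abort_match = ABORT_RE.search(line)
        if abort_match:
            # Record the error code passed to MPI_Abort by this rank.
            aborts[int(abort_match.group(1))] = int(abort_match.group(2))
    return {sig: sorted(ranks) for sig, ranks in by_signal.items()}, aborts


if __name__ == "__main__":
    signals, aborts = summarise_leave(SAMPLE)
    print(signals)  # {'Segmentation fault': [11, 68, 76, 94], 'Aborted': [48]}
    print(aborts)   # {48: 9}
```

Because iterating over an open file yields its lines, the same call works on a whole .leave file, e.g. `summarise_leave(open(path))` with `path` pointing at the file.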

Oh, and in case it helps, I've already switched on:

  • Comp & Run options → Comp & Run options for Atmos
    "Define the level of optimisation"
    → changed from Safe to Debug
  • Input / Output Control… → Output Choices
    "Level of Print Output From Model"
    → changed from normal to Extra (which I assumed was the maximum)

Change History (2)

comment:1 Changed 5 years ago by grenville

Hi

You can try running with ATP (Abnormal Termination Processing). In the Defined Environment Variables table under Script Modifications, set ATP_ENABLED to 1.

You will need to have linked the model with the atp module loaded (this is our default).

You should get a stack trace in the .leave file.

Please see http://www.archer.ac.uk/documentation/best-practice-guide/debug.php for more info

Grenville

comment:2 Changed 5 years ago by grenville

  • Resolution set to answered
  • Status changed from new to closed