#21 fixed where are the start dumps? lois umdoc


Just a quick one: where are the start dumps on HPCx? I need 2006072800.T+0, and it's not in /hpcx/home/n02/n02/umx/dumps/weather/global, where I thought it would be.

Thank you, Caroline

#1435 answered weird glitch in submitting jobs to ARCHER annette swr04ojb

I'm finding today that when I submit jobs to ARCHER, the job sometimes fails. The failure happens at a point where it is trying to ssh something to ARCHER, but exactly which point seems to vary. If, on hitting the error, I do nothing other than close the information window (by clicking OK) and press submit again, then sometimes it fails with the same type of error but at a different point (e.g. doing an ssh for UMRECON rather than for UMSCRIPTS), and sometimes it goes through. It typically doesn't take more than 3 attempts to work, so it's not crippling, but it is a bit weird. Is this something to do with my passwordless ssh setup, something on PUMA, something on ARCHER, or something else entirely?

For info I've been mainly submitting xini#f and xini#g today.
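
One way to narrow down an intermittent failure like this might be to repeat a no-op ssh command from PUMA and count how often it fails, independently of the job submission. The following is only a rough sketch: the function names `probe_ssh` and `summarise` are made up for illustration, the host alias `archer` in the usage comment is a placeholder for whatever login entry is actually in use, and the only ssh options relied on are the standard OpenSSH `BatchMode` and `ConnectTimeout`.

```python
import subprocess
import time


def probe_ssh(host, attempts=5, delay=1.0, run=subprocess.run):
    """Run a no-op command over ssh several times and record each outcome.

    `run` defaults to subprocess.run; it is a parameter only so that the
    probe can be exercised without a real network connection.
    """
    results = []
    for i in range(attempts):
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # so a broken passwordless setup shows up as an error, not a hang.
        cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]
        try:
            proc = run(cmd, capture_output=True, text=True, timeout=30)
            ok = proc.returncode == 0
            detail = proc.stderr.strip()
        except (subprocess.TimeoutExpired, OSError) as exc:
            ok, detail = False, str(exc)
        results.append((ok, detail))
        if i < attempts - 1:
            time.sleep(delay)
    return results


def summarise(results):
    """Count failures and collect the distinct error messages seen."""
    failures = [detail for ok, detail in results if not ok]
    return {
        "attempts": len(results),
        "failures": len(failures),
        "messages": sorted(set(failures)),
    }


# Example use (hypothetical host alias, replace with your own):
#   print(summarise(probe_ssh("archer", attempts=10, delay=2.0)))
```

If the probe also fails intermittently, the problem is probably in the connection or login nodes rather than in the submission scripts; if it never fails, the ssh setup itself is less likely to be the cause.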

#1436 answered way to "switch on" stack trace in HadGEM3? um_support swr04ojb


I was wondering whether there is anything in particular I could switch on to get stack traces showing where the model is falling over. At the moment run xinhf is falling over, and the .leave file simply tells me the address in memory (see below), whereas it would be much more useful to have a stack trace back to the line that caused the error (even if this means the code takes longer to compile/run in the short term). Is this possible?

Rank 48 [Thu Jan  8 14:32:03 2015] [c5-0c1s8n0] application called MPI_Abort(comm=0xC4000002, 9) - process 48
_pmiu_daemon(SIGCHLD): [NID 01058] [c5-0c1s8n2] [Thu Jan  8 14:32:03 2015] PE RANK 76 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01057] [c5-0c1s8n1] [Thu Jan  8 14:32:03 2015] PE RANK 68 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01056] [c5-0c1s8n0] [Thu Jan  8 14:32:03 2015] PE RANK 48 exit signal Aborted
_pmiu_daemon(SIGCHLD): [NID 01052] [c5-0c1s7n0] [Thu Jan  8 14:32:03 2015] PE RANK 11 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01059] [c5-0c1s8n3] [Thu Jan  8 14:32:03 2015] PE RANK 94 exit signal Segmentation fault
[NID 01052] 2015-01-08 14:32:03 Apid 12460156: initiated application termination
qsatmos: waiting for qsserver to complete on pid 1685

(and there is no useful additional output in the remainder of the file), which doesn't allow me to diagnose where the error is coming from.

Oh, and in case it helps, I've already switched on:

  • Comp & Run options → Comp & Run options for Atmos
    "Define the level of optimisation"
    -> changed from Safe to Debug
  • Input / Output Control… → Output Choices
    "Level of Print Output From Model"
    -> changed from Normal to Extra (which I assumed was the maximum)
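
In the meantime, the `PE RANK ... exit signal ...` lines in the .leave excerpt above can at least be tallied to see which ranks and nodes went down with which signal. This is only a sketch: it does not produce a stack trace, the function name `ranks_by_signal` is made up for illustration, and the regular expression assumes the line layout shown in the excerpt.

```python
import re
from collections import defaultdict

# Matches lines such as:
# _pmiu_daemon(SIGCHLD): [NID 01058] [c5-0c1s8n2] [...] PE RANK 76 exit signal Segmentation fault
RANK_LINE = re.compile(
    r"\[(?P<node>c\d+-\d+c\d+s\d+n\d+)\].*"
    r"PE RANK (?P<rank>\d+) exit signal (?P<signal>.+)$"
)

# Lines copied from the .leave excerpt above.
SAMPLE = """\
Rank 48 [Thu Jan  8 14:32:03 2015] [c5-0c1s8n0] application called MPI_Abort(comm=0xC4000002, 9) - process 48
_pmiu_daemon(SIGCHLD): [NID 01058] [c5-0c1s8n2] [Thu Jan  8 14:32:03 2015] PE RANK 76 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01057] [c5-0c1s8n1] [Thu Jan  8 14:32:03 2015] PE RANK 68 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01056] [c5-0c1s8n0] [Thu Jan  8 14:32:03 2015] PE RANK 48 exit signal Aborted
_pmiu_daemon(SIGCHLD): [NID 01052] [c5-0c1s7n0] [Thu Jan  8 14:32:03 2015] PE RANK 11 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01059] [c5-0c1s8n3] [Thu Jan  8 14:32:03 2015] PE RANK 94 exit signal Segmentation fault
"""


def ranks_by_signal(text):
    """Group (rank, node) pairs by the exit signal reported for each rank."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        match = RANK_LINE.search(line)
        if match:
            signal = match.group("signal").strip()
            grouped[signal].append((int(match.group("rank")), match.group("node")))
    return {signal: sorted(items) for signal, items in grouped.items()}


for signal, items in ranks_by_signal(SAMPLE).items():
    print(signal, items)
```

For the excerpt above this groups ranks 11, 68, 76 and 94 under "Segmentation fault" and rank 48 under "Aborted".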
