Opened 4 years ago

Closed 4 years ago

#2083 closed help (answered)

Series of simulations differing only by emissions, but only some exceeding the walltime limit

Reported by: s1374103 Owned by: um_support
Component: UKCA Keywords: walltime
Cc: Platform: MONSooN
UM Version: 8.4


Hi Helpdesk,

I am using vn8.4 HadGEM3 GA4.0 UKCA CheST+GLOMAP-mode RJ4.0 on monsoon (xlsjc).

I have added some new tracers, chemical reactions and emissions, and the model is nudged towards ERA-Interim.

At that point the walltime limit was set to the maximum (3 hours) and the model was exceeding it, so I changed the processor configuration. Since then I have carried out many simulations and the wallclock limit has not been an issue.

A few weeks ago I decided to do a series of simulations (2 years each) in which I change only the emissions. For the series I am about to describe, note that the code, diagnostics and UMUI setup are identical; the jobs just point to different ancillary files (with different emissions).

The first job ran for 2 years perfectly fine - xncae

However, xncak and xncal are failing because they exceed the wallclock limit.

The job is set to run in 1-month chunks, and each chunk has usually taken around 2 hours 20 minutes. So I'm wondering why it is now only getting to day 20 in 3 hours.
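To put a number on the slowdown, the timings above can be converted to a cost per model day. This is only a rough sketch using the figures quoted in this ticket (a 30-day chunk in about 140 minutes, versus about 20 days reached in the 180-minute limit); the function name is illustrative.

```python
def minutes_per_model_day(wall_minutes, model_days):
    """Average wallclock minutes spent per simulated day."""
    return wall_minutes / model_days

# Figures quoted above: ~2 h 20 min for a 30-day chunk (earlier jobs),
# versus only ~20 days completed within the 3 h limit (failing jobs).
normal = minutes_per_model_day(140, 30)
slow = minutes_per_model_day(180, 20)
slowdown = slow / normal

print(f"{normal:.2f} {slow:.2f} {slowdown:.2f}")  # 4.67 9.00 1.93
```

So the failing jobs appear to be running at roughly half the speed of the earlier ones, which looks too large to be ordinary run-to-run variation.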

At first I thought it could be random, which is why I've submitted multiple jobs but they are all failing at the same time (around 20 days).

In addition to this, xncah failed with a different error which I cannot understand:

sys-108 : FATAL error closing unit 6 during program termination 

sys-108 : FATAL error closing unit 6 during program termination 

sys-5 : UNRECOVERABLE error on system request 
  Input/output error

Encountered during an I/O operation on unit 6
Fortran unit 6 is connected to a sequential formatted text file:
basename: missing operand
Try `basename --help' for more information.
basename: missing operand
Try `basename --help' for more information.

ATP Stack walkback for Rank 94 starting:

Are there perhaps some differences between my jobs that I haven't noticed?

Should I change the processors again? I don't want to do this as this would mean having to repeat xncae for consistency among these emissions experiments.

Please find the .leave files in


Any ideas would be greatly appreciated.



Change History (2)

comment:1 Changed 4 years ago by luke

Hi Jamie,

I'm not sure about the xncah error - if this isn't repeatable it could be caused by a number of things. What seems to have happened is that processor number 74 couldn't write to the standard output stream, which is Fortran unit 6 and is sent to the jobid.fort6.peNN files. The pe00 file is what becomes the output in the .leave file, but this comes from just one of the many processors used in the simulation. If this happens again, please raise a ticket.
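If it does recur, one way to see which processors reported a problem is to scan the per-processor output files. The following is a minimal sketch, assuming the files follow the jobid.fort6.peNN naming described above and are still present in a single directory; the function name and the directory argument are hypothetical.

```python
from pathlib import Path

def ranks_reporting(output_dir, needle="FATAL"):
    """Return the names of *.fort6.pe* files whose text contains `needle`."""
    hits = []
    for path in sorted(Path(output_dir).glob("*.fort6.pe*")):
        # errors="replace" avoids failing on partially written output
        text = path.read_text(errors="replace")
        if needle in text:
            hits.append(path.name)
    return hits
```

Calling `ranks_reporting(some_dir)` on the directory holding the fort6 files would list the processors whose output mentions a fatal error, which can be more informative than the pe00 output alone.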

Are the jobs that used to take 2 hours 20 minutes nudged or free-running? Nudging generally adds time, so if this is new then that could explain things. Also, increasing processors can make things faster, but not always massively faster, see e.g.

Sometimes increasing the number of processors will make things run slower. In the first instance I would suggest changing the run length from 1-month to something like 15-20 days (making sure it's a multiple of the dump period). However, this will change the results due to the way the UKCA solver is initialised (i.e. 2 identical jobs, one in 10-day steps and one in 30-day steps would give different results, although scientifically they should be the same).
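As a rough illustration of choosing a shorter run length, the sketch below lists run lengths between 15 and 20 days that are whole multiples of the dump period, and checks which would fit the 3-hour limit with some headroom. The dump periods tried (5 and 10 days), the 10% safety margin and the cost of 9 minutes per model day (180 minutes for about 20 days, from the failing jobs) are assumptions for illustration, not values taken from the job setup.

```python
def candidate_run_lengths(dump_period_days, min_days, max_days):
    """Run lengths in [min_days, max_days] that are multiples of the dump period."""
    return [d for d in range(min_days, max_days + 1) if d % dump_period_days == 0]

def fits_walltime(run_days, minutes_per_day, limit_minutes, margin=0.9):
    """True if the estimated runtime stays within `margin` of the walltime limit."""
    return run_days * minutes_per_day <= margin * limit_minutes

MINUTES_PER_DAY = 9.0  # assumption: 180 min / ~20 days from the failing jobs
LIMIT_MINUTES = 180    # 3-hour walltime limit quoted in the ticket

for dump_period in (5, 10):  # hypothetical dump periods, in days
    options = candidate_run_lengths(dump_period, 15, 20)
    safe = [d for d in options if fits_walltime(d, MINUTES_PER_DAY, LIMIT_MINUTES)]
    print(dump_period, options, safe)
```

Under these assumptions a 20-day chunk would sit right at the limit, so a 15-day chunk (if the dump period allows it) leaves more headroom.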

Would you be able to explain a bit more about the differences between the jobs?

Many thanks,

comment:2 Changed 4 years ago by grenville

  • Resolution set to answered
  • Status changed from new to closed

Closed for lack of activity
