Opened 4 years ago

Closed 4 years ago

#2204 closed help (fixed)

JULES run failure on Lotus

Reported by: charlie Owned by: um_support
Component: JULES Keywords: Lotus
Cc: Platform: Other
UM Version:

Description (last modified by willie)

Dear Ros,

Sorry to bother you yet again, but we are going back to my JULES suite now. The good news is that it is now running, or at least it was. It ran for about half an hour and generated its first spin-up dump, but then failed at around timestep 302 with the error below. This looks to me like a machine-specific failure rather than anything wrong with my run, but what does it mean?




Environment variables set for netCDF Fortran bindings in
You will also need to link your code to a compatible netCDF C library in
[WARN] file:imogen.nml: skip missing optional source: namelist:imogen_anlg_vals_list
[WARN] file:urban.nml: skip missing optional source: namelist:jules_urban_switches
[WARN] file:prescribed_data.nml: skip missing optional source: namelist:jules_prescribed_dataset(:)
[WARN] file:urban.nml: skip missing optional source: namelist:jules_urban2t_param
[WARN] file:ancillaries.nml: skip missing optional source: namelist:jules_crop_props
[WARN] file:ancillaries.nml: skip missing optional source: namelist:jules_irrig
[WARN] file:crop_params.nml: skip missing optional source: namelist:jules_cropparm
[WARN] file:urban.nml: skip missing optional source: namelist:urban_properties
[WARN] file:imogen.nml: skip missing optional source: namelist:imogen_run_list
[WARNING] required_vars_for_configuration: RFM river prognostics will be initialised to zero.
[WARNING] init_ic: Provided variable 'rgrain' is not required, so will be ignored
mpirun: propagating signal 12
User defined signal 2
MPI Application rank 0 killed before MPI_Finalize() with signal 12
Received signal ERR
cylc (scheduler - 2017-06-13T10:48:13Z): CRITICAL Task job script received signal ERR at 2017-06-13T10:48:13Z
cylc (scheduler - 2017-06-13T10:48:13Z): CRITICAL failed at 2017-06-13T10:48:13Z

(This is suite u-am232 running on Lotus)

Change History (8)

comment:1 Changed 4 years ago by charlie

Dear all,

Further to this (which I originally sent to Ros several days ago, but I gather she is away this week): I have already contacted people at CEH, and they don't know the answer. They think it is a machine-specific error, rather than anything to do with my suite. I have also contacted JASMIN support. The trouble is, they may not necessarily have much knowledge about JULES, so might say it's a JULES error!

Please can someone help?

As it says at the bottom of my message, my suite is u-am232 and I am submitting it to JASMIN/Lotus from PUMA.


comment:2 Changed 4 years ago by simon

Hi Charlie,

It appears that your job was killed by the queuing system after 15 minutes. Try editing /home/charlie/roses/u-am232/suite.rc to increase the -W value, towards the end of the file, from its current setting of 00:15.


comment:3 Changed 4 years ago by charlie

Thanks very much Simon. What should I increase this to? And am I indeed using the right queue? At present, I'm using the par-multi queue which, according to the documentation, is a medium priority queue with a maximum runtime of 48 hours. Is this the right one to use?

My JULES suite is currently set to run for 10 years which, when I was running it on our own machines here, used to take about 2-3 days of real time. So what should I change in my suite.rc to enable it to run for this long?



comment:4 Changed 4 years ago by simon

A quick back-of-the-envelope calculation gives a 3 day run time for a 10 year job at the rate of roughly 300 timesteps per 15 minutes. This obviously won't fit into your current queue. As you appear to be running the serial version of JULES, I'd recommend the long-serial queue, which is the only one to allow for >48 hours. I'd ask for 80 hours to be safe.

Change the -q value to long-serial and the -W to 80:00 at the end of your suite.rc
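
The back-of-the-envelope estimate above can be reproduced with a short Python sketch. It assumes an hourly model timestep, which is not stated anywhere in this ticket but is consistent with the 3 day figure, and a 365-day year; the function names are purely illustrative.

```python
import math


def estimate_wall_hours(run_years, timestep_seconds, steps_done, minutes_elapsed):
    """Extrapolate the total wall-clock hours from an observed timestep rate."""
    # Total number of model timesteps in the run (365-day years assumed).
    total_steps = run_years * 365 * 24 * 3600 / timestep_seconds
    steps_per_minute = steps_done / minutes_elapsed
    return total_steps / steps_per_minute / 60


def wall_limit_string(hours):
    """Format a number of hours as an HH:MM string, as used for the -W value."""
    whole_minutes = math.ceil(hours * 60)
    return f"{whole_minutes // 60:02d}:{whole_minutes % 60:02d}"


# ASSUMPTION: a 3600 s (hourly) timestep; check the suite's own settings.
hours = estimate_wall_hours(run_years=10, timestep_seconds=3600,
                            steps_done=300, minutes_elapsed=15)
print(f"estimated run time: {hours:.0f} hours ({hours / 24:.1f} days)")
print("requested -W value:", wall_limit_string(80))
```

Under these assumptions the estimate comes out at 73 hours, so the 80 hours suggested above leaves a modest safety margin.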

Btw, it's always a good idea to quit completely from rosie / rose edit and restart after making any changes to your suite.rc

This is the lowest priority queue, however, so you may have to wait a bit longer for the job to be submitted.


comment:5 Changed 4 years ago by charlie

Thanks Simon, that's excellent.

Do you know roughly what the average waiting time is for this queue, just so I have a rough idea?

Also, whilst I remember - do you know what happens about storage on JASMIN? At the moment, I have my JULES output going to my home directory on JASMIN, so is this the correct location and how much space do I have here? If it's clearly not going to be enough, what should I do? I have already applied for access to the JULES workspace, so should my output be going there instead?


comment:6 Changed 4 years ago by simon


I'm afraid I don't know anything about JASMIN beyond what I read on their help pages. I think your questions may be better answered by JASMIN support. But, yes, I suspect your home space will be limited, and that you should use the JULES workspace for your output.


comment:7 Changed 4 years ago by charlie

Thanks Simon, will do. Thanks for all your help.

comment:8 Changed 4 years ago by willie

  • Description modified
  • Resolution set to fixed
  • Status changed from new to closed