Opened 7 months ago

Closed 7 months ago

#3371 closed help (answered)

Suite not running to completion

Reported by: Leighton_Regayre Owned by: um_support
Component: UM Model Keywords: PPE suite tasks held
Cc: Platform: ARCHER
UM Version: 11.1



I am submitting a suite of perturbed parameter ensemble UKESM1 simulations. This suite (u-bw789) has been used extensively to create ensemble members.

My current submission ran 10 simulations for nearly a year of model time (the full run length requested) before stopping for unknown reasons. One member (ens_33) completed all of its atmos_main tasks; the other members did not. Yet the log files show that the tasks they did complete were successful, and no error messages were logged.

The Rose GUI indicated that the suite was no longer running, so I restarted it to resubmit the tasks for these members. Several members immediately produced *pa* file output.

Is there a predictable reason for this sort of event? Is there a way for me to avoid needing to manually restart suites in this way?



Change History (3)

comment:1 Changed 7 months ago by ros

Hi Leighton,

It is very difficult to answer this without being able to see all the log files from the point at which you had the problems. I can see some error messages in a couple of the current log files. It looks like you are very close to your PUMA quota and have had disk I/O errors.

Please also change the suite host, in site/archer.rc to be now that the UM access method has been rolled out to all login nodes. I know ARCHER have been doing some work on login7 following the rollout.

If the problem happens again, and you can't see any error messages, let us know when it happens and we can take a look.
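As a rough aid for the "look for error messages" step, here is a minimal sketch that scans a suite's job logs for suspicious lines. It assumes the Cylc 7 default layout, where each task's stderr lands in a `job.err` file somewhere under `~/cylc-run/<suite>/log/job/`; that path, the `find_errors` helper and the search pattern are all hypothetical illustrations, not part of Rose or Cylc. The pattern is deliberately broad (generic "error", "Disk quota exceeded", "Input/output error"), so expect some false positives.

```python
"""Scan a Cylc suite's job logs for lines that look like errors.

Hypothetical helper: the log layout and the search pattern are assumptions.
"""
import re
import sys
from pathlib import Path

# Assumed pattern: generic errors, "Disk quota exceeded" (EDQUOT) and
# "Input/output error" (EIO). Broad on purpose, so expect false positives.
ERROR_PATTERN = re.compile(r"error|quota exceeded|input/output", re.IGNORECASE)


def find_errors(log_dir):
    """Return (file, line_number, text) for every matching line in any job.err under log_dir."""
    hits = []
    # rglob walks all cycle/task/submit-number subdirectories below log_dir.
    for err_file in sorted(Path(log_dir).rglob("job.err")):
        # errors="replace" stops odd bytes in a log from aborting the scan.
        with err_file.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if ERROR_PATTERN.search(line):
                    hits.append((err_file, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed path): python find_errors.py ~/cylc-run/<suite>/log/job
    for path, lineno, text in find_errors(Path(sys.argv[1]).expanduser()):
        print(f"{path}:{lineno}: {text}")
```

Running it over the suite's job log directory right after a suite stops gives a quick list of candidate problems (quota, I/O) to include when reporting the issue, before later resubmissions overwrite the evidence.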


comment:2 Changed 7 months ago by Leighton_Regayre

Hi Ros,

Thanks for taking a look. I'm relieved to hear this is not a wider issue. The restart has been a success, so the only implications for my workflow are practical. I have updated my site/archer.rc file as suggested.



comment:3 Changed 7 months ago by ros

  • Resolution set to answered
  • Status changed from new to closed