Opened 5 months ago

Closed 3 months ago

#3223 closed help (answered)

Slow running of suites on Monsoon

Reported by: jjas3 Owned by: ros
Component: UKESM Keywords: Monsoon Slow
Cc: Platform: Monsoon2
UM Version: 11.2

Description

Hi CMS,

I've run into some problems with suites I am running on Monsoon - hope you can help!

I'm running 3 x 10 year UKESM AMIP suites (u-bs034, u-bs110, and u-bs111) on Monsoon. Each new submission of either postproc or atmos_main does not happen on its own (it gets stuck at 'submission retrying') and I have to manually trigger 'run now'. This is resulting in an output of less than 1 model year per day and I need to constantly monitor the jobs using 'cylc gscan'. I've checked with Luke and we should have enough node hours on 'project-ukca'.

I was wondering if there was anything that could explain the slowing down of these jobs and also why the submission was failing? In January I was getting nearly three model years a day, without the submission problem. I'm running another few 10 year jobs after this, so it would be great to get to the bottom of it.

I've emailed Monsoon@… too in case that was the right thing to do!

Many thanks for your help in advance.
Johnny

Change History (4)

comment:1 Changed 5 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Johnny,

The tasks are failing to submit because you have the host set to xcs-c. This causes the system to ssh from xcs-c to xcs-c, which gives rise to issues with temporary files. In the monsoon.rc file, change the host line in the [[HPC]] family from

host = $(rose host-select xcs-c)
to
host = localhost
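
For context, here is a rough sketch of where that line might sit after the change. It assumes the usual Cylc 7 suite.rc layout, in which host is set in a [[[remote]]] section of the task family; the nesting and comments are illustrative and may not match your monsoon.rc exactly.

```
    [[HPC]]
        [[[remote]]]
            # was: host = $(rose host-select xcs-c)
            # With localhost, jobs are submitted directly from the host
            # the suite runs on, without an ssh hop back to xcs-c.
            host = localhost
```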

Regards,
Ros.

comment:2 Changed 5 months ago by jjas3

Hi Ros,

Many thanks for your help with this!

I made the changes outlined above in the monsoon.rc file of all my running suites, but the problem still persists.

Could this be because the suites are already running and so aren't able to pick up the changes? If so, is there a way to handle this?

Regards,
Johnny

comment:3 Changed 5 months ago by ros

Hi Johnny,

Whenever you make a change to a running suite, you need to reload it by running rose suite-run --reload so that it picks up the changes.
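
As a minimal sketch, the reload for the three suites in this ticket could be scripted as below. It assumes the suite working copies live under ~/roses/<suite-id>, which may not match your setup. ROSES_DIR, DRY_RUN and reload_suites are names made up for this example. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Suite IDs taken from this ticket.
SUITES="u-bs034 u-bs110 u-bs111"
# Assumed location of the suite working copies; adjust if yours differ.
ROSES_DIR="${ROSES_DIR:-$HOME/roses}"
# Dry run by default: print the commands rather than running them.
DRY_RUN="${DRY_RUN:-1}"

reload_suites() {
    for suite in $SUITES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "cd $ROSES_DIR/$suite && rose suite-run --reload"
        else
            # Run in a subshell so the cd does not affect the caller.
            (cd "$ROSES_DIR/$suite" && rose suite-run --reload)
        fi
    done
}

reload_suites
```

Set DRY_RUN=0 once the printed commands look right. Each reload has to be run from that suite's own directory.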

Cheers,
Ros.

comment:4 Changed 3 months ago by ros

  • Resolution set to answered
  • Status changed from accepted to closed