Opened 5 months ago

Closed 3 months ago

#3432 closed help (fixed)

postproc stuck submit retrying on MONSooN

Reported by: scottan
Owned by: um_support
Component: Rose/Cylc
Keywords:
Cc:
Platform: Monsoon2
UM Version: 11.2

Description

I'm having issues running suite u-by661, and others similar to it.

This suite is a copy of a UKESM1 job (u-bl028) that was run on MONSooN last year. Most of it runs fine, but it gets stuck at the postproc stages, where the tasks keep going into submit-retrying until they eventually reach submit-failed. At that point I have to manually reset the task state to waiting for the suite to carry on.

I've had similar issues in the past, and I've tried changing the MONSooN.rc file based on that experience, but without success.

Any help appreciated.

Change History (4)

comment:1 Changed 4 months ago by grenville

Scott

It's complaining (see /home/d04/sanic/cylc-run/u-by661/log/suite) that it can't find a temp file. I don't know why. Do you still get the same error now if you retrigger the task?

Grenville

comment:2 Changed 4 months ago by grenville

Hi Scott

In /home/d04/sanic/roses/u-by661/site/MONSooN.rc, in the [[HPC]] section, change

[[[remote]]]
    host = $(rose host-select xcs-c)

to

[[[remote]]]
    host = localhost
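
For context, here is a minimal sketch of how the edited stanza might sit in the file. It assumes that [[HPC]] is a task family defined under [runtime] and that the postproc tasks inherit from it. The surrounding nesting is an assumption for illustration, not copied from u-by661. With host = localhost, Cylc submits the jobs from the host the suite itself is running on, instead of from a host chosen by rose host-select.

```cfg
# Sketch only: the [runtime] nesting and the rest of the [[HPC]] family
# are assumed, not taken from the actual site/MONSooN.rc.
[runtime]
    [[HPC]]
        [[[remote]]]
            # was: host = $(rose host-select xcs-c)
            host = localhost
```

A change to the .rc file only affects a running suite once the suite definition has been reloaded or the suite restarted.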

Grenville

Last edited 3 months ago by ros

comment:3 Changed 4 months ago by scottan

Hi Grenville,

Thank you, that seems to have done the trick.

Best,
Scott

comment:4 Changed 3 months ago by ros

  • Component changed from UM Model to Rose/Cylc
  • Resolution set to fixed
  • Status changed from new to closed