Opened 3 weeks ago

Closed 3 weeks ago

Last modified 3 weeks ago

#2897 closed help (fixed)

suite u-be699 - coupled task fails

Reported by: xd904476 Owned by: ros
Component: UM Model Keywords:
Cc: Platform: ARCHER
UM Version:

Description

Hi,
I am having issues with my suite run again: since yesterday the coupled task for 2090 gets submitted and then fails after a short run time. I am not sure why.
Any suggestions?
Thanks

Change History (6)

comment:1 Changed 3 weeks ago by willie

Hi, what computer are you running the suite on? What error messages are you seeing?

Willie

comment:2 Changed 3 weeks ago by ros

  • Owner changed from um_support to ros
  • Platform set to ARCHER
  • Status changed from new to accepted

Hi Dani,

You have somehow managed to get multiple occurrences of the coupled task running for the current cycle and they are, unsurprisingly, interfering with each other. Can you please kill the currently running coupled task in the cylc GUI and then on ARCHER do:

qstat -u dflocco

and then kill the listed jobs with qdel <job id>.

Then retrigger the coupled task.
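
For reference, the qstat step above could be sketched as a small shell helper that pulls the job IDs out of the listing. This is only an illustration, not part of the original advice: the function name list_job_ids is hypothetical, and it assumes that job lines in the qstat output start with a numeric job ID in the first column while header and separator lines do not.

```shell
# Sketch only: extract job IDs from `qstat -u <user>`-style output read on stdin.
# Assumption: job lines begin with a numeric job ID in the first column;
# header and separator lines do not start with a digit and are skipped.
list_job_ids() {
    awk '/^[0-9]/ { print $1 }'
}

# Hypothetical usage on the login node (review the list before deleting anything):
#   qstat -u dflocco | list_job_ids
#   qdel <job id>    # repeat for each ID that belongs to the duplicated task
```

Checking the printed IDs by hand before running qdel avoids killing unrelated jobs that happen to be queued under the same user.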

Regards,
Ros.

comment:3 Changed 3 weeks ago by xd904476

  • Resolution set to fixed
  • Status changed from accepted to closed

Hi Willie and Ros,
Yesterday the coupled task was showing a failure and was retrying very often. I'm not sure why it would be running multiple times. I think at some point I did set it to "ready" after a failure.

I've set it to run again now. Thanks
Dani

comment:4 Changed 3 weeks ago by ros

Hi Dani,

I took a look at your logs and it looks like there was an issue with qsub on ARCHER, which meant cylc couldn't check the status of the job and so deemed it to have failed. It then tried to resubmit it 5 or so times until it succeeded, and so you ended up with 2 jobs running.

I've sent the details to ARCHER.

Cheers,
Ros.

comment:5 Changed 3 weeks ago by ros

CMS note for completeness: Cray identified that this problem was due to a faulty blade, which has now been resolved.

comment:6 Changed 3 weeks ago by xd904476

Thanks for the follow up.
Dani
