Opened 4 months ago

Closed 4 months ago

#3208 closed help (fixed)

Cannot restart suite

Reported by: m.couldrey Owned by: ros
Component: Rose/Cylc Keywords: suite, stopped, stuck,
Cc: Platform: NEXCS
UM Version:

Description

Hi CMS

My suite u-bq244 seems to have stopped (no tasks appear in the cylc gui for the suite), but when I try to restart the suite, I get the following:
macou@xcslc0:~/roses/u-bq244> rose suite-run --restart
[FAIL] Suite "u-bq244" appears to be running:
[FAIL] Contact info from: "/home/d00/macou/cylc-run/u-bq244/.service/contact"
[FAIL] CYLC_SUITE_HOST=xcslc1
[FAIL] CYLC_SUITE_OWNER=macou
[FAIL] CYLC_SUITE_PORT=43061
[FAIL] CYLC_SUITE_PROCESS=178876 /usr/bin/python2 /common/fcm/cylc-7.8.3/bin/cylc-restart u-bq244 --host=localhost
[FAIL] Try "cylc stop 'u-bq244'" first?

I came across these instructions to stop stuck suites:
https://collab.metoffice.gov.uk/twiki/bin/view/Support/TroubleshootingStuckJobsInRose

but I'm not sure that the ps command shows me the PID of any job relating to my suite.

macou@xcslc0:~/roses/u-bq244> ps -fu ${USER}
UID PID PPID C STIME TTY TIME CMD
macou 55912 103619 0 09:28 pts/41 00:00:00 ps -fu macou
macou 103618 103605 0 09:07 ? 00:00:00 sshd: macou@pts/41
macou 103619 103618 0 09:07 pts/41 00:00:00 -bash
macou 103863 1 0 09:07 ? 00:00:00 /opt/hpctools/gnupg/2.0.31/bin/gpg-agent --daemon --al

I'm not really sure what's going on with this suite, but it's definitely stuck somewhere, since the coupled run isn't moving forward and output hasn't been sent to JASMIN for a few days now.

Thanks!
Matt

Change History (3)

comment:1 Changed 4 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Matt,

If the suite has got itself very stuck, then it's possible that no processes are still running and only the contact file has been left behind. Have you tried removing the ~/cylc-run/SUITE/.service/contact file?

Cheers,
Ros.
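
For anyone hitting the same error, it may help to check what the contact file claims before removing it. In the output above, the contact file names xcslc1 as the suite host while ps was run on xcslc0, and ps only lists processes on the host it is run on. The following is a minimal sketch, not part of Rose or Cylc: contact_field and check_contact are hypothetical helper names, and the KEY=VALUE layout is assumed from the [FAIL] lines quoted in the description.

```shell
# Sketch only. contact_field and check_contact are hypothetical helpers, and
# the KEY=VALUE layout is assumed from the [FAIL] lines quoted in the ticket.

# Print the value stored under key $2 in contact file $1.
contact_field() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Report which host the contact file names and, when that is this host,
# whether the recorded PID still exists.
check_contact() {
    cc_file=$1
    if [ ! -f "$cc_file" ]; then
        echo "no contact file"
        return 0
    fi
    cc_host=$(contact_field "$cc_file" CYLC_SUITE_HOST)
    # CYLC_SUITE_PROCESS holds "PID command line"; keep only the PID.
    cc_pid=$(contact_field "$cc_file" CYLC_SUITE_PROCESS | cut -d' ' -f1)
    cc_here=$(uname -n)
    # Compare short host names so "xcslc1" matches a fully qualified name.
    if [ "${cc_host%%.*}" != "${cc_here%%.*}" ]; then
        echo "check on $cc_host: PID $cc_pid"
    elif [ -n "$cc_pid" ] && kill -0 "$cc_pid" 2>/dev/null; then
        # kill -0 sends no signal; it only tests that the process exists
        # and that we are allowed to signal it.
        echo "running: PID $cc_pid"
    else
        echo "stale: PID $cc_pid not found"
    fi
}

# Example: check_contact "$HOME/cylc-run/u-bq244/.service/contact"
```

If the sketch reports a different host, the check would need repeating there. A "stale" result is consistent with the situation described above, where only the contact file is left and removing it lets the restart proceed.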

comment:2 Changed 4 months ago by m.couldrey

Hi Ros

Ah I see, that's a new one for me. Thanks, that seems to have done it! I'll close the ticket.

Matt

comment:3 Changed 4 months ago by m.couldrey

  • Resolution set to fixed
  • Status changed from accepted to closed