Opened 10 days ago

Last modified 10 days ago

#2467 new help

Job submission failure

Reported by: apm Owned by: um_support
Priority: normal Component: Rose/Cylc
Keywords: Cc:
Platform: ARCHER UM Version:

Description

I am trying to do a partial re-run of one of my NEMO/CICE suites (u-ao922) on ARCHER from PUMA through Rose/Cylc, but something seems to be wrong with job submission. I submitted a job on Friday, but this morning it was still marked as "submitted" in the Rose GUI, and running "qstat" on ARCHER found no job under my user ID. I stopped the job this morning and resubmitted it: again, it showed as "submitted" in the GUI, but no job was in the queue.

Has something changed in the queue/accounts settings?

Thanks,

Alex

Change History (1)

comment:1 Changed 10 days ago by ros

  • Component changed from NEMO/CICE to Rose/Cylc

Hi Alex,

When the nemo_cice.20100101T0000Z task was submitted on Friday, it was submitted via login2.archer.ac.uk.

Whilst it was running, login2.archer.ac.uk developed problems. If you look in the log/suite/log file, you will see that the suite is failing to log in to that particular login node to check on the task's status. Cylc will keep checking the task's status through login2.archer.ac.uk, even after you stop and restart the suite. Because it cannot update the status, it leaves the task in its last known state, which is "submitted".
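The behaviour described above can be illustrated with a small model. This is a minimal sketch, not Cylc's actual code: `TaskProxy`, `poll`, `restart` and `broken_login2` are hypothetical names, and the assumption is simply that the host is recorded at submission time and a failed status check leaves the last known state untouched.

```python
from dataclasses import dataclass


@dataclass
class TaskProxy:
    """Hypothetical, simplified stand-in for the suite's record of one task."""
    task_id: str
    host: str                 # login node used at submission; never changes afterwards
    state: str = "submitted"


def poll(task, query_host):
    """Ask the recorded host for the job's state.

    `query_host(host, task_id)` is a hypothetical callable that returns a new
    state string, or raises ConnectionError if the host cannot be reached.
    """
    try:
        task.state = query_host(task.host, task.task_id)
    except ConnectionError:
        pass  # host unreachable: keep the last known state
    return task.state


def restart(task):
    """A restart carries over the recorded host, so polling targets the same node."""
    return TaskProxy(task.task_id, task.host, task.state)


def broken_login2(host, task_id):
    """Hypothetical host query in which login2 is down and other nodes work."""
    if host == "login2.archer.ac.uk":
        raise ConnectionError(f"cannot log in to {host}")
    return "succeeded"


t = TaskProxy("nemo_cice.20100101T0000Z", "login2.archer.ac.uk")
print(poll(t, broken_login2))           # submitted
print(poll(restart(t), broken_login2))  # submitted: same host after restart
```

Because the host is part of the task's stored record, stopping and restarting does not route the status check to a healthy login node; only a manual state change breaks the loop.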

To get the suite going again, you will need to check manually that the nemo_cice.20100101T0000Z task succeeded by looking at its log files on ARCHER. Assuming it has, manually change the task's status to succeeded in the Cylc GUI so that the suite can continue.
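The manual check could be scripted along these lines. This is a hedged sketch under stated assumptions: it assumes the job's `job.status` file lives under `<suite run dir>/log/job/<cycle>/<task>/NN/job.status` on ARCHER and records a successful exit as `CYLC_JOB_EXIT=SUCCEEDED`. Both the layout and the field name should be confirmed against your own Cylc version. `parse_job_status`, `job_succeeded` and `check_task` are hypothetical helpers. It only tells you whether to reset the task; the reset itself is still done in the Cylc GUI as described above.

```python
from pathlib import Path


def parse_job_status(text):
    """Parse KEY=VALUE lines into a dict, ignoring lines without '='."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def job_succeeded(text):
    """True only if the status text records a successful exit.

    Assumption: success is written as CYLC_JOB_EXIT=SUCCEEDED.
    """
    return parse_job_status(text).get("CYLC_JOB_EXIT") == "SUCCEEDED"


def check_task(suite_dir, cycle, task_name):
    """Read the (assumed) job.status file for the latest submission of a task."""
    status_file = Path(suite_dir) / "log" / "job" / cycle / task_name / "NN" / "job.status"
    return job_succeeded(status_file.read_text())


# Hypothetical status-file contents, for illustration only.
example = "CYLC_JOB_EXIT=SUCCEEDED\n"
print(job_succeeded(example))  # True
```

On ARCHER this might be called as `check_task(Path.home() / "cylc-run" / "u-ao922", "20100101T0000Z", "nemo_cice")`, again assuming that run-directory layout. It is still worth reading the model's own output logs before marking the task as succeeded.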

Cheers,
Ros.

Last edited 10 days ago by ros