My suite (u-au022) stalled after the ARCHER ssh-agent stopped working.
This happened while the coupled task was running at the cycle 21040601T0000Z. Before restarting the simulation, I edited the au022.xhist, ice.restart_file, namelist_cfg files and deleted the latest NEMO dumps (i.e. the 21040701 restart files) to make sure the restart point was 21040601T0000Z.
I retriggered the coupled task, but the suite failed because it can’t find the partial sum file:
That file was there before I restarted the simulation, but it was automatically deleted when the simulation was restarted (I don’t know why).

How can I recover that file (and possibly all the partial sum files for the cycle 21040301T0000Z)?

Thank you very much,


Change History (5)

comment:1 Changed 9 months ago by ros

Hi Vittoria,

When an ssh-agent dies, it does not affect anything that is already running in the queues on ARCHER. So although the agent died while the coupled job was running for cycle 21040601T0000Z, and thus the cylc GUI couldn't update, the coupled task completed successfully. All you need to do in this scenario, once you've fixed the ssh-agent and stopped the suite if it has died, is issue a "rose suite-run --restart". Cylc will then automatically update the status of the tasks that were left running on ARCHER when the agent died and carry on.
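
That sequence might look roughly like the sketch below. The key path and the suite directory are placeholders (assumptions, not taken from this suite's setup), and the script only prints the commands unless DRY_RUN=0 is set, so it is safe to read through before running anything for real.

```shell
#!/bin/bash
# Sketch of the recovery steps: fix the agent, stop the suite, restart it.
# SUITE is the suite from this ticket; KEY and SUITE_DIR are hypothetical
# placeholders and must be adjusted to the local setup.
SUITE="${SUITE:-u-au022}"
KEY="${KEY:-$HOME/.ssh/id_rsa_archer}"        # hypothetical key file
SUITE_DIR="${SUITE_DIR:-$HOME/roses/$SUITE}"  # assumed suite working copy

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_suite() {
    # 1. Start a fresh ssh-agent and load the key used for ARCHER.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo '+ eval "$(ssh-agent -s)"'
    else
        eval "$(ssh-agent -s)"
    fi
    run ssh-add "$KEY"

    # 2. Stop the suite if it is still up (this may report an error if it
    #    has already stopped; the script carries on regardless).
    run rose suite-shutdown --name="$SUITE"

    # 3. Restart from the suite directory; cylc then picks up the status
    #    of tasks that kept running while the agent was down.
    run cd "$SUITE_DIR"
    run rose suite-run --restart
}

recover_suite
```

The important point is what the script leaves out: no restart files or namelists are edited before the restart.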

We'll take a look at the status of the suite now as things will definitely be out of sync if you've edited files. Will get back to you on how to restart.


comment:2 Changed 9 months ago by grenville


The only way is to regenerate it (them); there is no backup of /work.


comment:3 Changed 9 months ago by mvguarino

Hi Ros,

I understand. The status of the coupled task was indeed 'running', and that misled me.
I kept all the NEMO restart dumps and the old versions of the namelists, so I will restore all of them and try to restart the suite (with the coupled task set to succeeded).

I'll let you know if that doesn't work.

Thank you,


comment:4 Changed 9 months ago by mvguarino

It is running.

Thank you,


comment:5 Changed 9 months ago by mvguarino

