#2548 closed help (fixed)

Lost partial sum file

Reported by: mvguarino Owned by: um_support
Component: UM Model Keywords: coupled, partial sum
Cc: Platform:
UM Version:

Description

Hello,

My suite (u-au022) stalled after the ARCHER ssh-agent stopped working.
This happened while the coupled task was running at cycle 21040601T0000Z. Before restarting the simulation, I edited the au022.xhist, ice.restart_file and namelist_cfg files and deleted the latest NEMO dumps (i.e. the 21040701 restart files) to make sure the restart point was 21040601T0000Z.
I retriggered the coupled task, but the suite failed because it can't find the partial sum file:
21040301T0000Z_au022a_s1a
That file was there before I restarted the simulation, but it was automatically deleted when the simulation was restarted (I don't know why).

How can I recover that file (and possibly all the partial sum files for cycle 21040301T0000Z)?

Thank you very much,

Vittoria

Change History (5)

comment:1 Changed 13 months ago by ros

Hi Vittoria,

When an ssh-agent dies, it does not affect anything that is already running in the queues on ARCHER. So although the agent died while the coupled job was running for cycle 21040601T0000Z, and thus the cylc GUI couldn't update, the coupled task completed successfully. All you need to do in this scenario, once you've fixed the ssh-agent and stopped the suite if it has died, is issue a "rose suite-run --restart"; cylc will then automatically update the status of the tasks that were left running on ARCHER when the agent died and carry on.
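As a rough sketch of that sequence, the dry-run helper below only prints the commands instead of running them. Only "rose suite-run --restart" comes from this ticket; checking the agent with "ssh-add -l", the ~/roses/<suite-id> location and the helper name recovery_steps are assumptions made for illustration and may not match your setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands rather than executing them.
# Assumptions (not from the ticket): the agent is checked with `ssh-add -l`
# and the suite's working copy lives in ~/roses/<suite-id>.
# recovery_steps is a hypothetical helper name.
recovery_steps() {
    suite="$1"
    echo "ssh-add -l"                 # confirm the ssh-agent is reachable again
    echo "cd ~/roses/${suite}"        # rose suite-run is run from the suite directory
    echo "rose suite-run --restart"   # cylc then updates the status of tasks left on ARCHER
}

recovery_steps u-au022
```

Nothing here edits the history, restart or namelist files; the point of the restart is that cylc picks up the existing state as it is.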

We'll take a look at the status of the suite now as things will definitely be out of sync if you've edited files. Will get back to you on how to restart.

Regards,
Ros.

comment:2 Changed 13 months ago by grenville

Vittoria

The only way will be to regenerate it (them) - there is no backup of /work.

Grenville

comment:3 Changed 13 months ago by mvguarino

Hi Ros,

I understand. The status of the coupled task was indeed 'running', and that misled me.
I kept all the NEMO restart dumps and the old version of the namelists, so I will restore all of them and try to restart the suite (with the coupled task set to succeeded).

I'll let you know if that doesn't work.

Thank you,

Vittoria

comment:4 Changed 13 months ago by mvguarino

It is running.

Thank you,

Vittoria

comment:5 Changed 13 months ago by mvguarino

  • Resolution set to fixed
  • Status changed from new to closed