Opened 5 months ago

Closed 2 months ago

#3420 closed help (answered)

query about restarting idealised UM from previous dumps

Reported by: PHill
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 11.0

Description

I’ve been trying to restart some idealised UM simulations in order to extend the run length from 100 to 125 days (suite u-bx627).

To try and do this, I’ve changed the astart file in the nlcfiles namelist to the relevant existing dump file, changed the model basis time to the corresponding time and turned off both build and reconfiguration. To test whether this reproduces my previous results, I set this to 3 days before the end of the previous simulation.

I was expecting/hoping the output for the first 3 days of this new run to match that from the last 3 days of the previous run, but they steadily diverge. The simulation is run with cycling every 24 hours, so if I understand how that works correctly, the previous simulation should have gone through a cycle where it was initialised by the same dump.

Is there another setting I need to change to do this properly?

Thanks!

Change History (8)

comment:1 Changed 5 months ago by grenville

Peter

You can simply extend the end time for a Rose suite and then restart the suite (see 6.7 Restarting a suite in the practical exercises here http://cms.ncas.ac.uk/wiki/UmTraining/November2019) - there should be no need to change anything else in the suite.

It sounds like you have effectively started a new integration - doing that means the model initialization follows a different code path than it would for an extra cycle, so results won't bit compare.
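For readers unfamiliar with the term, "bit compare" means the two runs agree in every bit of every value, not just to within a tolerance. The sketch below illustrates the idea on plain Python floats. It is only an illustration: the function names and the sample values are hypothetical, and it does not read UM dump or output files.

```python
import struct


def float_bits(x: float) -> bytes:
    """Return the 8-byte IEEE 754 double representation of a Python float."""
    return struct.pack(">d", x)


def bit_compare(a, b):
    """Return the indices at which two equal-length float sequences differ bitwise."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [
        i for i, (x, y) in enumerate(zip(a, b))
        if float_bits(x) != float_bits(y)
    ]


# Hypothetical values standing in for the same field from the two runs.
original_run = [271.15, 280.0, 0.1 + 0.2]
restarted_run = [271.15, 280.0, 0.3]

# The last pair agrees to about 16 significant figures but is not bit-identical.
print(bit_compare(original_run, restarted_run))
```

An empty list means the two sequences bit compare. Any non-empty result, however small the numerical difference, means they do not, which fits the steadily diverging output described above.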

I'm not sure how you can recover from this if the UM history has been lost. Please allow us read permission on your ARCHER files:

chmod -R g+rX /home/n02/n02/<your-username>
chmod -R g+rX /work/n02/n02/<your-username>

Grenville

comment:2 Changed 5 months ago by PHill

Hi Grenville,

I did try extending the end time and restarting the suite, using "rose suite-run --restart" but it didn't work - the Gcylc gui reopened with status "stopped with succeeded" (though this has worked for this suite in the past when the suite has stopped before its original end time).

Does starting a new integration lead to different results to continuing to cycle even when initialising from the same file then? If so, is there a way to rerun specific time periods without changing results?

You should now have permissions to access my ARCHER files. I've been experimenting with some settings to try and get this to work - the relevant log directories are log.20201031T085350Z and log.20201111T090521Z

comment:3 Changed 5 months ago by grenville

Peter

I can't find log.20201031T085350Z or log.20201111T090521Z.

u-bx627 is set to run for 1 year?

I can't follow what is happening.

Grenville

comment:4 Changed 5 months ago by PHill

Hi Grenville,

Apologies, these directories are on puma in /home/yt910424/cylc-run/u-bx627. I didn't realise they would have different names on ARCHER. On ARCHER I deleted the log files for the first complete run (i.e. the equivalent to log.20201031T085350Z) because on one of my subsequent runs I got a "tar: log.20171214T143855Z.tar.gz: Cannot write: Broken pipe" failure. I guess I should have simply moved them instead. The ARCHER version of log.20201111T090521Z is log.20201111T090529Z.tar.gz

The current working copy of the suite is set up to re-run one day (not one year), because since failing to reproduce the results I re-ran the first cycle to check that gave me results that matched the original run.

comment:5 Changed 5 months ago by grenville

Hi Peter

Sorry for the slow reply - which was the last cycle of the 100-day run?

Grenville

comment:6 Changed 5 months ago by PHill

Hi Grenville,

The last cycle of the 100 day run is 20000410T0000Z.

Having said that, I tried a 1 day run then extending it to 2 days using rose suite-run --restart and it worked ok (or rather it failed for a reason I understand and can fix). So I expect I just did something silly when trying to restart the run after the 100 days completed. I do run into the problem of running out of disc space on puma fairly frequently, so perhaps I deleted something I shouldn't have… Perhaps it's best to close this ticket and I can open a new ticket if I do get this issue again?

I am still interested (from an academic perspective) in understanding why a new integration differs from a continuation cycle starting from the same dump, and whether there is a way to restart from previous output that will reproduce the continuation cycle results. For instance, what if I wanted to add new diagnostics, but only wanted them from the last part of a simulation when things are in equilibrium? Is there any documentation that explains this?

Thanks,

Peter

comment:7 Changed 5 months ago by grenville

Hi Peter

I daresay those who wrote the UM initialization routines had their reasons for treating time step zero as a special case - I'm not sure who to ask for those reasons.
You can modify a suite to add new diagnostics and then extend the run with rose suite-run --restart. You could delve into the depths of the UM to remove the time step zero calls (not recommended) - I am not aware of documentation that addresses your case.

Grenville

comment:8 Changed 2 months ago by ros

  • Resolution set to answered
  • Status changed from new to closed