Opened 6 weeks ago

Closed 3 weeks ago

#2950 closed help (fixed)

Problems with restarting a model run

Reported by: aschurer
Owned by: ros
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version: 11.0

Description

I am trying to restart a UKESM simulation u-bf095 on Monsoon.

I increased the run length.

I then went to my roses directory /home/d05/aschurer/roses/u-bf095 and entered:
rose suite-run --restart

It is failing on the coupled task with the message:
Can not find previous work directory for task coupled
[FAIL] run_model # return-code=201
2019-07-09T15:15:33Z CRITICAL - failed/EXIT

I have noticed that the run created and then made a symbolic link to an empty work directory:
/working/d05/aschurer/cylc-run/u-bf095/work

Could this be the problem?

Thanks in advance,
Andrew

Change History (9)

comment:1 Changed 6 weeks ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Andrew,

It looks like you previously ran this suite in January, before the retirement of exvmsrose/exvmscylc. With that retirement, the rose/cylc configuration was changed to place ~/cylc-run/SUITEID/work on the /working disk, where previously it was on /projects. A rose suite-run --restart therefore won't work, as it picks up this change and effectively deletes the existing work directory. You will now need to configure the suite to do a new run, i.e. set the start date and start dump appropriately.
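
As a quick way to confirm this situation before attempting a restart, the following is a minimal Python sketch that follows the work directory symlink and reports whether its target is missing, empty, or populated. The function name `describe_work_dir` is purely illustrative and is not part of rose or cylc; the path in the usage line is the suite from this ticket.

```python
import os
from pathlib import Path


def describe_work_dir(work_dir):
    """Report whether a suite work directory is missing, empty, or populated.

    Returns a (status, resolved_path) tuple, where status is one of
    "missing", "empty", or "populated".
    """
    # Expand "~" and follow any symbolic link to the real target directory.
    resolved = Path(os.path.realpath(os.path.expanduser(str(work_dir))))
    if not resolved.is_dir():
        return "missing", resolved
    # any() stops at the first entry, so large directories are cheap to test.
    has_entries = any(resolved.iterdir())
    return ("populated" if has_entries else "empty"), resolved


if __name__ == "__main__":
    # Suite ID taken from this ticket; substitute your own.
    status, target = describe_work_dir("~/cylc-run/u-bf095/work")
    print(f"work directory {target}: {status}")
```

An "empty" result for a suite that has already run would be consistent with the work directory having been relocated, in which case a restart cannot find the previous task directories.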

Regards,
Ros.

comment:2 Changed 4 weeks ago by aschurer

Hi Ros,

Thank you for your reply.

Do you know if there is any documentation about changing start dumps to start a new run from an existing one?

If not, can you advise on what is required?

Changing the start date and run length seems easy enough through rosie go.

But how many new restart files do you need to link to and where do you specify them?

I've found:
ainitial in app/um/rose-app.conf

TOP_TO_INIT in app/ocean_passive_tracers/rose-app.conf

CICE_START in app/nemo_cice/rose-app.conf
NEMO_ICEBERGS_START in app/nemo_cice/rose-app.conf
NEMO_START in app/nemo_cice/rose-app.conf

If I change the start date and run length, and change the paths in all of the above files to point to the correct restart files, is that everything?

Thanks in advance.
Andrew

comment:3 Changed 4 weeks ago by aschurer

I also have one additional question.

I want to change the date the model starts from (to Jan 1900) but keep the reference date that the model uses to write NetCDF files the same as in my existing results. How can I do this?

I've found the field "model basis time", but I am unsure whether this changes the start date, the reference time, or both.

Thanks,
Andrew

comment:4 Changed 4 weeks ago by ros

Hi Andrew,

Sorry for the delay in responding; quite a few of us are on leave this week. To start the coupled job from a new date you will need restart files for:

CICE_START
NEMO_START
NEMO_ICEBERGS_START
astart (assuming you are not re-running the reconfiguration; otherwise ainitial)
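
To see where these settings currently point before editing them, something along the following lines could be used. It is only a sketch: it scans each app/*/rose-app.conf in a suite directory for the variable names mentioned in this ticket and lists their values. The function name `find_restart_settings` is hypothetical, and treating a leading "!" as a marker for a switched-off setting is an assumption about the file format that should be checked against your own suite.

```python
from pathlib import Path

# Variable names taken from this ticket.
RESTART_KEYS = frozenset({
    "ainitial",
    "astart",
    "CICE_START",
    "NEMO_START",
    "NEMO_ICEBERGS_START",
    "TOP_TO_INIT",
})


def find_restart_settings(suite_dir, keys=RESTART_KEYS):
    """Return (app_name, key, value) for each restart setting found."""
    found = []
    for conf in sorted(Path(suite_dir).glob("app/*/rose-app.conf")):
        for line in conf.read_text().splitlines():
            name, sep, value = line.partition("=")
            # Assumption: a leading "!" marks a setting that is switched off;
            # strip it so such settings are still reported.
            name = name.strip().lstrip("!")
            if sep and name in keys:
                found.append((conf.parent.name, name, value.strip()))
    return found


if __name__ == "__main__":
    # Run from the top of the suite directory, e.g. ~/roses/u-bf095.
    for app, key, value in find_restart_settings("."):
        print(f"{app}: {key} = {value}")
```

Every path reported should point at a restart file valid for the new start date.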

I'm not sure about the reference time in the netcdf files. I will need to check up on that.

Regards,
Ros.

comment:5 Changed 3 weeks ago by ros

Hi Andrew,

The reference time in the NetCDF files is indeed taken from the model basis time, as is the start date.

Regards,
Ros.

comment:6 Changed 3 weeks ago by aschurer

Hi Ros,

Thanks for your advice.

I've changed the start files (those listed above) and the model basis time to 1900,1,1,0,0,0 and restarted this experiment.

The run fails with the error:

? Error code: 10
? Error from routine: INITTIME
? Error message:
? Mismatch between model_basis_time read from namelist and validity time read
? from dump fixed header.
?
? model_basis_time = 1900 1 1 0 0 0
? fixhd validity time = 1850 1 1 0 0 0
Rank 103 [Mon Jul 29 09:50:18 2019] [c10-0c1s10n0] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 103

I ran grep over the directory to see if anything was set to 1850, and found this:

app/um/rose-app.conf: new_date_time=1850,1,1,0,0,0

Do you know what new_date_time controls? And should this be set to 1900,1,1,0,0,0?
Or can you advise what else might be causing this mismatch?

Many thanks,
Andrew

comment:7 Changed 3 weeks ago by ros

Hi Andrew,

The help in the rose edit GUI for the l_override_date_time option, which controls this variable, says:

"With this option, you can choose to either retain the date and time provided by the input model dump, or override various parts of it.

If you select to only override the year, please enter the new year in the first entry of new_date_time below. All other entries will be ignored.

If you select to override the full date and time, please provide the full date and time you wish to use in the order Year, Month, Day, Hour, Minute, Second in new_date_time."

If your input dump date is 1900 1 1 0 0 0 then you can just turn off l_override_date_time.
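
The INITTIME error in comment 6 comes down to two six-field times (Year, Month, Day, Hour, Minute, Second) that must agree. The sketch below is a rough Python illustration of that comparison, not the UM's own code; `parse_time` and `times_match` are hypothetical helper names. It compares plain integer tuples rather than using `datetime`, on the assumption that the model calendar may not be Gregorian.

```python
import re


def parse_time(text):
    """Parse 'Y,M,D,h,m,s' or 'Y M D h m s' into a tuple of six ints."""
    fields = tuple(int(f) for f in re.split(r"[,\s]+", text.strip()) if f)
    if len(fields) != 6:
        raise ValueError(f"expected 6 time fields, got {len(fields)}: {text!r}")
    return fields


def times_match(model_basis_time, dump_validity_time):
    """Return True when both times have identical fields."""
    return parse_time(model_basis_time) == parse_time(dump_validity_time)


if __name__ == "__main__":
    # Values taken from the error message in this ticket.
    basis = "1900,1,1,0,0,0"
    validity = "1850 1 1 0 0 0"
    if times_match(basis, validity):
        print("model_basis_time agrees with the dump validity time")
    else:
        print("mismatch:", parse_time(basis), "vs", parse_time(validity))
```

With the values from the ticket this reports a mismatch, which matches the case where new_date_time was still overriding the dump date with 1850.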

Cheers,
Ros.

comment:8 Changed 3 weeks ago by aschurer

Hi Ros,
Thank you for your help with this. The simulation has now restarted and is running OK.
Cheers,
Andrew

comment:9 Changed 3 weeks ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed

Thanks for letting us know. I'll close this query now.

Regards,
Ros.
