
UM Automatic Resubmission

Introduction

The scripts used to run the UM are set up so that long experiments can be run in chunks suited to the batch queue structure of the computer being used. The maximum run length of a batch job depends on the computer on which you are running your job. The number of UM days you can run in that time depends on the performance of the computer, the resolution of the UM, the physics options selected, the amount of STASH output you are requesting, and so on.
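As a rough illustration of how a chunk length might be chosen, the sketch below estimates how many UM days fit into one batch job. It is not part of the UM scripts. The function name, the 400 seconds per model day and the 0.8 safety factor are all made-up assumptions; the 24 hours matches the Archer standard queue limit quoted in the notes at the end of this page. You would time a short test run of your own configuration to get a real figure. The result is rounded down to a whole number of restart dump periods, because each chunk has to end on a restart dump.

```python
def max_chunk_days(queue_limit_hours, seconds_per_model_day, dump_days, safety=0.8):
    """Estimate the longest chunk (in model days) that fits in one batch job.

    All arguments are hypothetical inputs you would measure or choose yourself:
    queue_limit_hours      wallclock limit of the batch queue
    seconds_per_model_day  wallclock cost of one model day, from a test run
    dump_days              restart dump frequency in days
    safety                 fraction of the limit you are willing to use
    """
    usable_seconds = queue_limit_hours * 3600 * safety
    raw_days = int(usable_seconds // seconds_per_model_day)
    # Round down so the chunk ends exactly on a restart dump.
    return (raw_days // dump_days) * dump_days


# Illustrative numbers only: 24 hour queue, 400 s per model day, 30 day dumps.
print(max_chunk_days(24, 400, 30))  # 150
```

With these example numbers the job time limit would be set for a chunk of 150 model days.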

Initial Run

Set up your UM job via the UMUI. You will need to specify:

a) The run length: Input output control and resources → Start date and run length options

Set the target run length to be the total time of your experiment.

b) The job resources for your runs:

  • Pre UM vn8.2: Input output control and resources → Job submission, resources etc

Set the job time limit to be the length of time needed to run a chunk of your experiment.
Then press NEXT at the bottom of the window to set up automatic resubmission. Select automatic resubmission and specify the target run length and job time limit for the chunks.

  • UM vn8.x: User information and submit method → Job submission method

Press Qsub at the bottom of the window then set job time limit to be the length of time needed to run the initial chunk of the experiment.
Then press Back and Resubmit and specify the target run length and job time limit for the continuation chunks.

c) The restart dump frequency: Atmosphere → control → post processing dumping and meaning → dumping and meaning

Specify a restart dump frequency in days/hours/timesteps. The frequency must be chosen so that a restart dump is written at the end of each chunk of the run, which means the chunk length must be a whole multiple of the dump frequency. For example, if you are running a 3 year experiment in 6 month chunks with 360 day years, each chunk is 180 days, so a suitable restart dump frequency would be 30 days. You may also need to consider climate meaning, which depends on the restart dump frequency (see the document about Climate Meaning).
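The arithmetic behind the example above can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical and it works in whole days, whereas the UMUI also accepts hours or timesteps.

```python
def dump_frequency_fits(chunk_days, dump_days):
    """Return True if a restart dump falls exactly on the end of every chunk."""
    if chunk_days <= 0 or dump_days <= 0:
        raise ValueError("lengths must be positive")
    return chunk_days % dump_days == 0


DAYS_PER_YEAR = 360                  # 360 day calendar, as in the example
total_days = 3 * DAYS_PER_YEAR       # 3 year experiment = 1080 days
chunk_days = DAYS_PER_YEAR // 2      # 6 month chunks = 180 days
n_chunks = total_days // chunk_days  # number of batch jobs needed

print(n_chunks)                             # 6
print(dump_frequency_fits(chunk_days, 30))  # True: 180 = 6 x 30
print(dump_frequency_fits(chunk_days, 50))  # False: 180 is not a multiple of 50
```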

d) Archiving settings: Post processing → main switch and general questions

The standard UM scripts are set up so that you can automatically delete superseded restart dumps only when you have archiving switched on (see the document about Archiving).

Note: At UM vn6.6.3 the Start date and run length options, Job submission and resources, and Post processing windows can be found under Submodel Independent.

You can now submit this initial run in the normal way. When it completes, you should find that only the first chunk has run, leaving some history files (*.*hist) and either i) all the restart dumps and the post processing files specified via STASH, if automatic archiving has not been used, or ii) only the last restart dump, if archiving is switched on and deletion of superseded dump files is selected.

Continuation Run

Always check that the initial run (NRUN) has completed successfully.

To create the continuation run:

Before UM vn8.2

  • on your local machine do cd ~/umui_jobs/jobname
  • edit the SUBMIT file
  • change the line TYPE=NRUN to TYPE=CRUN
  • if your initial run compiled and ran the model in a single job, you also need to set STEP=4 in the SUBMIT file
  • save and exit
  • resubmit the UM run as you did for the initial run (NRUN) (Note: do not re-process as this will undo the changes you made to the files in umui_jobs).
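If you make this edit often, the manual steps above could be scripted. The following is a minimal sketch, not a supported tool. It assumes the SUBMIT file holds the settings as plain TYPE=... and STEP=... assignments at the start of a line, optionally preceded by export; check your own file before relying on that. The function name and the sample text are made up for illustration, and the STEP=2 in the sample is only a placeholder value.

```python
import re


def to_continuation(submit_text, compiled_in_same_job=False):
    """Return SUBMIT file text switched from a new run to a continuation run."""
    # Change the run type from NRUN to CRUN.
    text = re.sub(r"^([ \t]*(?:export[ \t]+)?TYPE=)NRUN\b", r"\1CRUN",
                  submit_text, flags=re.MULTILINE)
    if compiled_in_same_job:
        # The initial run compiled and ran in one job, so also set STEP=4.
        text = re.sub(r"^([ \t]*(?:export[ \t]+)?STEP=)\d+", r"\g<1>4",
                      text, flags=re.MULTILINE)
    return text


# Made-up fragment standing in for the relevant lines of a SUBMIT file.
sample = "TYPE=NRUN\nSTEP=2\n"
print(to_continuation(sample, compiled_in_same_job=True))
```

To apply it for real you would read ~/umui_jobs/jobname/SUBMIT, pass its contents through the function, and write the result back, keeping a copy of the original. As with the manual edit, do not re-process the job in the UMUI afterwards.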

Alternatively you may have a hand-edit included in your job:

  • In the UMUI navigate to Input/Output control and resources → User hand edit files
  • Select Y for the following hand-edit: ~ros/HadGEM3-A/vn7.3/HGPKG1/crun.ed
  • Make sure that all compile steps are switched off in Compilation and modifications:
    • Compile options for the UM model
    • Compile options for the UM reconfiguration
    • UM scripts build
  • Then SAVE, PROCESS and SUBMIT

UM vn8.x

  • In the UMUI navigate to Compilation & run options → compile & run options for atmos & recon
  • Switch off compilation of the model & reconfiguration
  • Switch off "run the reconfiguration"
  • The "type of model run" will then become active. Check that CRUN is selected.
  • Then save, process and resubmit.

The run should continue from where the initial run finished and then each of the chunks should submit themselves automatically without further intervention.

NB

  • The maximum time limit for the standard queue on Archer is 24 hours.
  • Without automatic archiving, disk space can fill up and cause the job to crash.
Last modified on 17/04/15 11:35:12