
Version 2 (modified by ros, 7 years ago)

UM Automatic Resubmission

Introduction

The scripts used to run the UM are set up so that long experiments can be run in chunks that suit the batch queue structure of the computer being used. The maximum run length of a batch job depends on the computer on which you are running your job. The number of UM days you can run within this limit depends on the performance of the computer, the resolution of the UM, the physics options selected, the amount of STASH diagnostic output you request, and so on.
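As a rough illustration of how the chunk size follows from these factors, the sketch below estimates how many whole model days fit into one batch job and how many chunks the full experiment needs. The throughput (model days per wall-clock hour) and the safety fraction are hypothetical inputs that you would estimate from a short test run. The function names are illustrative and are not part of the UM scripts.

```python
import math


def days_per_chunk(model_days_per_hour: float, job_time_limit_hours: float,
                   safety_fraction: float = 0.75) -> int:
    """Whole model days that fit into one batch job.

    model_days_per_hour: throughput measured from a short test run (assumed).
    safety_fraction: share of the time limit actually spent on model steps,
    leaving headroom for start-up, I/O and writing dumps (assumed value).
    """
    usable_hours = job_time_limit_hours * safety_fraction
    return math.floor(model_days_per_hour * usable_hours)


def number_of_chunks(total_days: int, chunk_days: int) -> int:
    """Chunks needed to cover the experiment; the last one may be shorter."""
    if chunk_days <= 0:
        raise ValueError("chunk length must be a positive number of days")
    return math.ceil(total_days / chunk_days)


if __name__ == "__main__":
    # Hypothetical figures: 20 model days per hour, 12-hour job time limit.
    chunk = days_per_chunk(20.0, 12.0)
    print(chunk)                          # 180
    print(number_of_chunks(1080, chunk))  # 6 (3 years of 360 days)
```

In practice you would round the chunk length down further so that it is a whole number of restart dump periods.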

Initial Run

Set up your UM job via the UMUI. You will need to specify:

a) the run length: submodel independent → start date and run length options

Set the target run length to be the total time of your experiment.

b) the job resources for your runs: submodel independent → job resources

Set the job time limit to the length of time needed to run one chunk of your experiment. Then press NEXT at the bottom of the window to set up automatic resubmission. Select automatic resubmission and specify the target run length and the job time limit for each chunk.

c) the restart dump frequency: atmosphere → control → post processing dumping and meaning → dumping and meaning

Specify a restart dump frequency in days, hours or timesteps. The frequency must be chosen so that a restart dump is written at the end of each chunk of the run. For example, if you are running a 3-year experiment in 6-month chunks with 360-day years, each chunk is 180 days long, so a restart dump frequency of 30 days would be suitable. You may also need to consider climate meaning, which depends on the restart dump frequency (see the UM document about climate meaning). The standard UM scripts can automatically delete superseded restart dumps, but only when archiving is switched on via submodel independent → post processing → main switch and general questions (see the document describing automatic archiving for the UM).
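The requirement that a dump falls at the end of every chunk is a divisibility check. The sketch below works in whole days only (the UMUI also accepts hours or timesteps) and assumes the 360-day calendar of the example, i.e. twelve 30-day months. All names are hypothetical helpers, not UMUI settings.

```python
DAYS_PER_MONTH_360 = 30  # 360-day calendar: twelve months of 30 days


def months_to_days_360(months: int) -> int:
    """Length of a period in days under a 360-day calendar."""
    return months * DAYS_PER_MONTH_360


def dump_at_chunk_end(chunk_days: int, dump_freq_days: int) -> bool:
    """True if a restart dump falls exactly at the end of every chunk."""
    return chunk_days % dump_freq_days == 0


def suitable_dump_frequencies(chunk_days: int) -> list[int]:
    """Every whole-day dump frequency that gives a dump at each chunk end."""
    return [d for d in range(1, chunk_days + 1)
            if dump_at_chunk_end(chunk_days, d)]


if __name__ == "__main__":
    chunk = months_to_days_360(6)                 # 6-month chunks -> 180 days
    print(dump_at_chunk_end(chunk, 30))           # True: 180 = 6 x 30
    print(dump_at_chunk_end(chunk, 40))           # False: 180 / 40 = 4.5
    print(suitable_dump_frequencies(chunk)[-4:])  # [45, 60, 90, 180]
```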

You can now submit this initial run in the normal way. When it completes, you should find that only the first chunk has run. It leaves some history files (*.*hist) together with either all the restart dumps and the post-processing files specified via STASH, if automatic archiving has not been used, or only the last restart dump, if archiving is switched on and deletion of superseded dump files is selected.

Continuation Run

When you process a job, the UMUI produces a set of UNIX scripts on the local computer where you run the UMUI. These scripts should be in the directory $HOME/umui_jobs/<jobid>. To create a continuation run:

  • edit the UM script called SUBMIT in this directory
  • change the line TYPE=NRUN to TYPE=CRUN
  • if your initial run compiled and ran the model in a single job, you also need to set STEP=4 in the SUBMIT script
  • save and exit
  • resubmit the UM run as you did for the initial run.
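If you create continuation runs often, the SUBMIT edit can be scripted. The sketch below is a hypothetical helper (the script name make_crun.py is also hypothetical) and is not part of the UM scripts. It assumes that SUBMIT contains the settings as lines beginning TYPE=NRUN and STEP=<n>, and it saves the original as SUBMIT.bak before rewriting the file.

```python
import re
import sys
from pathlib import Path


def to_continuation(submit_text: str, compiled_and_ran: bool = False) -> str:
    """Switch SUBMIT text from an initial run (NRUN) to a continuation (CRUN).

    Assumes the settings appear as 'TYPE=NRUN' and 'STEP=<digits>' at the
    start of a line; all other lines are left unchanged.
    """
    text, count = re.subn(r"^TYPE=NRUN\b", "TYPE=CRUN", submit_text,
                          flags=re.MULTILINE)
    if count == 0:
        raise ValueError("no 'TYPE=NRUN' line found; is this already a CRUN?")
    if compiled_and_ran:
        # A job that compiled and ran in one go also needs STEP=4.
        text = re.sub(r"^STEP=\d+", "STEP=4", text, flags=re.MULTILINE)
    return text


def update_submit(path: Path, compiled_and_ran: bool = False) -> None:
    """Rewrite SUBMIT in place, first saving the original as SUBMIT.bak."""
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(to_continuation(original, compiled_and_ran))


if __name__ == "__main__":
    # Hypothetical usage:
    #   python make_crun.py $HOME/umui_jobs/<jobid>/SUBMIT [--compiled-and-ran]
    if len(sys.argv) < 2:
        print("usage: make_crun.py PATH_TO_SUBMIT [--compiled-and-ran]")
    else:
        update_submit(Path(sys.argv[1]), "--compiled-and-ran" in sys.argv[2:])
```

Anchoring both patterns to the start of a line keeps the helper from touching comments or other variables that merely contain the text NRUN or STEP.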

The run should continue from where the initial run finished and then each of the chunks should submit themselves automatically without further intervention.

NB

  • The maximum time limits for the standard queues on HECToR are 1, 3, 6 and 12 hours.
  • Without automatic archiving, disk space can fill up and cause the job to crash.
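When choosing the job time limit for each chunk, it can help to pick the shortest standard queue that will hold it. The sketch below uses the HECToR queue limits listed above. The function name is illustrative, and the queue list would need replacing for any other machine.

```python
import bisect

# Maximum time limits (hours) of the standard HECToR queues listed above.
HECTOR_QUEUE_LIMITS_HOURS = (1, 3, 6, 12)


def shortest_queue(job_time_limit_hours: float,
                   limits: tuple[float, ...] = HECTOR_QUEUE_LIMITS_HOURS) -> float:
    """Return the smallest queue limit that can hold the requested job time.

    `limits` must be sorted in ascending order.
    """
    i = bisect.bisect_left(limits, job_time_limit_hours)
    if i == len(limits):
        raise ValueError(
            f"{job_time_limit_hours} h exceeds the longest queue "
            f"({limits[-1]} h); use shorter chunks")
    return limits[i]


if __name__ == "__main__":
    print(shortest_queue(5))    # 6
    print(shortest_queue(12))   # 12
```

If a chunk would need more than 12 hours, the helper raises an error. That is the cue to shorten the chunk and to recheck that the restart dump frequency still gives a dump at the end of each chunk.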