Lustre pruning in the SWAMMA models

The SWAMMA models can generate very large amounts of data while they are running. For example, over a 153-day run the 12km model generates 4.6 TB and the 4km model generates 52 TB. Work space on the Lustre (/work) file system is limited, so there is a danger that a model will crash when it runs out of space. I had only 24 TB available, so running two 4km models in parallel carried a serious risk of filling the disc.

Model   Data volume (TB)   CRUN length (days)   Rate (model days/day)
12km    4.6                30                   28
4km     52                 10                   9.2

This problem was averted by archiving to the ARCHER Research Data Facility (RDF) while the model was running, with a separate process transferring data from the RDF to the JASMIN archive. Once data had been archived, it was deleted from the Lustre disc. However, it was essential to leave a number of complete CRUNs on the disc so that the model could continue to run and, in the event of a disaster, be restarted from the end of a previous CRUN without having to transfer recovery data back.

Three scripts were involved:

  • manual_prune
  • prune_workdir
  • the 4km bottom script, lbc_update_4km_v2.scr

The first two were manual processes that operated asynchronously from the model. The last was executed at the end of each CRUN in the 4km model only, and was essentially the first two hooked together.

The manual prune script took the RUNID and examined the data directory $DATADIR/um/$RUNID to determine the first and last netCDF files currently present, using the time stamp on each file. This yielded file names like $DATADIR/um/xkztb/ . These were decoded to extract the date portion. The manual prune script then submitted the prune_workdir job, which did the pruning, to the serial queue.

The pruning script first backs up the Lustre files to the RDF. If the backup fails, for example because the job runs out of time, no pruning is done.
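The "back up first, prune only on success" rule can be sketched as below. This is a minimal illustration, not the original script: backup_to_rdf and prune_lustre are hypothetical stand-ins for the real commands, and BACKUP_STATUS is an invented variable used only to simulate a failed backup.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins: the real backup and prune commands are not shown in the text.
backup_to_rdf() { return "${BACKUP_STATUS:-0}"; }   # pretend backup with a configurable exit status
prune_lustre()  { echo "pruning"; }                 # pretend prune

backup_then_prune() {
    # Prune only if the backup reported success.
    if backup_to_rdf; then
        prune_lustre
    else
        echo "backup failed; nothing pruned" >&2
        return 1
    fi
}

BACKUP_STATUS=0 backup_then_prune                          # backup succeeded, pruning goes ahead
BACKUP_STATUS=1 backup_then_prune || echo "data left in place"
```

Relying on the exit status of the backup step means a timed-out or interrupted transfer leaves the Lustre data untouched.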

The pruning script calculates the number of days of data available and compares it with the number of days to keep. For the 4km runs, two CRUNs (10 days each, so 20 days of data) were kept.
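The comparison might look something like the sketch below. It assumes GNU date and dates held as YYYYMMDD strings; the helper days_between and the example dates are illustrative and are not taken from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole days between two YYYYMMDD dates (GNU date assumed).
days_between() {
    local first=$1 last=$2 t0 t1
    t0=$(date -u -d "$first" +%s)     # seconds since the epoch, UTC to avoid DST jumps
    t1=$(date -u -d "$last" +%s)
    echo $(( (t1 - t0) / 86400 ))
}

first=19810601      # illustrative values, not from a real run
last=19810701
daystokeep=20       # two 10-day CRUNs, as for the 4km runs

available=$(days_between "$first" "$last")
if [ "$available" -gt "$daystokeep" ]; then
    echo "prune: $available days on disc, keeping $daystokeep"
else
    echo "nothing to prune: only $available days on disc"
fi
```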

If there is enough data to prune, the end date of the prune is calculated using the date arithmetic of the (GNU) date command:

end_prune=$(date -d "$last -$daystokeep days" "+%Y%m%d")

The data is then deleted a day at a time, from the first day up to the end_prune day.
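A day-by-day deletion loop could be sketched as follows. The assumptions are GNU date, an inclusive end_prune day, and file names that contain the day as YYYYMMDD; the real file-name pattern is not given here, so the demo_*.nc names and the prune_range function are purely illustrative. The demonstration runs in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch only: assumes each file name contains its day as YYYYMMDD.
prune_range() {
    local dir=$1 day=$2 end_prune=$3
    while [ "$day" -le "$end_prune" ]; do
        rm -f -- "$dir"/*"$day"*                      # remove everything stamped with this day
        day=$(date -u -d "$day +1 day" +%Y%m%d)       # step to the next calendar day
    done
}

# Demonstration with made-up files in a scratch directory.
demo=$(mktemp -d)
for d in 19810601 19810602 19810603 19810604; do
    touch "$demo/demo_$d.nc"
done

last=19810604
daystokeep=2
end_prune=$(date -u -d "$last -$daystokeep days" "+%Y%m%d")

prune_range "$demo" 19810601 "$end_prune"
ls "$demo"
```

With these values end_prune is 19810602, so the two oldest days are removed and the last two days are kept.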

Discussion on Scaling

Two processes are competing: the model creates data at one rate and the archiving process removes it at another. The production rate can be estimated as follows.

For the 12km model, one CRUN completes in 30/28 days = 26 hours and produces (30/153) x 4.6 = 0.9 TB, so the production rate is 26/0.9 = 29 hours/TB.

For the 4km model, one CRUN completes in 10/9.2 days = 26 hours and produces (10/153) x 52 = 3.4 TB, so the production rate is 26/3.4 = 7.6 hours/TB.

Archiving from Lustre to the RDF is much faster, at 3 hours/TB, so it can keep up with production: one 4km CRUN (3.4 TB) takes about 10 hours to archive but 26 hours to produce. With the 4km model it was just feasible to keep two CRUNs on disc.
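The production-rate arithmetic can be checked with a short script. This is a sketch using the figures from the table; the function name hours_per_tb and its argument order are made up for illustration, and awk is used because the shell has no floating-point arithmetic.

```shell
#!/usr/bin/env bash
# hours_per_tb CRUN_DAYS RATE_MODEL_DAYS_PER_DAY TOTAL_TB RUN_DAYS
hours_per_tb() {
    awk -v crun="$1" -v rate="$2" -v total="$3" -v run="$4" 'BEGIN {
        hours = crun / rate * 24       # wall-clock hours to complete one CRUN
        tb    = crun / run * total     # TB written during one CRUN
        printf "%.1f\n", hours / tb    # hours of model running per TB produced
    }'
}

hours_per_tb 30 28  4.6 153    # 12km model
hours_per_tb 10 9.2 52  153    # 4km model
```

Without rounding the intermediate values this gives 28.5 and 7.7 hours/TB, in line with the 29 and 7.6 quoted above, and both are well above the 3 hours/TB needed for archiving.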

