The UM currently writes data in a format that is neither the most convenient nor the most desirable for many end users. UM users spend considerable effort converting model output from the so-called fields-file format to CF-NetCDF, sometimes via an intermediate PP format. This conversion is time consuming: for models generating large volumes of data, the conversion time can be comparable to the model run time. In addition, users can be left with three copies of the same data (fields files, PP files and NetCDF files), possibly spread across several computer platforms. There is a strong case for modifying the UM source to write CF-NetCDF directly. Direct output eliminates the time spent converting between formats, greatly simplifies data management, and makes it possible to create and include metadata that the conversion route cannot provide, because that metadata is available in the model but is not stored in the fields file.
NCAS-CMS has considerable experience with software that performs the format conversion outside the model run. Jeff Cole wrote xconv and convsh, which are widely used to convert fields files or PP files to NetCDF with support for the CF conventions. David Hassell is developing CF-Python, a Python implementation of the CF data model that can perform the PP to CF-NetCDF conversion. We have now taken the next step and begun modifying the UM to write CF-NetCDF directly.
The project aims are:
- Directly write UM STASH output in CF-NetCDF format
- Implement the entire UM STASH panel
- Convert PP-file metadata to CF metadata
- Support standard and mean PP files
- Support NetCDF versions 3 and 4, including data compression
- Support multiple versions of the UM
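As an illustration of the "convert PP-file metadata to CF metadata" aim, here is a minimal Python sketch, assuming a PP field header is available as a dictionary of integer header words. It decodes the STASH code from LBUSER4 (section*1000 + item) and LBUSER7 (submodel number) and looks it up in a small table. The three table entries, the function names and the `um_stash_source` attribute name are assumptions for illustration only; this is not the project's implementation, which lives in the UM Fortran source.

```python
from typing import Dict

# Hypothetical, illustrative subset of a STASH-to-CF table; a real table
# covers far more fields.
STASH_TO_CF: Dict[str, Dict[str, str]] = {
    "m01s00i024": {"standard_name": "surface_temperature", "units": "K"},
    "m01s03i236": {"standard_name": "air_temperature", "units": "K"},
    "m01s16i222": {"standard_name": "air_pressure_at_sea_level", "units": "Pa"},
}


def stash_id(lbuser4: int, lbuser7: int) -> str:
    """Build an 'mMMsSSiIII' STASH identifier from PP header words.

    LBUSER4 holds section*1000 + item; LBUSER7 holds the submodel number.
    """
    section, item = divmod(lbuser4, 1000)
    return f"m{lbuser7:02d}s{section:02d}i{item:03d}"


def cf_attributes(header: Dict[str, int]) -> Dict[str, str]:
    """Return CF variable attributes for one PP field header (a sketch)."""
    sid = stash_id(header["LBUSER4"], header["LBUSER7"])
    # Keep the STASH code so the original UM field can always be identified.
    attrs: Dict[str, str] = {"um_stash_source": sid}
    entry = STASH_TO_CF.get(sid)
    if entry is None:
        # No CF standard name known: fall back to a descriptive long_name.
        attrs["long_name"] = f"UM STASH field {sid}"
    else:
        attrs.update(entry)
    return attrs
```

For example, `cf_attributes({"LBUSER4": 3236, "LBUSER7": 1})` yields the `standard_name` `air_temperature` with units `K`. Fields missing from the table fall back to a `long_name`, so no output is lost while the mapping is extended.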
Considerable progress has been made and work is ongoing. Remaining tasks include:
- Finish implementing the CF attributes (cell methods, bounds)
- Further consider unlimited dimensions
- Complete STASH panel options (meaning, grid subsection, timeseries)
- Add support for mean PP files
- Add code so that PP output works correctly when files are re-initialised and for CRUNs (continuation runs)
- Add support for writing NetCDF data in the I/O server code
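For the outstanding cell methods and bounds work, here is a hedged Python sketch of one possible translation. LBPROC flag values are mapped to a CF `cell_methods` string, and a meaning period is turned into a time point with bounds. The flag values (128 time mean, 4096 minimum, 8192 maximum) follow the PP-format convention as we understand it. The function names, the decision to handle only single flags, and placing the point at the middle of the period are assumptions, not the scheme's actual design.

```python
from typing import Dict, Optional, Tuple

# Assumed LBPROC values for time-processed fields (single flags only).
LBPROC_TIME_METHODS: Dict[int, str] = {128: "mean", 4096: "minimum", 8192: "maximum"}


def cell_methods(lbproc: int) -> Optional[str]:
    """Return a CF cell_methods string for a single LBPROC time flag, else None.

    LBPROC == 0 denotes an unprocessed field; combined flags are not handled here.
    """
    method = LBPROC_TIME_METHODS.get(lbproc)
    return f"time: {method}" if method is not None else None


def time_coordinate(start: float, end: float) -> Tuple[float, Tuple[float, float]]:
    """Return (point, (lower, upper)) for a meaning period given in hours.

    The point is placed at the middle of the period; in CF the coordinate
    is linked to its bounds through the variable's 'bounds' attribute.
    """
    if end < start:
        raise ValueError("meaning period ends before it starts")
    return (start + end) / 2.0, (start, end)
```

A time mean over the first day, for instance, would give `cell_methods = "time: mean"` and bounds of 0 to 24 hours around a point of 12 hours. Combined flags (a mean of daily maxima, say) return `None` here and would need an ordered, multi-part `cell_methods` string.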
We recently described progress on this project at the UM User Workshop; please see the attachment for details.
Spring 2014: model ported to ARCHER
May 1 2014
- Implementation of the CF attributes (cell methods, bounds) - done
- STASH panel options (meaning, grid subsection) - done
- UMUI modified to select NetCDF options - done
- Support for NetCDF4 compression - done
The code is in the PUMA repository at https://puma.nerc.ac.uk/trac/UM/browser/UM/branches/dev/jeff/VN8.2_netcdf_stash.
Testing the scheme on the UKV model has revealed some issues, which are currently under investigation. Work to complete the remaining items on the to-do list is ongoing.
Testing with several UM configurations is underway; for example, xjwea is a global model running on ARCHER for test purposes.
Jan 20 2015
Progress continues with the implementation in UM 8.2. Support for file reinitialisation and CRUNs has been added. Further work has been done to ensure that the reinitialisation scheme interacts correctly with the IO servers. Code changes have been incorporated so that data written through macros (the makebc macro, for example) is not output as CF-NetCDF. The scheme has been integrated into the SWAMMA runs, and testing (successful to date) is ongoing.
We have been consulting the MO UKESM team with a view to porting the code to UM 10.x, where it will become part of the trunk and serve as the basis for the CMIP6 output. Jeff Cole will carry out the code upgrade in close collaboration with Jeremy Walton (MO UKESM technical lead).
The SWAMMA project has been running with UM netcdf fully integrated with the IO servers and has generated more than 100 TB of 12 km and 4 km dust/convection data since the project started.
Porting UM netcdf to UM 10.1 is ongoing. CMS will create a branch at UM 10.1; incorporating it into the UM trunk will be a task for the MO to lead.
TODO: estimate the time saved by not converting data between formats (including time for data transfers).