Version 11 (modified by grenville, 13 months ago)

CMIP6 data CMOR'isation - How to run the CDDS work flow on JASMIN

1. JASMIN GWS access

You will need access to the cmip6_prep Group Workspace. Apply here:

2. MOSRS access

You will need MOSRS access from JASMIN, see

3. Where to run

The CDDS work flow comprises three phases:

  1. preparation
  2. data reformatting
  3. quality check

Phase 1 (preparation) runs interactively, is light on computational resources, and can be run on jasmin-cylc. Phase 2 (data reformatting) is computationally intensive; it runs on LOTUS and is monitored from jasmin-cylc. Phase 3 (quality check) is potentially computationally intensive and should not be run on jasmin-cylc; one of the JASMIN sci machines is appropriate.
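The restriction on where the quality check runs can be guarded in a wrapper script. The following is a minimal sketch, not part of CDDS: the function name check_qc_host is hypothetical, and the assumption that the cylc host's name begins with jasmin-cylc should be checked against the output of hostname on your system.

```shell
# Hypothetical helper, not part of CDDS: refuse to start the quality check
# on the cylc host. The "jasmin-cylc*" name pattern is an assumption.
check_qc_host() {
    # Use the first argument if given, otherwise the current host name.
    host="${1:-$(hostname)}"
    case "$host" in
        jasmin-cylc*)
            echo "quality check should not run on $host; use a sci machine" >&2
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}
```

Calling check_qc_host with no argument before launching the quality check returns non-zero on the cylc host, so a script can stop early.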

Note: CDDS sources its own bespoke environment. To avoid potential conflicts, keep your own environment as simple as possible - you may need to alter it to ensure this.
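One way to obtain a simple environment is to start a shell that inherits almost nothing from your login session. This is a sketch under assumptions: run_clean is a hypothetical helper, and the choice of which variables to pass through (HOME, USER, a basic PATH) is a guess rather than a documented CDDS requirement.

```shell
# Hypothetical helper: run a command with a near-empty environment.
# Only HOME, USER and a basic PATH are passed through (an assumption about
# what is needed; adjust for your site).
run_clean() {
    env -i HOME="${HOME:-}" USER="${USER:-}" PATH="/usr/bin:/bin" "$@"
}

# Example: start an interactive bash that also skips your startup files.
# run_clean bash --noprofile --norc
```

Variables exported in your login shell, such as a customised PYTHONPATH, are not visible inside a command started this way.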

4. Model Data

CDDS expects data to be in a JASMIN group workspace. CDDS supports two data directory structures:

  1. data by stream - data from MASS will be structured by stream (ap4, ap5, apm…)
  2. data by cycle - data from ARCHER or NEXCS will be structured by cylc cycle (18500101T0000Z, 18500111T0000Z, 18500401T0000Z, …)
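The two layouts can be told apart from the names of the sub-directories. The sketch below is illustrative only: classify_layout is a hypothetical helper, and its patterns cover just the example names listed above (three-character names beginning with ap, and cycle points of the form YYYYMMDDThhmmZ); other stream names would need extra patterns.

```shell
# Hypothetical helper: guess the layout from one sub-directory name.
classify_layout() {
    case "$1" in
        # cylc cycle point, e.g. 18500101T0000Z
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9]Z)
            echo "cycle" ;;
        # stream name as in the examples above, e.g. ap4, ap5, apm
        ap[0-9a-z])
            echo "stream" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, classify_layout 18500101T0000Z prints cycle and classify_layout apm prints stream.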

5. Running CDDS

The CDDS work flow is driven by the JSON request file, which holds information about the MIP, the experiment, the streams to be processed, start and end dates, the source model suite id, and more.

Where possible you should generate the json request file - you will need access to Met Office internal systems to do that. However, it is not difficult to modify an existing request file for your use - you will need knowledge of the MIP and the experiment for this.

Ocean data preparation

NEMO output needs to be pre-processed to remove halos. CMS have developed a Rose suite to automate halo removal - the suite is available here (??). We suggest creating a fully halo-removed data set prior to running CDDS.

Example conversion process

There follows an example work flow, illustrative of the CDDS process. The MIP is AerChemMIP and the experiment is piClim-NTCF (note, case sensitive) - (see for information about individual experiments.)

  1. Create a top-level directory for the experiment (I chose AerchemMIP-piClim-NTCF in this case).
  2. Copy (git?) the conversion process orchestration script (from where?) to this directory
    cd AerchemMIP-piClim-NTCF
    cp <orchestration script> .
  3. Edit the script to set CDDS_DIR and REQUEST_JSON
    export CDDS_DIR="<full-path>/AerchemMIP-piClim-NTCF"
  4. Get a copy of CMIP6.cfg from a central location and place it in a new directory under CDDS_DIR, as follows
    mkdir -p CMIP6/v1.0.5/general
    cp <somewhere central>/CMIP6.cfg CMIP6/v1.0.5/general/CMIP6.cfg
  5. Edit your newly copied CMIP6.cfg to set dataroot and procroot - in this case the edits are
    dataroot = /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data
    procroot = /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_proc

These settings point dataroot and procroot to cdds_data and cdds_proc respectively, both of which will be created later in CDDS_DIR.
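The two edits to CMIP6.cfg can also be scripted. This is a minimal sketch, assuming each setting sits at the start of its own line in the form key = value, as in the edits shown above; set_cfg_key is a hypothetical helper and not part of CDDS.

```shell
# Hypothetical helper: set "key = value" in a config file.
# Assumes the key starts its own line; lines that do not match are left alone,
# so a key that is absent from the file is NOT added.
set_cfg_key() {
    file="$1"; key="$2"; value="$3"
    # '|' is the sed delimiter because the values are paths containing '/'.
    sed "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example, run from CDDS_DIR (which must already be exported):
# set_cfg_key CMIP6/v1.0.5/general/CMIP6.cfg dataroot "$CDDS_DIR/cdds_data"
# set_cfg_key CMIP6/v1.0.5/general/CMIP6.cfg procroot "$CDDS_DIR/cdds_proc"
```

It is worth checking the file afterwards, for example with grep -E '^(dataroot|procroot)', to confirm both lines were found and changed.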

  6. Create the JSON request file
  7. Begin the CDDS process by sourcing the orchestration script.

Several directories will be created; their structure is based on information in the JSON request file.

A variable list will be generated in cdds_proc.

A set of configuration files will be generated.

  8. Tell CDDS where the input data resides by specifying its location in the cdds_data directory. In this case, the raw data is in the aerchemmip group workspace
    cd /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly/in/gws/nopw/j04/input
    ln -s /gws/nopw/j04/aerchemmip_vol1/data/u-bh543_4archive u-bh543
  9. Enable the data conversion step: un-comment the final cdds_convert command in the orchestration script and source it again. A small amount of work will be repeated, but the directory structure already set up will persist.
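The linking step in the list above can be wrapped so that a mistyped path is caught before CDDS runs. The following is a sketch only: link_input is a hypothetical helper, and it does nothing beyond the cd and ln -s commands shown earlier apart from checking that both directories exist.

```shell
# Hypothetical helper: link a raw-data directory into the CDDS input directory
# under the suite id, refusing to proceed if either directory is missing.
link_input() {
    target="$1"; input_dir="$2"; suite_id="$3"
    [ -d "$target" ] || { echo "no such directory: $target" >&2; return 1; }
    [ -d "$input_dir" ] || { echo "no such directory: $input_dir" >&2; return 1; }
    # Plain ln -s fails if the link name already exists, which avoids
    # silently replacing an earlier link.
    ln -s "$target" "$input_dir/$suite_id"
}

# Example, using the paths from this page:
# link_input /gws/nopw/j04/aerchemmip_vol1/data/u-bh543_4archive "$PWD" u-bh543
```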