Version 16 (modified by grenville, 7 months ago)

CMIP6 data CMOR'isation - How to run the CDDS work flow on JASMIN

1. JASMIN GWS access

You will need access to the cmip6_prep Group Workspace. Apply here:

https://accounts.jasmin.ac.uk/services/group_workspaces/cmip6_prep/

2. MOSRS access

You will need MOSRS access from JASMIN; see https://code.metoffice.gov.uk/trac/home/wiki/AuthenticationCaching

3. Where to run

The CDDS work flow comprises three phases:

  1. preparation
  2. data reformatting
  3. quality check

Phase 1 runs interactively, is light on computational resource and can be run on jasmin-cylc. Phase 2 is computationally intensive, runs on LOTUS, and is monitored from jasmin-cylc. Phase 3 is potentially computationally intensive and should not be run on jasmin-cylc - one of the jasmin sci machines will be appropriate.

Note: CDDS sources its own bespoke environment. To avoid potential conflicts, keep your own environment as simple as possible - you may need to alter it to allow CDDS to run.
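
One way to see whether your login environment might clash with the one CDDS sources is to list variables that commonly override software paths. This is only a sketch: the variable list below is an assumption based on general practice, not something CDDS documents, and the function name report_env_overrides is hypothetical.

```shell
# Print any path-override variables that are set in the current shell.
# The list of names is an assumption, not taken from the CDDS documentation.
report_env_overrides() {
    found=0
    for name in PYTHONPATH PYTHONHOME LD_LIBRARY_PATH CONDA_PREFIX VIRTUAL_ENV; do
        # Look up the value of the variable whose name is held in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s is set: %s\n' "$name" "$value"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no overrides found"
    fi
}

report_env_overrides
```

If anything is reported, consider unsetting it (or starting from a fresh login shell) before sourcing the CDDS work flow script.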

4. Model Data

CDDS expects data to be in a JASMIN group workspace. CDDS supports two data directory structures:

  1. data by stream - data from MASS will be structured by stream (ap4, ap5, apm…)
  2. data by cycle - data from ARCHER or NEXCS will be structured by cylc cycle (18500101T0000Z, 18500111T0000Z, 18500401T0000Z, …)
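
To check which of the two structures a data directory follows, its immediate subdirectory names can be inspected. The sketch below assumes that stream directories have three-character names such as ap4 or apm and that cycle directories follow the cylc pattern YYYYMMDDTHHMMZ; the function name classify_layout is hypothetical and is not part of CDDS.

```shell
# Classify a data directory as "stream", "cycle" or "unknown" from the
# names of its immediate subdirectories. The first recognised name wins.
classify_layout() {
    data_dir=$1
    for entry in "$data_dir"/*/; do
        [ -d "$entry" ] || continue
        name=$(basename "$entry")
        case "$name" in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9]Z)
                echo cycle
                return 0 ;;
            [a-z][a-z][a-z0-9])
                # Assumption: stream names are three characters, e.g. ap4, apm.
                echo stream
                return 0 ;;
        esac
    done
    echo unknown
}

# Classify the directory given as the first argument (default: current directory).
classify_layout "${1:-.}"
```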

5. Running CDDS

The CDDS work flow is driven by the JSON request file, which holds information about the MIP, the experiment, the streams to be processed, start and end dates, the source model suite id, and more.

Where possible you should generate the JSON request file - you will need access to Met Office internal systems to do that. However, it is not difficult to modify an existing request file for your use - you will need knowledge of the MIP and the experiment for this.
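
If you modify an existing request file by hand, it is easy to leave a stray comma or quote behind. A minimal sketch of a syntax check follows; it assumes python3 is available on the machine and only confirms that the file is well-formed JSON, not that its contents are what CDDS expects. The function name check_request_json is hypothetical.

```shell
# Return 0 if the file parses as JSON, 1 otherwise.
# Uses Python's standard json.tool module purely as a syntax checker.
check_request_json() {
    request=$1
    if python3 -m json.tool "$request" > /dev/null 2>&1; then
        echo "valid JSON: $request"
    else
        echo "invalid JSON: $request" >&2
        return 1
    fi
}

# Check the file given as the first argument, if there is one.
if [ $# -gt 0 ]; then
    check_request_json "$1"
fi
```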

Ocean data preparation

NEMO output needs to be pre-processed to remove halos. CMS have developed a Rose suite to automate halo removal - the suite is available here (??). We suggest creating a fully halo-removed data set prior to running CDDS.

Example conversion process

The following example work flow illustrates the CDDS process. The MIP is AerChemMIP and the experiment is piClim-NTCF (note: case sensitive). See https://rawgit.com/WCRP-CMIP/CMIP6_CVs/master/src/CMIP6_experiment_id.html for information about individual experiments.

  1. Create a top-level directory for the experiment (I chose AerchemMIP-piClim-NTCF in this case).
  2. Copy the conversion process orchestration script cdds_workflow_for_user.sh (from where?) to this directory, then move into it:
    cp cdds_workflow_for_user.sh AerchemMIP-piClim-NTCF/cdds_workflow_for_user.sh
    cd AerchemMIP-piClim-NTCF
    
  3. Edit cdds_workflow_for_user.sh to set CDDS_DIR and REQUEST_JSON:
    export CDDS_DIR="<full-path>/AerchemMIP-piClim-NTCF"
    REQUEST_JSON=aerchemmip-req.json
    
  4. Get a copy of CMIP6.cfg from somewhere central and place it in a new directory under CDDS_DIR as follows:
    mkdir -p CMIP6/v1.0.5/general
    cp <somewhere central>/CMIP6.cfg CMIP6/v1.0.5/general/CMIP6.cfg
    
  5. Edit your newly copied CMIP6.cfg to set dataroot and procroot - in this case the edits are:
    dataroot = /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data
    procroot = /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_proc
    

These settings point dataroot and procroot at cdds_data and cdds_proc respectively; both directories will be created later in CDDS_DIR.
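
The two edits to CMIP6.cfg can also be scripted. The sketch below assumes that each setting sits on its own line beginning with "dataroot =" or "procroot =", and that the directory path contains no "|" or "&" characters (which are special to sed here); the function name set_cfg_roots is hypothetical.

```shell
# Rewrite the dataroot and procroot lines of a config file so that they
# point at cdds_data and cdds_proc under the given top-level directory.
set_cfg_roots() {
    cfg=$1
    cdds_dir=$2
    sed -e "s|^dataroot *=.*|dataroot = $cdds_dir/cdds_data|" \
        -e "s|^procroot *=.*|procroot = $cdds_dir/cdds_proc|" \
        "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# Example call (commented out so the block has no side effects):
# set_cfg_roots CMIP6/v1.0.5/general/CMIP6.cfg "$CDDS_DIR"
```

Lines that do not start with either key are left untouched, so it is worth checking the file afterwards to confirm both settings were actually found and changed.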

  6. Create the JSON request file (more guidance?)
  7. Begin the CDDS process:
    source cdds_workflow_for_user.sh
    

Several directories will be created; their structure is based on information in the JSON request file.

A variable list will be generated in cdds_proc.

A set of configuration files will be generated.
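
A quick way to confirm that this first pass did its job is to check that both top-level directories now exist. This is a minimal sketch; the function name check_cdds_dirs is hypothetical, and it only tests for the cdds_data and cdds_proc directories named above, not for their contents.

```shell
# Report whether cdds_data and cdds_proc exist under the given directory.
# Returns 0 only if both are present.
check_cdds_dirs() {
    cdds_dir=$1
    status=0
    for sub in cdds_data cdds_proc; do
        if [ -d "$cdds_dir/$sub" ]; then
            echo "found $cdds_dir/$sub"
        else
            echo "missing $cdds_dir/$sub" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call (commented out; CDDS_DIR is set in cdds_workflow_for_user.sh):
# check_cdds_dirs "$CDDS_DIR"
```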

  8. Tell CDDS where the input data resides by linking to it from the input directory under cdds_data. In this case, the raw data is in the aerchemmip group workspace:
    cd /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly/input
    ln -s /gws/nopw/j04/aerchemmip_vol1/data/u-bh543_4archive u-bh543
    
  9. Enable the data conversion process - simply un-comment the final cdds_convert command in cdds_workflow_for_user.sh and source cdds_workflow_for_user.sh again. A small amount of work will be repeated, but much of the structure already configured (in particular cdds_data) will persist.

Several Rose suites will be created - monitor progress on jasmin-cylc with cylc-gscan. Site logging is in cylc-run as usual.
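
If the suites fail early because they cannot find any input, a dangling symbolic link in the input directory is one possible cause. The sketch below checks that a link exists and resolves to a directory; the function name check_input_link is hypothetical, and the check says nothing about whether the linked data is complete.

```shell
# Succeed only if the path is a symbolic link that resolves to a directory.
check_input_link() {
    link=$1
    if [ -L "$link" ] && [ -d "$link" ]; then
        echo "ok: $link -> $(readlink "$link")"
    else
        echo "broken or missing link: $link" >&2
        return 1
    fi
}

# Example call from inside the input directory (commented out):
# check_input_link u-bh543
```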

  10. Converted data is in cdds_data:
    /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly/output
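
Once the suites have finished, a simple sanity check is to count the netCDF files under the output directory. This sketch assumes the converted files carry the usual .nc extension; the function name count_converted_files is hypothetical, and a non-zero count does not by itself mean that every requested variable was produced.

```shell
# Count regular files ending in .nc anywhere below the given directory.
count_converted_files() {
    output_dir=$1
    find "$output_dir" -type f -name '*.nc' | wc -l | tr -d ' '
}

# Example call from inside the round-1-monthly directory (commented out):
# count_converted_files output
```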