Version 24 (modified by grenville, 14 months ago)

CMIP6 data CMOR'isation - How to run the CDDS work flow on JASMIN

1. JASMIN GWS access

You will need access to the cmip6_prep Group Workspace. Apply here:

https://accounts.jasmin.ac.uk/services/group_workspaces/cmip6_prep/

2. MOSRS access

You will need MOSRS access from JASMIN, see https://code.metoffice.gov.uk/trac/home/wiki/AuthenticationCaching

3. Where to run

The CDDS work flow comprises three phases:

  1. preparation
  2. data reformatting
  3. quality check

Phase 1 (preparation) runs interactively, is light on computational resources, and can be run on jasmin-cylc. Phase 2 (data reformatting) is computationally intensive; it runs on LOTUS and is monitored from jasmin-cylc. Phase 3 (quality check) runs interactively but should not be run on jasmin-cylc; one of the jasmin-sci machines is appropriate.

Note: CDDS sources its own bespoke environment. To avoid potential conflicts, keep your own environment as simple as possible; you may need to alter it to allow CDDS to run.
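
One way to find out whether your own start-up files are getting in the way is to try a command from a stripped-down shell. The sketch below is only a diagnostic, not part of CDDS: `run_clean` is a hypothetical helper that starts bash with an almost empty environment and without reading your profile or rc files. Anything that relies on environment variables (possibly including cached MOSRS credentials) will not carry over, so treat it as a check rather than a way of running the full work flow.

```shell
# Hypothetical helper (not part of CDDS): run a command in a near-empty
# environment, skipping bash start-up files.
# Only HOME and a basic PATH are passed through; adjust as needed.
run_clean() {
    env -i HOME="$HOME" PATH="/usr/bin:/bin" bash --noprofile --norc -c "$1"
}
```

For example, `run_clean 'env'` shows how little is set in such a shell, which can be compared with the output of `env` in your normal login session.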

4. Model Data

CDDS expects data to be in a JASMIN group workspace. CDDS supports two data directory structures:

  1. data by stream - data from MASS will be structured by stream (ap4, ap5, apm…)
  2. data by cycle - data from ARCHER or NEXCS will be structured by cylc cycle (18500101T0000Z, 18500111T0000Z, 18500401T0000Z, …)
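
If you are unsure which of the two layouts a data directory uses, its entry names usually give it away. The following is a minimal sketch, assuming cycle directories are named like 18500101T0000Z and stream directories like ap4 or apm, as in the examples above; `guess_layout` is a hypothetical helper and not a CDDS command, and other stream names would need the pattern extending.

```shell
# Hypothetical helper (not part of CDDS): guess how a model data directory
# is laid out from the names of its entries.
# Prints "cycle" if any entry looks like a cylc cycle point (18500101T0000Z),
# "stream" if any entry looks like a stream name (ap4, ap5, apm), else "unknown".
guess_layout() {
    local dir="$1" entry name
    # First pass: look for cycle-point names.
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue          # glob matched nothing
        name="${entry##*/}"
        case "$name" in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9]Z)
                echo "cycle"; return 0 ;;
        esac
    done
    # Second pass: look for stream names (assumed pattern: "ap" plus one character).
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue
        name="${entry##*/}"
        case "$name" in
            ap[0-9a-z]) echo "stream"; return 0 ;;
        esac
    done
    echo "unknown"
}
```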

5. Running CDDS

The CDDS work flow is driven by the json request file - which holds information about the MIP, the experiment, streams to be processed, start and end dates, the source model suite id, and more.

Where possible you should generate the json request file - you will need access to Met Office internal systems to do that. However, it might not be difficult to modify an existing request file for your use - you will need knowledge of the MIP and the experiment for this.
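
A hand-edited request file is easy to break with a stray comma or quote. The sketch below checks only that the file is syntactically valid JSON, using Python's standard-library `json.tool` module; it assumes `python3` is on your PATH and says nothing about whether CDDS will accept the contents. `check_json` is a hypothetical helper name.

```shell
# Hypothetical helper: return 0 if the file parses as JSON, non-zero otherwise.
# Uses Python's standard-library json.tool module; assumes python3 is on PATH.
check_json() {
    python3 -m json.tool "$1" > /dev/null
}
```

For example, `check_json AerChemMIP-piClim-NTCF-req.json && echo "syntax OK"`; on failure, json.tool reports the position of the problem.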

Ocean data preparation

Several NEMO data sets need to be pre-processed to remove halos prior to CDDS processing. CMS have developed a Rose suite (u-bn255) to automate halo removal - the suite is available from the MOSRS suite repository. We suggest creating a fully halo-removed data set prior to running CDDS.

Example conversion process

There follows an example work flow, illustrative of the CDDS process. The example is specific to AerChemMIP for experiment piClim-NTCF - see https://rawgit.com/WCRP-CMIP/CMIP6_CVs/master/src/CMIP6_experiment_id.html for information about individual experiments.

  1. Create a top-level directory for the experiment - this will later become synonymous with $CDDS_DIR (I chose to name it AerchemMIP-piClim-NTCF in this case).
  1. Change directory to AerchemMIP-piClim-NTCF - all work should take place here
    cd AerchemMIP-piClim-NTCF
    
  1. Copy the conversion process orchestration script cdds_workflow_for_user.sh from central-location to this directory
    cp /home/users/glister/CDDS/cdds_workflow_for_user.sh cdds_workflow_for_user.sh
    

Note: /home/users/glister/CDDS is for testing only — to be moved centrally

  1. Edit cdds_workflow_for_user.sh to set the environment variables CDDS_DIR and REQUEST_JSON
    export CDDS_DIR="<full-path>/AerchemMIP-piClim-NTCF"
    REQUEST_JSON=AerChemMIP-piClim-NTCF-req.json
    

Note: call the json file something memorable - there may be several in your workflow.

  1. Get a copy of CMIP6.cfg from central-location and place it in $CDDS_DIR, under the CMIP6 directory, as follows
    mkdir -p CMIP6/v1.0.5/general
    cp /home/users/glister/CDDS/CMIP6.cfg CMIP6/v1.0.5/general/CMIP6.cfg
    

Note: /home/users/glister/CDDS is for testing only — to be moved centrally

  1. Edit your newly copied CMIP6.cfg to set dataroot and procroot - in this particular example the edits are
    dataroot = /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data
    procroot = /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_proc
    

These settings point dataroot and procroot at cdds_data and cdds_proc respectively, both of which will be created later in CDDS_DIR. Note: you cannot use the environment variable $CDDS_DIR in CMIP6.cfg; write the full path out.
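
Because the paths have to be written out in full, the edit can be scripted rather than typed. This is a minimal sketch, assuming each setting sits on its own line in the form `key = value` as shown above; `set_cdds_roots` is a hypothetical helper, not part of CDDS, and it will misbehave if the path contains `|` or `&`.

```shell
# Hypothetical helper: rewrite the dataroot and procroot lines of a CMIP6.cfg
# so they point at cdds_data and cdds_proc under an absolute top-level directory.
# Assumes each setting sits on its own line in the form "key = value".
set_cdds_roots() {
    local cfg="$1" top="$2"
    case "$top" in
        /*) ;;   # absolute path: fine
        *)  echo "top-level directory must be an absolute path" >&2
            return 1 ;;
    esac
    # Write to a temporary file first so a failed edit cannot truncate the original.
    sed -e "s|^\([[:space:]]*dataroot[[:space:]]*=\).*|\1 ${top}/cdds_data|" \
        -e "s|^\([[:space:]]*procroot[[:space:]]*=\).*|\1 ${top}/cdds_proc|" \
        "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}
```

Called as `set_cdds_roots CMIP6/v1.0.5/general/CMIP6.cfg "<full-path>/AerchemMIP-piClim-NTCF"`, it leaves every other line of the file untouched.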

  1. Create the json request file — here's the one used in our example. Much of the information listed here is taken directly from the rose-suite.info file for the UM model suite that generated the data (u-bh543 for our example)
    {
      "atmos_timestep": 1200,
      "branch_date_in_child": "1850-01-01-00-00-00",
      "branch_date_in_parent": "1850-01-01-00-00-00",
      "branch_method": "standard",
      "calendar": "360_day",
      "child_base_date": "1850-01-01-00-00-00",
      "config_version": "1.0.5",
      "experiment_id": "piClim-NTCF",
      "institution_id": "MOHC",
      "license": "CMIP6 model data produced by the Met Office Hadley Centre is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https://ukesm.ac.uk/cmip6. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.\n\n ",
      "mip": "AerChemMIP",
      "mip_era": "CMIP6",
      "model_id": "UKESM1-0-LL",
      "model_type": "AGCM AER CHEM",
      "package": "round-1-monthly",
      "parent_base_date": "1850-01-01-00-00-00",
      "parent_mip": "CMIP",
      "parent_mip_era": "CMIP6",
      "parent_model_id": "UKESM1-0-LL",
      "parent_time_units": "days since 1850-01-01-00-00-00",
      "parent_variant_label": "r1i1p1f2",
      "request_id": "UKESM1-0-LL_piClim-NTCF_r1i1p1f2",
      "run_bounds": "1850-01-01-00-00-00 1895-01-01-00-00-00",
      "run_bounds_for_stream_ap4": "1850-01-01-00-00-00 1895-01-01-00-00-00",
      "run_bounds_for_stream_ap5": "1850-01-01-00-00-00 1895-01-01-00-00-00",
      "sub_experiment_id": "none",
      "suite_branch": "trunk",
      "suite_id": "u-bh543",
      "suite_revision": "115701",
      "variant_label": "r1i1p1f2"
    }
    
    
  1. Begin the CDDS process:
    source cdds_workflow_for_user.sh
    

Several directories will be created; their structure is based on information in the json request file. It is worth familiarizing yourself with the data structure and its relation to entries in the json request file.

cdds_data will hold spaces for input and output data

ls /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly
input/	output/

cdds_proc will contain various configuration files and logging output

ls /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_proc/CMIP6/AerChemMIP/UKESM1-0-LL_piClim-NTCF_r1i1p1f2/round-1-monthly
archive/  configure/  convert/	extract/  prepare/  qualitycheck/
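
Comparing the two listings above with the example request file suggests how the paths are assembled: the cdds_data path looks like dataroot/mip_era/mip/model_id/experiment_id/variant_label/package, and the cdds_proc path like procroot/mip_era/mip/request_id/package. That mapping is inferred from this one example rather than taken from CDDS itself, so treat the sketch below as a way of exploring the relationship. `json_value` and `expected_dirs` are hypothetical helpers; `json_value` only copes with one `"key": "value"` pair per line and is not a general JSON parser.

```shell
# Hypothetical helper: print the string value of a key from a request file
# laid out with one "key": "value" pair per line.
json_value() {
    local file="$1" key="$2"
    sed -n "s/^[[:space:]]*\"${key}\"[[:space:]]*:[[:space:]]*\"\(.*\)\"[[:space:]]*,\{0,1\}[[:space:]]*\$/\1/p" "$file" | head -n 1
}

# Hypothetical helper: print the cdds_data and cdds_proc directories that the
# example in this page implies for a given request file, dataroot and procroot.
expected_dirs() {
    local request="$1" dataroot="$2" procroot="$3"
    local era mip model expt variant package reqid v
    era=$(json_value "$request" mip_era)
    mip=$(json_value "$request" mip)
    model=$(json_value "$request" model_id)
    expt=$(json_value "$request" experiment_id)
    variant=$(json_value "$request" variant_label)
    package=$(json_value "$request" package)
    reqid=$(json_value "$request" request_id)
    # Refuse to print half-built paths if any field could not be read.
    for v in "$era" "$mip" "$model" "$expt" "$variant" "$package" "$reqid"; do
        [ -n "$v" ] || { echo "missing field in $request" >&2; return 1; }
    done
    echo "${dataroot}/${era}/${mip}/${model}/${expt}/${variant}/${package}"
    echo "${procroot}/${era}/${mip}/${reqid}/${package}"
}
```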

Take a look in convert

ls convert
log/  u-ak283_JSON/

u-ak283_JSON is the Rose suite that will run to perform the data conversions.

  1. Tell CDDS where the input data resides by specifying its location (through soft links in this case) in the cdds_data directory. In our example, the raw data is in the aerchemmip group workspace
    cd /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly/input
    ln -s /gws/nopw/j04/aerchemmip_vol1/data/u-bh543_4archive u-bh543
    
  1. Enable the data conversion process - simply un-comment the final cdds_convert command in cdds_workflow_for_user.sh and source cdds_workflow_for_user.sh again. A small amount of work will be repeated, but much of the structure already configured (in particular cdds_data) will persist. [ Note: this should be handled more elegantly through arguments to cdds_workflow_for_user.sh ]

Several Rose suites will be created - monitor progress on jasmin-cylc with cylc-gscan. Suite logging is in cylc-run as usual.
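
If the conversion suites fail at the very start, a dangling input link is one thing worth ruling out. A minimal sketch, assuming the input is supplied through a soft link as in the step above; `check_input_link` is a hypothetical helper and not a CDDS command.

```shell
# Hypothetical helper: confirm that an input soft link exists and resolves
# to a readable directory, then print where it points.
check_input_link() {
    local link="$1"
    if [ ! -L "$link" ]; then
        echo "$link is not a soft link" >&2
        return 1
    fi
    # -d and -r follow the link, so a dangling link fails here.
    if [ ! -d "$link" ] || [ ! -r "$link" ]; then
        echo "$link does not resolve to a readable directory" >&2
        return 1
    fi
    echo "$link -> $(readlink "$link")"
}
```

In this example it would be run from the input directory as `check_input_link u-bh543`.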

  1. Converted data will be written to the output directory in cdds_data
    ls /group_workspaces/jasmin4/ncas_cms/glister/AerchemMIP-piClim-NTCF/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly/output
    ap4/  ap4_concat/  ap4_mip_convert/  ap5/  ap5_concat/	ap5_mip_convert/
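
A quick way to see whether anything has been written is to count NetCDF files under each of these directories. This sketch assumes the converted files carry a `.nc` suffix; which of the directories holds the final files is not stated here, so it simply reports all of them. `count_nc_by_dir` is a hypothetical helper.

```shell
# Hypothetical helper: print the number of NetCDF (*.nc) files found under
# each immediate subdirectory of an output directory, one "name count" per line.
count_nc_by_dir() {
    local out="$1" sub n
    for sub in "$out"/*/; do
        [ -d "$sub" ] || continue            # glob matched nothing
        # Strip padding that some wc implementations add around the count.
        n=$(find "$sub" -type f -name '*.nc' | wc -l | tr -d ' ')
        printf '%s %s\n' "$(basename "$sub")" "$n"
    done
}
```

Run it against the output directory listed above; a directory that stays at 0 after the suites have finished is a reasonable place to start looking in the suite logs.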
    

6. Quality Control

7. Move to ESGF