wiki:CDDS

Version 33 (modified by grenville, 2 weeks ago)

CMIP6 data CMOR'isation - How to run the CDDS work flow on JASMIN

1. JASMIN GWS access

You will need access to the cmip6_prep Group Workspace. Apply here:

https://accounts.jasmin.ac.uk/services/group_workspaces/cmip6_prep/

2. MOSRS access

You will need MOSRS access from JASMIN, see https://code.metoffice.gov.uk/trac/home/wiki/AuthenticationCaching

3. Where to run

The CDDS work flow comprises several phases:

preparation

data reformatting

quality check

The preparation phase runs interactively, is light on computational resources, and can be run on jasmin-cylc. Data reformatting is computationally intensive, runs on LOTUS, and is monitored from jasmin-cylc. The quality check runs interactively but should not be run on jasmin-cylc - one of the jasmin-sci machines will be appropriate.

Note: CDDS sources its own bespoke environment. To avoid potential conflicts, keep your environment as simple as possible - you may need to alter it to allow CDDS to run.
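
One way to see what CDDS will inherit is to try things from a stripped-down shell before sourcing the CDDS environment. A minimal sketch - the variables kept below are an illustrative minimum, not a definitive list:

```shell
# Launch a shell with an almost-empty environment (illustrative minimum;
# keep whatever else your site genuinely requires)
env -i HOME="$HOME" PATH=/usr/bin:/bin bash --noprofile --norc -c \
    'echo "clean shell, PATH=$PATH"'
```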

4. Model Data

CDDS expects data to be in a JASMIN group workspace. CDDS supports two data directory structures:

  1. data by stream - data from MASS will be structured by stream (ap4, ap5, apm…)
  2. data by cycle - data from ARCHER or NEXCS will be structured by cylc cycle (18500101T0000Z, 18500111T0000Z, 18500401T0000Z, …)
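
A quick way to tell which layout a directory uses is to look for stream-named subdirectories. A hedged sketch - the /tmp path below is a stand-in for a real group-workspace location:

```shell
# Stand-in directory for illustration; on JASMIN this would be the
# group-workspace path holding your model output
DATA_DIR=/tmp/cdds_layout_demo
mkdir -p "$DATA_DIR"/ap4 "$DATA_DIR"/ap5 "$DATA_DIR"/apm   # mimic data by stream
if ls -d "$DATA_DIR"/ap? >/dev/null 2>&1; then
    echo "data by stream"
else
    echo "data by cycle"    # e.g. 18500101T0000Z/ 18500111T0000Z/ ...
fi
```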

5. Running CDDS

The CDDS work flow is driven by the json request file, which holds information about the MIP, the experiment, the streams to be processed, start and end dates, the source model suite id, and more.

Where possible you should generate the json request file - you will need access to Met Office internal systems to do that. However, it may not be difficult to modify an existing request file for your use - you will need knowledge of the MIP and the experiment for this. An example file is shown below.

Ocean data preparation

Several NEMO data sets need to be pre-processed to remove halos prior to CDDS processing. CMS have developed a Rose suite (u-bn582 - see http://cms.ncas.ac.uk/wiki/CDDS/halo) to automate halo removal - the suite is available from the MOSRS suite repository. We suggest creating a fully halo-removed data set before running CDDS.

Example conversion process

There follows an example work flow, illustrative of the CDDS process. The example is specific to AerChemMIP for experiment piClim-NTCF - see https://rawgit.com/WCRP-CMIP/CMIP6_CVs/master/src/CMIP6_experiment_id.html for information about individual experiments.

  1. Under /group_workspaces, create a top-level directory for the experiment - this will later become synonymous with $CDDS_DIR (I chose to name it AerChemMIP-piClim-NTCF in this case).
  1. Change directory to AerChemMIP-piClim-NTCF - all work should take place here
    cd AerChemMIP-piClim-NTCF
    
  1. Copy the conversion process orchestration script cdds_workflow_for_user.sh from central-location to this directory
    cp /home/users/glister/jasmin-cdds/operational_scripts/cdds_workflow_for_user_v121.sh cdds_workflow_for_user.sh
    

Note: /home/users/glister/jasmin-cdds/operational_scripts is for testing only — to be moved centrally

  1. Edit cdds_workflow_for_user.sh to set the environment variables CDDS_DIR, REQUEST_JSON and FILEPATHSTYPE
    export CDDS_DIR="<full-path>/AerChemMIP-piClim-NTCF"
    REQUEST_JSON=AerChemMIP-piClim-NTCF-req.json
    export FILEPATHSTYPE="ARCHER"
    

Note: call the REQUEST_JSON file something memorable - there may be several in your workflow.

  1. Create the json request file, naming it the same as $REQUEST_JSON set in step 4 - here's the one used in our example. Much of the information listed here is taken directly from the rose-suite.info file for the UM model suite that generated the data (u-bh543 in our example)
    {
      "atmos_timestep": 1200,
      "branch_date_in_child": "1850-01-01-00-00-00",
      "branch_date_in_parent": "1850-01-01-00-00-00",
      "branch_method": "standard",
      "calendar": "360_day",
      "child_base_date": "1850-01-01-00-00-00",
      "config_version": "1.0.5",
      "experiment_id": "piClim-NTCF",
      "institution_id": "MOHC",
      "license": "CMIP6 model data produced by the Met Office Hadley Centre is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgement. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https://ukesm.ac.uk/cmip6. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.\n\n ",
      "mip": "AerChemMIP",
      "mip_era": "CMIP6",
      "model_id": "UKESM1-0-LL",
      "model_type": "AGCM AER CHEM",
      "package": "round-1-monthly",
      "parent_base_date": "1850-01-01-00-00-00",
      "parent_mip": "CMIP",
      "parent_mip_era": "CMIP6",
      "parent_model_id": "UKESM1-0-LL",
      "parent_time_units": "days since 1850-01-01-00-00-00",
      "parent_variant_label": "r1i1p1f2",
      "request_id": "UKESM1-0-LL_piClim-NTCF_r1i1p1f2",
      "run_bounds": "1850-01-01-00-00-00 1895-01-01-00-00-00",
      "run_bounds_for_stream_ap4": "1850-01-01-00-00-00 1895-01-01-00-00-00",
      "run_bounds_for_stream_ap5": "1850-01-01-00-00-00 1895-01-01-00-00-00",
      "sub_experiment_id": "none",
      "suite_branch": "trunk",
      "suite_id": "u-bh543",
      "suite_revision": "115701",
      "variant_label": "r1i1p1f2"
    }
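
Before launching, it can be worth sanity-checking the request file: request_id should be model_id_experiment_id_variant_label, and the run bounds should be ordered. A minimal sketch, assuming python3 is available - the file written to /tmp simply mirrors the relevant fields of the example above:

```shell
# Write a cut-down copy of the example's fields (illustration only),
# then check internal consistency with python3's json module
cat > /tmp/req-check.json <<'EOF'
{"model_id": "UKESM1-0-LL", "experiment_id": "piClim-NTCF",
 "variant_label": "r1i1p1f2",
 "request_id": "UKESM1-0-LL_piClim-NTCF_r1i1p1f2",
 "run_bounds": "1850-01-01-00-00-00 1895-01-01-00-00-00"}
EOF
python3 - /tmp/req-check.json <<'EOF'
import json, sys
req = json.load(open(sys.argv[1]))
expected = "{model_id}_{experiment_id}_{variant_label}".format(**req)
assert req["request_id"] == expected, "request_id mismatch"
start, end = req["run_bounds"].split()
assert start < end, "run_bounds out of order"
print("request file looks consistent")
EOF
```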
    
    
  1. Begin the CDDS process:
    source cdds_workflow_for_user.sh
    

Several directories will be created (their structure will be based on information in the json request file). It is worth familiarizing yourself with the data structure and its relation to entries in the json request file.

cdds_data will hold spaces for input and output data

ls $CDDS_DIR/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly
input/	output/
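
The path components map directly onto request-file fields, so the location can be reconstructed from the json alone. A sketch using the example's values:

```shell
# Each path element below comes from a field in the request file:
# mip_era / mip / model_id / experiment_id / variant_label / package
MIP_ERA=CMIP6 MIP=AerChemMIP MODEL=UKESM1-0-LL
EXPERIMENT=piClim-NTCF VARIANT=r1i1p1f2 PACKAGE=round-1-monthly
echo "$CDDS_DIR/cdds_data/$MIP_ERA/$MIP/$MODEL/$EXPERIMENT/$VARIANT/$PACKAGE"
```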

cdds_proc will contain various configuration files and logging output

ls $CDDS_DIR/cdds_proc/CMIP6/AerChemMIP/UKESM1-0-LL_piClim-NTCF_r1i1p1f2/round-1-monthly
archive/  configure/  convert/	extract/  prepare/  qualitycheck/
  1. Tell CDDS where the input data resides by specifying its location (through soft links in this case) in the cdds_data directory. In our example, the raw data is in the aerchemmip group workspace
    cd $CDDS_DIR/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly/input
    ln -s /gws/nopw/j04/aerchemmip_vol1/data/u-bh543_4archive u-bh543
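
The linking step can be sketched end-to-end like this - the /tmp paths below are stand-ins for the group-workspace paths in the example:

```shell
# Stand-in paths for illustration; on JASMIN SRC would be the raw-data
# workspace and INPUT the .../round-1-monthly/input directory
SRC=/tmp/cdds_demo/raw/u-bh543_4archive
INPUT=/tmp/cdds_demo/input
mkdir -p "$SRC" "$INPUT"
cd "$INPUT"
ln -sfn "$SRC" u-bh543   # -n so an existing link is replaced, not nested
readlink u-bh543
```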
    
  1. Enable the data conversion process - un-comment the final cdds_convert command in cdds_workflow_for_user.sh and source the script again. A small amount of work will be repeated, but much of the structure already configured (in particular cdds_data) will persist. [ Note: this should be handled more elegantly through arguments to cdds_workflow_for_user.sh ]

Several Rose suites will be created - monitor progress on jasmin-cylc with cylc gscan. Suite logging is in cylc-run as usual.

Take a look in convert

ls convert
log/  u-ak283_JSON/

u-ak283_JSON is the Rose suite that will run to perform the data conversions.

  1. Converted data will be written to the output directory in cdds_data
    ls $CDDS_DIR/cdds_data/CMIP6/AerChemMIP/UKESM1-0-LL/piClim-NTCF/r1i1p1f2/round-1-monthly/output
    ap4/  ap4_concat/  ap4_mip_convert/  ap5/  ap5_concat/	ap5_mip_convert/
    

6. Quality Control

Log in to one of the jasmin-sci machines and change directory to $CDDS_DIR.

Set up the CDDS environment -

source /gws/smf/j04/cmip6_prep/cdds-env/setup_cdds_env.sh

Run the quality check utility, passing your request file (the example here is from a FAFMIP faf-heat run)

qc_run_and_report ./FAFMIP-faf-heat.json -p -c ./

Review the quality check output under cdds_proc

ls $CDDS_DIR/cdds_proc/CMIP6/FAFMIP/HadGEM3-GC31-LL_faf-heat_r1i1p1f1/round-1-monthly/qualitycheck
approved_variables_2019-10-08T113517.txt  log/	qc.db  report_2019-10-08T113517.json

7. Move CMOR data to ESGF