Version 2 (modified by annette, 6 years ago)

HadGEM3-AO r4.0 (xfuzb)

Job overview

This job is the coupled HadGEM3-AO release 4.0 from the Met Office. It uses vn7.6 of the UM with vn3.0 NEMO and vn4.1 CICE. The atmosphere resolution is N96 L85 and NEMO is ORCA1 L75.

This job is based on Met Office job id akapa (equivalent to job ajtzd). More information on the HadGEM3 coupled model development can be found on the collaboration wiki.

This version is set up for the Cray XE6 (HECToR phase 3).

Getting started

You will need to be registered for puma and have access to the UM and NEMO-CICE code repositories.

Using the UMUI, take a copy of the job configuration xfuzb. Before submitting the job it is essential that you change some basic settings, by going to the following UMUI windows:

  1. User Information and Submit Method → General Details

Set your Userid, Email Address and Tic Code.

  2. FCM Configuration → FCM Extract directories and Output levels

Set an appropriate value for "Target machine root extract dir (UM_ROUTDIR)".
This is a directory on HECToR, typically /home/n02/n02/<userid>

  3. Input/Output Control and Resources → Time Convention and SCRIPT Environment Variables

Check that the settings for $DATAW and $DATAM are as you require.

Save, Process and then Submit your job.

The job is set up to run on a total of 160 cores (5 nodes) with 96 cores for the atmosphere (8x12), 32 cores for the ocean (4x8 for NEMO and 32x1 for CICE), and a full node (32 cores) for OASIS.
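The core count above can be checked with a few lines of arithmetic. This is only a sketch: the helper `cores` and the variable names are illustrative and not part of the job, and it assumes 32 cores per node as stated for this configuration.

```python
CORES_PER_NODE = 32  # cores per node for this job, as stated above


def cores(decomposition):
    """Return the core count for a decomposition string such as '8x12'."""
    first, second = decomposition.lower().split("x")
    return int(first) * int(second)


atmos_cores = cores("8x12")   # UM atmosphere
nemo_cores = cores("4x8")     # NEMO
cice_cores = cores("32x1")    # CICE
oasis_cores = CORES_PER_NODE  # one full node for OASIS

# NEMO and CICE use the same 32 ocean cores, so they are counted once.
assert nemo_cores == cice_cores
ocean_cores = nemo_cores

total_cores = atmos_cores + ocean_cores + oasis_cores
total_nodes = total_cores // CORES_PER_NODE
print(total_cores, total_nodes)  # prints "160 5"
```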

Known issues

Recent fix included in the standard job

  1. NetCDF error when writing CICE history files:
    NetCDF: Numeric conversion not representable
    

This is an issue with the NetCDF libraries for CCE/8 and is fixed at NetCDF/4.2.0, linked with the hand-edit:

~umui/hadgem3/vn7.6/HG3AO40/hand_edits/load_netcdf4.2.0.ed

Editing jobs

The NEMO and CICE submodels are controlled separately from the atmosphere model.

A small number of changes can be made through the UMUI. In particular, the model start date, run length and resubmission length are applied to all submodels. The number of processors and domain decompositions for each submodel are also controlled through the UMUI under User Information and Submit Method → General Details.

Note that, unlike the UM atmosphere model, the ocean model must be recompiled to alter the decomposition of NEMO or CICE.

All other changes including dump and diagnostic output frequency need to be made in the submodel control files.

  • FPP keys configuration file: FCM Configuration → FCM Options for NEMO / CICE

This specifies the code sections to include. See the submodel documentation for more information.

  • Control namelist file: NEMO / CICE → Scientific Parameters and Sections → Links to NEMO / CICE model

This includes the output file frequency, control of diagnostics and the values of other scientific parameters.

More information is available on the NEMO-CICE trac wiki.

Limitations

On the Cray hardware each submodel (atmos, ocean and coupler) needs to be run on a separate set of nodes, so a minimum of 3 nodes is required.

Currently each submodel needs to run with the same number of cores per node (default 32).

Archiving of NEMO and CICE files is not activated in the standard job.

Contact the helpdesk with any other difficulties to do with running the coupled model.

Timing runs

Several atmosphere and ocean decompositions were tested. The times shown are the "Maximum elapsed wallclock time" reported by the UM timer. The model was run for 3 days with daily atmosphere, ocean and sea-ice dumps.

Nodes  Total pes   = Atmos cfg   + NEMO CICE cfg   + OASIS -> Time (s)
 
4      128           64 (8x8)      32 (4x8 - 32x1)   32       1112.834
5      160           96 (8x12)     32 (4x8 - 32x1)   32        871.852
6      192           128 (8x16)    32 (4x8 - 32x1)   32        777.669
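The timings in the table can be turned into a speedup relative to the 4-node run. The sketch below copies the node counts, core counts and times from the table; the function name `scaling` is illustrative. Only the atmosphere core count changes between rows, so the efficiency per total node is a rough guide rather than a strict measure of atmosphere scaling.

```python
# (nodes, total cores, elapsed seconds) copied from the timing table.
runs = [
    (4, 128, 1112.834),
    (5, 160, 871.852),
    (6, 192, 777.669),
]


def scaling(runs):
    """Return (nodes, cores, speedup, efficiency) relative to the first run."""
    base_nodes, _, base_time = runs[0]
    rows = []
    for nodes, total_cores, seconds in runs:
        speedup = base_time / seconds
        # Efficiency: achieved speedup divided by the increase in node count.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, total_cores, speedup, efficiency))
    return rows


for nodes, total_cores, speedup, efficiency in scaling(runs):
    print(f"{nodes} nodes ({total_cores} cores): "
          f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```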

The dumps after 3 days were found to be identical in all cases.

When altering the ocean decomposition, however, the dumps differed even with bit-reproducible options in the NEMO namelist file (nbit_cmp=1 and nsolv=2). Under the previous hardware (Phase 2b) and pathscale compiler the ocean model did bit-reproduce.
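One simple way to check whether two dump files are identical is a byte-for-byte comparison via checksums. This is a generic sketch, not necessarily how the comparison above was carried out; the function names are illustrative and the file paths are supplied by the caller.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dumps_identical(path_a, path_b):
    """Return True if the two files have the same contents, byte for byte."""
    return file_digest(path_a) == file_digest(path_b)
```

A byte-wise check is strict: any difference in the files, not only in the model fields, is reported as a mismatch.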

Porting notes

Job changes

This is a summary of the changes made to the Met Office job to run on the XE6 system. See also the list of branches below.

  • HECToR details:
    User and machine details including username, tic-code, machine name and job submission information ('qsub' for PBS pro)
  • Atmos to ocean pe decomposition:
    This was modified to 8x12 for the atmosphere, 4x8 for NEMO and 32x1 for CICE. NEMO and CICE are compiled into a single executable and run sequentially. This adds up to 160 cores (5 nodes) including 1 node (32 cores) for OASIS. This was found to be the best decomposition of those tested.
  • Job directories:
    HECToR output directory set to $DATADIR/um/$RUNID and puma extract directory set to /work/n02/n02/username/um.
  • Path to local umui files:
    /home/umui/hadgem3/vn7.6/HG3AO40/. This contains hand edits (hand_edits/), compile overrides (overrides/), user STASH master files (preSTASHmaster/), coupling macros (macros/), NEMO configuration files (nemo_cfg/) and CICE configuration files (cice_cfg/).
  • Extra hand-edits for puma:
    Currently a hand-edit is required for submission of the coupled model to HECToR (vn7.5_oasis_nproc.ed) and for archiving (archiving_7.6).
  • Location of input data files on HECToR:
    $UMDIR/vn7.6/HG3AO40. This contains start dumps, ancillary and forcing files plus NEMO and CICE control namelists (nemo_ctl/ and cice_ctl/).
  • Byte swap CICE binary restart file:
    iced_start_abwORCA1_sep_swapped.bin
  • Location of the OASIS build on HECToR:
    /work/n02/n02/hum/oasis/oasis3_2-5/prism/crayxe6_cce.
  • Coupling macro:
    Point to version for HECToR (uses &END rather than / in namelists): cpl_macro_hadgem3_3hr.
  • FCM settings for puma:
    Fill in "container file name and location", "bindings location" and "subversion URL" for puma.
  • Puma versions of branches:
    Replace Met Office versions of branches with puma equivalents (see "Branches on puma" below).
  • Include extra atmos branches:
    For running on HECToR fcm:um_br/pkg/Config/VN7.6_ncas/src and fcm:um_br/dev/jeff/VN7.6_hector_monsoon_archiving/src.
  • Include extra NEMO branches:
    These contain fixes for HECToR. fcm:nemo_br/dev/annette/VN3.0_fixes_to_nemo_trunk/NEMO and fcm:nemo_br/dev/annette/VN3.0_coupled_fixes/NEMO
  • Include extra CICE branches:
    Fix for HECToR. fcm:cice_br/dev/annette/VN4.1_dbl_notation_fix/cice
  • Compile options for NEMO and CICE:
    Configuration files to extract NEMO v3.0 and CICE v4.1 from the puma repository and set compiler flags for the Cray compiler (CCE) on HECToR: nemo_XE6_cce_3.0_base.cfg (NEMO) and cice4.1_base_XE6_cce.cfg (CICE).
  • Linking to OASIS libraries on HECToR:
    Update NEMO library flags set in the UMUI for HECToR (remove netcdf and IBM-specific options). Atmosphere options are specified in two compile override files: oasis_file_hector_cce_7.6 and oasis_mach_hector_cce_7.6.
  • Tidying up:
    IBM-specific environment variables were removed, FCM output was reduced from '3' to '1', output prints were reduced from "operational" to "normal", outputting of basis files was turned off, and pe output files were set to be deleted on successful completion. User-script releases were switched off.
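The coupling macro item above notes that the HECToR version uses &END rather than / to close namelists. The sketch below illustrates that difference on a made-up namelist. The function `use_end_terminator` and the group name `NAMGROUP` are hypothetical, and this is not how the HECToR macro was produced. It only handles a terminator that sits on a line of its own.

```python
def use_end_terminator(namelist_text):
    """Replace a bare '/' group terminator with '&END', line by line."""
    converted = []
    for line in namelist_text.splitlines():
        if line.strip() == "/":
            # Keep the original indentation of the terminator line.
            indent = line[: len(line) - len(line.lstrip())]
            converted.append(f"{indent}&END")
        else:
            converted.append(line)
    return "\n".join(converted)


# Hypothetical namelist group, used only to show the change in terminator.
example = "&NAMGROUP\n  value = 1,\n/"
print(use_end_terminator(example))
```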

Branches on puma

A list of the branches included in this job and the equivalents on puma.

UM 
Met Office branch                                      Rev     Puma branch                                                       Rev

fcm:um_br/dev/frme/VN7.6_rhcrit_para_bugfix/src        22337   fcm:um_br/dev/matthew_miz/VN7.6_rhcrit_para_bugfix_ukmo/src       4036
fcm:um_br/dev/frrh/VN7.6_coupling_comp_opts/src        22218   fcm:um_br/dev/annette/VN7.6_coupling_comp_opts_ukmo/src           4824
fcm:um_br/dev/hadci/VN7.6_hadgem3_specials/src         22391   fcm:um_br/dev/annette/VN7.6_hadgem3_specials_ukmo/src             4826
fcm:um_br/dev/hadci/VN7.6_restart_fix/src              22275   fcm:um_br/dev/annette/VN7.6_restart_fix_ukmo/src                  4828
fcm:um_br/dev/hadco/VN7.6_incrCLO/src                  22233   fcm:um_br/dev/matthew_miz/VN7.6_incrCLO_ukmo/src                  4032
fcm:um_br/dev/hadco/VN7.6_reinstate_ISCCP/src          22237   fcm:um_br/dev/matthew_miz/VN7.6_reinstate_ISCCP_ukmo/src          4030
fcm:um_br/dev/hadke/VN7.6_rcf_polaravg_landfields/src  22292   fcm:um_br/dev/matthew_miz/VN7.6_rcf_polaravg_landfields_ukmo/src  4034
fcm:um_br/dev/frid/VN7.6_baresoilbugfix/src            22455   fcm:um_br/dev/matthew_miz/VN7.6_baresoilbugfix_ukmo/src           4038
fcm:um_br/dev/hadci/VN7.6_topmelt_stash_fix/src        22730   fcm:um_br/dev/annette/VN7.6_topmelt_stash_fix_ukmo/src            4831
fcm:um_br/dev/hadco/VN7.6_dust_tuning/src              23270   fcm:um_br/dev/matthew_miz/VN7.6_dust_tuning_ukmo/src              4040
fcm:um_br/dev/hadaw/VN7.6_inland_basins/src            22972   fcm:um_br/dev/matthew_miz/VN7.6_inland_basins_ukmo/src            4042
                                                               fcm:um_br/pkg/Config/VN7.6_ncas/src
                                                               fcm:um_br/dev/jeff/VN7.6_hector_monsoon_archiving/src

NEMO 
Met Office branch                                      Rev     Puma branch                                                       Rev

fcm:ioipsl_br/dev/hadci/VN3.0_CF_comp                  2213    fcm:ioipsl_br/dev/Share/VN3.0_CF_comp_ukmo                        532
fcm:ioipsl_br/dev/hadci/VN3.0_defprec                  2060    fcm:ioipsl_br/dev/Share/VN3.0_defprec                             546
fcm:nemo_br/dev/hadci/VN3.0_18Cisotherm/NEMO           2214    fcm:nemo_br/dev/Share/VN3.0_18Cisotherm_ukmo/NEMO                 559
fcm:nemo_br/dev/hadci/VN3.0_CF_comp/NEMO               3316    fcm:nemo_br/dev/Share/VN3.0_CF_comp_ukmo/NEMO                     1638
fcm:nemo_br/dev/hadci/VN3.0_PEchange/NEMO              2057    
fcm:nemo_br/dev/hadci/VN3.0_diaptr_new/NEMO            3174    fcm:nemo_br/dev/annette/VN3.0_diaptr_new_ukmo/NEMO                973
fcm:nemo_br/dev/hadci/VN3.0_hadgem3/NEMO               3716    fcm:nemo_br/dev/Share/VN3.0_hadgem3_ukmo/NEMO                     1222
fcm:nemo_br/dev/hadci/VN3.0_karamld/NEMO               2056    fcm:nemo_br/dev/Share/VN3.0_karamld_ukmo/NEMO                     556
fcm:nemo_br/dev/hadci/VN3.0_restart_date/NEMO          2624    fcm:nemo_br/dev/annette/VN3.0_restart_date_ukmo/NEMO              733
fcm:nemo_br/dev/hadom/VN3.0_ORCA1_L75/NEMO             3732    fcm:nemo_br/dev/annette/VN3.0_ORCA1_L75_ukmo/NEMO                 1646
fcm:nemo_br/dev/hadom/VN3.0_ORCAL75_10m_mindepth/NEMO  3077    fcm:nemo_br/dev/matthew_miz/VN3.0_ORCAL75_10m_mindepth_ukmo/NEMO  1059
fcm:nemo_br/dev/hadci/VN3.0_avt_rnf_fix/NEMO           3347    fcm:nemo_br/dev/malcolm/VN3.0_avt_rnf_fix_ukmo/NEMO               1126
fcm:nemo_br/dev/hadci/VN3.0_tvd_diaptr_fix/NEMO        3285    fcm:nemo_br/dev/malcolm/VN3.0_tvd_diaptr_fix_ukmo/NEMO            1124
                                                               fcm:nemo_br/dev/annette/VN3.0_fixes_to_nemo_trunk/NEMO            547
                                                               fcm:nemo_br/dev/annette/VN3.0_coupled_fixes/NEMO                  957
(Note: VN3.0_PEchange is included in VN3.0_fixes_to_nemo_trunk)

CICE 
Met Office branch                                      Rev     Puma branch                                                       Rev

fcm:cice_br/dev/Share/VN4.1_HadCICERun/cice            329     fcm:cice_br/dev/charris/VN4.1_HadCICERun_ukmo/cice                1306
fcm:cice_br/dev/hadci/VN4.1_no_hbnew_errors/cice       318     fcm:cice_br/dev/charris/VN4.1_no_hbnew_errors_ukmo/cice           1312
fcm:cice_br/dev/hadci/VN4.1_no_vert_check/cice         317     fcm:cice_br/dev/charris/VN4.1_no_vert_check_ukmo/cice             1314
                                                               fcm:cice_br/dev/annette/VN4.1_dbl_notation_fix/cice               1803