NCAS Computational Modelling Services

ARCHER2 O/S Upgrade

Updating a UM suite after the ARCHER2 O/S upgrade

This page will be continually updated.

In May/June 2023, ARCHER2 underwent a major software upgrade. All the system software, including compilers and libraries, was updated, and as a result we have had to rebuild much of the UM supporting software. This means users need to make some changes to their suites and test their workflows before resuming work.

We have ported and tested some of the commonly used suites, listed below, and we have provided a guide to the suite changes required. As ever, since UM suites can be set up in so many different ways, we cannot provide a comprehensive set of instructions. Please get in touch with the helpdesk if you run into difficulties.

Instructions

The following instructions draw on the naming style typically found in climate suites, but the ideas should apply to all UM suites. The suite modifications are needed to accommodate changes to the Slurm job scheduler. Only minimal user-level changes are required, after which suites should run successfully. We stress that the UM executables for the reconfiguration and the atmosphere model must be rebuilt.

Atmosphere Suites

For atmosphere-only suites, add the --cpus-per-task={{MAIN_OMPTHR_ATM}} clause to ROSE_LAUNCHER_PREOPTS in the atmosphere resources [[[environment]]] section in archer2.rc. For example:

[[ATMOS_RESOURCE]]
    ...
    [[[environment]]]
        OMP_NUM_THREADS={{MAIN_OMPTHR_ATM}}
        ROSE_LAUNCHER_PREOPTS = {{ATM_SLURM_FLAGS}} --cpus-per-task={{MAIN_OMPTHR_ATM}}
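
One quick way to confirm the edit is to list any ROSE_LAUNCHER_PREOPTS lines that still lack the clause. The sketch below is only a convenience check and not part of any suite; the function name find_missing_cpus_per_task is hypothetical, and the file to check is whichever archer2.rc your suite uses.

```shell
# Hypothetical helper (not part of Rose/Cylc): print every
# ROSE_LAUNCHER_PREOPTS line in a file that has no --cpus-per-task clause.
# Returns 0 if all such lines carry the clause, 1 if any do not,
# and 2 if the file cannot be read.
find_missing_cpus_per_task() {
    rc_file="$1"
    [ -r "$rc_file" ] || { echo "cannot read: $rc_file" >&2; return 2; }

    # First grep selects the launcher lines (with line numbers);
    # second grep keeps only those without the clause.
    missing=$(grep -n 'ROSE_LAUNCHER_PREOPTS' "$rc_file" | grep -v -F -e '--cpus-per-task=')

    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 1
    fi
    return 0
}
```

Run it as find_missing_cpus_per_task path/to/archer2.rc; no output means every launcher line already carries the clause.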

Coupled Suites

Coupled atmosphere-ocean suites require changes to the suite files rose-suite.conf and archer2.rc:

  1. In rose-suite.conf change the Science Configuration Module (see the table below for the required mappings)

     Pre OS upgrade MOCI module    Post OS upgrade MOCI module
     --------------------------    ---------------------------
     GC3-PrgEnv/2.0/2021.12.15     GC3-PrgEnv/v1
     GC3-PrgEnv/2.0/2021.11.22     GC3-PrgEnv/v2
     GC3-PrgEnv/2.0/2022.12.09     GC3-PrgEnv/v3
     GC4-PrgEnv/2021.12.1          GC4-PrgEnv/v1
     GC5-PrgEnv/2023.01.1          GC5-PrgEnv/v1

  2. Update the cce module version and remove the ucx module swap entries in archer2.rc, i.e. change

     module load cce/12.0.0
     module swap craype-network-ofi craype-network-ucx
     module swap cray-mpich cray-mpich-ucx/8.1.15
     {{MODULE_CMD}}

     to

     module load cce/15.0.0
     {{MODULE_CMD}}

  3. Update the ROSE_LAUNCHER_PREOPTS for the UM, NEMO, and XIOS.
     There is no longer any need to distinguish between single-threaded and multithreaded cases, but note that the clauses --hint=nomultithread --distribution=block:block must appear in the ROSE_LAUNCHER_PREOPTS for the UM, NEMO, and XIOS:

       [[UM_RESOURCE]]
           [[[environment]]]
               ROSE_LAUNCHER_PREOPTS_UM = --het-group=0 --nodes={{ATMOS_NODES}} --ntasks={{ATMOS_TASKS}} --tasks-per-node={{ATMOS_PPNU*NUMA}} --cpus-per-task={{OMPTHR_ATM}} --hint=nomultithread --distribution=block:block --export=all,OMP_NUM_THREADS={{OMPTHR_ATM}},HYPERTHREADS={{HYPERTHREADS}},OMP_PLACES=cores

       [[NEMO_RESOURCE]]
           [[[environment]]]
               ROSE_LAUNCHER_PREOPTS_NEMO = --het-group=1 --nodes={{OCEAN_NODES}} --ntasks={{OCEAN_TASKS}} --tasks-per-node={{OCEAN_PPNU*NUMA}} --cpus-per-task={{OMPTHR_OCN}} --hint=nomultithread --distribution=block:block --export=all,OMP_NUM_THREADS={{OMPTHR_OCN}},HYPERTHREADS={{HYPERTHREADS}},OMP_PLACES=cores

               {% if XIOS_NPROC is defined and XIOS_NPROC > 0 %}
               ROSE_LAUNCHER_PREOPTS_XIOS = --het-group=2 --nodes={{XIOS_NODES}} --ntasks={{XIOS_TASKS}} --tasks-per-node={{XIOS_PPNU*NUMA}} --cpus-per-task=1 --hint=nomultithread --distribution=block:block --export=all,OMP_NUM_THREADS=1,HYPERTHREADS=1
               {% endif %}
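
The coupled-suite changes above lend themselves to a quick sanity check before rebuilding. The following is a minimal sketch and not an official tool: map_moci_module simply encodes the module mapping table from the first step, and check_archer2_rc greps a suite file for leftover ucx module swaps, the old cce/12.0.0 module, and ROSE_LAUNCHER_PREOPTS_* lines missing either required clause. Both function names are hypothetical, and the path to your archer2.rc is left for you to supply.

```shell
# Hypothetical helper: map a pre-upgrade MOCI module name to its
# post-upgrade replacement, using only the pairs listed in the table above.
map_moci_module() {
    case "$1" in
        GC3-PrgEnv/2.0/2021.12.15) echo "GC3-PrgEnv/v1" ;;
        GC3-PrgEnv/2.0/2021.11.22) echo "GC3-PrgEnv/v2" ;;
        GC3-PrgEnv/2.0/2022.12.09) echo "GC3-PrgEnv/v3" ;;
        GC4-PrgEnv/2021.12.1)      echo "GC4-PrgEnv/v1" ;;
        GC5-PrgEnv/2023.01.1)      echo "GC5-PrgEnv/v1" ;;
        *) echo "no mapping listed for: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: report pre-upgrade leftovers in an archer2.rc file.
# Returns 0 if nothing suspicious is found, 1 otherwise, 2 if unreadable.
check_archer2_rc() {
    rc_file="$1"
    rc_status=0
    [ -r "$rc_file" ] || { echo "cannot read: $rc_file" >&2; return 2; }

    # Module step: the ucx swaps and the old compiler module should be gone.
    if grep -n -e 'module swap .*ucx' -e 'module load cce/12\.0\.0' "$rc_file"; then
        echo "-> remove the ucx swaps and update the cce module" >&2
        rc_status=1
    fi

    # Launcher step: every UM/NEMO/XIOS launcher line needs both clauses.
    for clause in '--hint=nomultithread' '--distribution=block:block'; do
        if grep -n 'ROSE_LAUNCHER_PREOPTS_' "$rc_file" | grep -v -F -e "$clause"; then
            echo "-> the lines above lack $clause" >&2
            rc_status=1
        fi
    done
    return "$rc_status"
}
```

For example, map_moci_module GC4-PrgEnv/2021.12.1 prints GC4-PrgEnv/v1, and check_archer2_rc path/to/archer2.rc prints nothing when the file looks fully updated.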

Ported suites

The following suites have been updated and tested following the OS Upgrade.

UM version  Suite id         Description                          Branches + Note
----------  ---------------  -----------------------------------  ---------------------------------------------------
11.1        u-be303/archer2  UKESM1.0 AMIP
11.2        u-bc613/archer2  UKESM1.0 Historical                  see changes to the hetjob config in site/archer2.rc
11.2        u-bc964/archer2  UKESM1.0 pre-industrial control      see changes to the hetjob config in site/archer2.rc
11.6        u-bs251/archer2  GA7.0 N96 AMIP Climate Development
11.7        u-ca634          GA8.0GL9.0 AMIP Climate Development
12.2        u-cm785/archer2  GC4 N96 ORCA025
13.2        u-cy010          GC5 N216 ORCA025

How to restart suites

Note: only follow this section if the fcm_make* tasks do not appear in the Cylc GUI for the last active cycle.

Suites that were running at the time ARCHER2 went down need to have their fcm_make* tasks re-inserted and re-run in order to rebuild the module executables.

  • After making the above changes, restart the suite in a held state:

    puma$ rose suite-run --restart -- --hold

  • Identify the cycle-point of the last active cycle (e.g. 19990401T0000Z)

  • For atmosphere-only suites, move the existing build directory aside:

    archer2$ cd <your-suite>/share
    archer2$ mv fcm_make_um fcm_make_um_preupgrade

    Insert the build tasks by running:

    puma$ cylc insert --no-check SUITE-ID fcm_make_um.CYCLE-POINT
    puma$ cylc insert --no-check SUITE-ID fcm_make2_um.CYCLE-POINT

    For example:

    puma$ cylc insert --no-check u-cm123 fcm_make_um.19990401T0000Z

  • For coupled suites, move the existing build directories aside:

    archer2$ cd <your-suite>/share
    archer2$ mv fcm_make_um fcm_make_um_preupgrade
    archer2$ mv fcm_make_ocean fcm_make_ocean_preupgrade

    Insert the build tasks by running:

    puma$ cylc insert --no-check SUITE-ID fcm_make_um.CYCLE-POINT
    puma$ cylc insert --no-check SUITE-ID fcm_make2_um.CYCLE-POINT
    puma$ cylc insert --no-check SUITE-ID fcm_make_ocean.CYCLE-POINT
    puma$ cylc insert --no-check SUITE-ID fcm_make2_ocean.CYCLE-POINT

  • In the Cylc GUI, right-click on the fcm_make_* tasks and select “Release”.

  • Wait for the code extraction tasks to finish, then right-click on the fcm_make2_* tasks and select “Release” to rebuild the code.

  • Once the rebuild tasks have finished, release the suite to continue running by selecting the menu item “Control -> Release Suite (unpause)”.
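
The insert commands in the steps above follow a fixed pattern, so they can be generated rather than typed by hand. The sketch below only prints the commands for review; it does not run cylc. The function name print_insert_cmds and its atmos/coupled argument are hypothetical. The task names are the ones used above, and your suite may name its build tasks differently.

```shell
# Hypothetical helper: print the "cylc insert" commands for a suite.
# Usage: print_insert_cmds SUITE-ID CYCLE-POINT [atmos|coupled]
print_insert_cmds() {
    suite_id="$1"
    cycle_point="$2"
    suite_type="${3:-atmos}"   # defaults to an atmosphere-only suite

    # Build task names as used in the instructions above (assumed to
    # match your suite; adjust if your build tasks are named differently).
    case "$suite_type" in
        atmos)   tasks="fcm_make_um fcm_make2_um" ;;
        coupled) tasks="fcm_make_um fcm_make2_um fcm_make_ocean fcm_make2_ocean" ;;
        *) echo "suite type must be 'atmos' or 'coupled'" >&2; return 1 ;;
    esac

    for task in $tasks; do
        echo "cylc insert --no-check ${suite_id} ${task}.${cycle_point}"
    done
}

# Example, using the suite ID and cycle point from the text above:
print_insert_cmds u-cm123 19990401T0000Z atmos
```

Review the printed lines, then run them on puma once the suite has been restarted in a held state.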