ARCHER2

The ARCHER2 Service is a world-class advanced computing resource for UK researchers. ARCHER2 is provided by UKRI, EPCC, Cray (an HPE company) and the University of Edinburgh.

ARCHER2 is due to commence operation in 2020, replacing the current service ARCHER. Please visit the ARCHER2 website.

Further information on moving to ARCHER2 will be made available here.

Pilot System

Prior to installation of the complete ARCHER2, we have access to a 4-cabinet pilot machine that will run in parallel with ARCHER. ARCHER users will find the new machine familiar in many respects, but there are some important differences. See https://www.youtube.com/channel/UCZi-oBdxoDV5CPEQnhmrCAg/videos for a series of presentations, in particular the one titled Differences between ARCHER and ARCHER2.

CMS has installed and undertaken limited testing of several versions of the Unified Model and its auxiliary software. This work is ongoing, and we encourage users to migrate their workflows to the latest versions of the UM where possible.

Limitations of the pilot system may constrain the kinds of workflow that it can accommodate.

Scheduler

ARCHER2 uses SLURM (ARCHER used PBS), so all ARCHER batch scripts need to be rewritten for use on ARCHER2.
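
As a rough illustration of what that rewrite involves, the Python sketch below translates a few common PBS header lines into their SLURM equivalents. It handles only the job name, account, and wall-clock options, which map one-to-one. The helper name and the sample values (job name, budget code, time limit) are made up for illustration. Queue and node-selection lines are deliberately flagged for manual conversion, since on ARCHER2 the queue is expressed through a combination of partition, QoS, and reservation settings.

```python
import re

# Hypothetical helper: maps a handful of PBS header lines to SLURM ones.
# Only options with a direct one-to-one equivalent are handled.
PBS_TO_SLURM = [
    (re.compile(r"^#PBS\s+-N\s+(\S+)"), "#SBATCH --job-name={}"),
    (re.compile(r"^#PBS\s+-A\s+(\S+)"), "#SBATCH --account={}"),
    (re.compile(r"^#PBS\s+-l\s+walltime=(\S+)"), "#SBATCH --time={}"),
]


def translate_header(lines):
    """Return SLURM versions of PBS header lines; flag the rest for review."""
    out = []
    for line in lines:
        if not line.startswith("#PBS"):
            # Not a PBS directive: pass through unchanged.
            out.append(line)
            continue
        for pattern, template in PBS_TO_SLURM:
            match = pattern.match(line)
            if match:
                out.append(template.format(match.group(1)))
                break
        else:
            # No direct equivalent (e.g. -q, -l select=...): convert by hand.
            out.append("# TODO convert by hand: " + line)
    return out


# Sample header with made-up values.
example = [
    "#!/bin/bash",
    "#PBS -N um_atmos",
    "#PBS -A n02-example",
    "#PBS -l walltime=00:20:00",
    "#PBS -q short",
]
for converted in translate_header(example):
    print(converted)
```

Anything the sketch cannot translate is kept as a comment, so no directive is silently dropped.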

login nodes

The login nodes do support persistent ssh agents, so data transfer to JASMIN through Rose/Cylc workflows is possible.

compute nodes

Compute nodes cannot see /home. Unlike on ARCHER, batch scripts run on the compute nodes, so they must not contain references to /home.
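
One way to catch such references before submitting is to scan a script for them. The Python sketch below is a minimal approach. The function name and the patterns it looks for (literal /home/ paths, $HOME, and a leading ~/) are assumptions chosen for illustration, and the user name in the example is hypothetical. It will not catch paths that are built at run time.

```python
import re

# Patterns that usually indicate a path under /home (illustrative, not exhaustive).
HOME_PATTERNS = re.compile(r"(/home/|\$HOME\b|\$\{HOME\}|(?<![\w/])~/)")


def find_home_references(script_text):
    """Return (line_number, line) pairs for lines that appear to use /home."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if line.startswith("#SBATCH"):
            # Scheduler directives are checked in full.
            code = line
        else:
            # Ignore anything after a shell comment marker.
            code = line.split("#", 1)[0]
        if HOME_PATTERNS.search(code):
            hits.append((number, line.strip()))
    return hits


# Example script with a hypothetical user name.
script = """#!/bin/bash
#SBATCH --chdir=/work/n02/n02/someuser
source $HOME/.bash_profile
cd /work/n02/n02/someuser/run
cp ~/namelists/input.nl .
"""
for number, line in find_home_references(script):
    print(f"line {number}: {line}")
```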

serial nodes

The pilot system does not have serial nodes. The full system will have serial nodes.

Access

Request access through the ARCHER2 SAFE.

File Systems

ARCHER2 has /home and /work file systems with the same structure as on ARCHER. The pilot system will have only 325TB on /work and 1.7TB on /home; the full system will have substantially more.

Budgets

The ARCHER budget structure and membership will carry over to ARCHER2.

UM

Currently installed versions: 7.3, 8.4, 11.1, 11.2, 11.5, 11.6

Example jobs
UM version | job/suite id | config_root_path | branches | note
7.3 | abxcd | - | - | CCMI
8.4 | xoxta | - | - | GLOMAP; + CLASSIC: RJ4.0 ARCHER GA4.0
11.1 | u-be303 (deriv) | fcm:um.x_br/dev/simonwilson/vn11.1_archer2_compile | fcm:um.x_br/dev/jeffcole/vn11.1_archer2_fixes | UKESM AMIP
11.2 | u-be463 (deriv) | fcm:um.x_br/dev/simonwilson/vn11.2_archer2_compile | fcm:um.x_br/dev/jeffcole/vn11.2_archer2_fixes | AMIP
11.2 | u-bz746 (deriv) | fcm:um.x_br/dev/simonwilson/vn11.2_archer2_compile | fcm:um.x_br/dev/jeffcole/vn11.2_archer2_fixes | UKESM coupled
11.5 | u-br938? | fcm:um.x_br/dev/simonwilson/vn11.5_archer2_compile | - | GA7.1 N1280 UM11.5 AMIP
11.6 | - | fcm:um.x_br/dev/simonwilson/vn11.6_archer2_compile | - | -

Table 1.

Rose/Cylc

The number and variety of Rose/Cylc suites prevent us from providing a simple, comprehensive guide to the suite modifications needed to run on ARCHER2. However, the suites listed in Table 1 should give hints on how to upgrade your suite. The required changes stem from the following differences between ARCHER and ARCHER2:

  • scheduler: ARCHER uses PBS, ARCHER2 uses SLURM
  • architecture: ARCHER has 24 cores per node, ARCHER2 has 128 cores per node

Changes to account for SLURM will typically be in the [[[directives]]] section of tasks in the suite.rc file or in an appropriate site/archer2.rc file (you may need to create one of these). The example below illustrates common SLURM features. Note: the SLURM directives --partition, --qos, and --reservation combine to provide a more flexible replacement for the PBS queue directive (-q). Additional partitions will become available with the full ARCHER2 system.

  [[HPC]]
        pre-script = """
             module restore /work/n02/n02/simon/um_modules   # load the environment
             module list 2>&1
             ulimit -s unlimited
                     """

        [[[directives]]]
            --export=none
            --chdir=/work/n02/n02/<your ARCHER2 user name>   # you must set this
            --partition=standard
            --qos={{ARCHER2_QUEUE}}
{% if ARCHER2_QUEUE == 'short' %}
            --reservation=shortqos
{% endif %}
            --account={{ARCHER2_GROUP}}
        [[[environment]]]
            PLATFORM = cce
            UMDIR = /work/y07/shared/umshared
        [[[job]]]
            batch system = slurm                             # specify use of SLURM
        [[[remote]]]
            host = login.archer2.ac.uk                       # use ARCHER2
{% if HPC_USER is defined %}
            owner = {{HPC_USER}}
{% endif %}

    [[HPC_SERIAL]]
        inherit = HPC
        [[[directives]]]
            --nodes=1
            --tasks-per-node=128
            --cpus-per-task=1
        [[[environment]]]
            ROSE_TASK_N_JOBS = 32

    [[UMBUILD]]
        [[[environment]]]
            CONFIG = ncas-ex-cce                             # note name of config for ARCHER2
           

Setting SLURM options that specify the number of processors requires assigning values to --nodes, --ntasks, --tasks-per-node, and --cpus-per-task. These should be familiar from ARCHER, apart from the exact option names. Your suite may use different names for the various parameters (TASKS_RCF, for example), but there should be a simple correspondence.

   [[RCF_RESOURCE]]
        inherit = UM_PARALLEL
        [[[directives]]]
            --nodes={{NODE_RCF}}
            --ntasks= {{TASKS_RCF}}
            --tasks-per-node={{8*(TPNUMA_RCF|int)}}
            --cpus-per-task={{MAIN_OMPTHR_RCF}}
        [[[environment]]]
            OMP_NUM_THREADS={{MAIN_OMPTHR_RCF}}
            ROSE_LAUNCHER_PREOPTS = {{RCF_SLURM_FLAGS}}
        [[[job]]]
            execution time limit = PT20M

    [[ATMOS_RESOURCE]]
        inherit = UM_PARALLEL, SUBMIT_RETRIES
        [[[directives]]]
            --nodes={{NODE_ATM}}
            --ntasks= {{TASKS_ATM}}
            --tasks-per-node={{8*(TPNUMA_ATM|int)}}
            --cpus-per-task={{MAIN_OMPTHR_ATM}}
        [[[environment]]]
            OMP_NUM_THREADS={{MAIN_OMPTHR_ATM}}
            ROSE_LAUNCHER_PREOPTS = {{ATM_SLURM_FLAGS}}
        [[[job]]]
            execution time limit = {{MAIN_CLOCK}}

Your suite should include a section that sets the flags passed to the job launcher command (aprun on ARCHER, srun on ARCHER2). The flags differ between jobs running with and without OpenMP. Most suites will need some Jinja2 like this:

{# set up slurm flags for OpenMP/non-OpenMP #}
{% if MAIN_OMPTHR_RCF > 1 %}
 {% set RCF_SLURM_FLAGS= "--hint=nomultithread --distribution=block:block" %}
{% else %}
 {% set RCF_SLURM_FLAGS = "--cpu-bind=cores" %}
{% endif %}
{% if MAIN_OMPTHR_ATM > 1 %}
 {% set ATM_SLURM_FLAGS= "--hint=nomultithread --distribution=block:block" %}
{% else %}
 {% set ATM_SLURM_FLAGS = "--cpu-bind=cores" %}
{% endif %}

Suites frequently contain macros to calculate the number of nodes and cores required. The only change needed is to set the number of NUMA regions per node to 8.
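
The arithmetic behind such macros can be sketched in Python. This is an illustration only: the function and argument names are invented here, and the macros in your own suite may be organised differently. It assumes the ARCHER2 figures quoted on this page (128 cores and 8 NUMA regions per node) and the pattern used in the directives above, where tasks per node is 8 times the number of tasks per NUMA region.

```python
import math

CORES_PER_NODE = 128   # ARCHER2 value quoted on this page
NUMA_PER_NODE = 8      # ARCHER2 value quoted on this page


def slurm_resources(total_tasks, tasks_per_numa, omp_threads=1):
    """Derive the four SLURM processor options for one task.

    total_tasks    -- number of MPI tasks
    tasks_per_numa -- MPI tasks placed on each NUMA region
    omp_threads    -- OpenMP threads per MPI task
    """
    tasks_per_node = NUMA_PER_NODE * tasks_per_numa
    if tasks_per_node * omp_threads > CORES_PER_NODE:
        raise ValueError("request exceeds 128 cores per node")
    # Round up so that a partly filled final node is still requested.
    nodes = math.ceil(total_tasks / tasks_per_node)
    return {
        "--nodes": nodes,
        "--ntasks": total_tasks,
        "--tasks-per-node": tasks_per_node,
        "--cpus-per-task": omp_threads,
    }


# Example: 512 MPI tasks, 8 tasks per NUMA region, 2 OpenMP threads each.
for option, value in slurm_resources(512, 8, 2).items():
    print(f"{option}={value}")
```

With these example inputs each node holds 64 tasks of 2 threads, filling all 128 cores, so the request comes to 8 nodes.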

Coupled suites

We have adopted the SLURM heterogeneous jobs method of handling coupled suites where the atmosphere, NEMO, and XIOS are separate executables running under a common communicator. The basic SLURM ideas above carry over to heterogeneous jobs but rather than making an overarching job resource request (as is the case for PBS), each component of the coupled job specifies its own requirements.

To tell Cylc that the task is a heterogeneous job, set:

        [[[job]]]
            batch system = slurm_hetero

For the coupled task (or in the resources it inherits), set:

        [[[directives]]]
            hetjob_0_--nodes={{ATMOS_NODES}}
            hetjob_0_--ntasks={{ATMOS_TASKS}}
            hetjob_0_--tasks-per-node={{ATMOS_PPNU*NUMA}}
            hetjob_0_--cpus-per-task={{OMPTHR_ATM}}
            hetjob_1_--partition=standard
            hetjob_1_--nodes={{OCEAN_NODES}}
            hetjob_1_--ntasks= {{OCEAN_TASKS}}
            hetjob_1_--tasks-per-node={{OCEAN_PPNU*NUMA}}
            hetjob_1_--cpus-per-task={{OMPTHR_OCN}}
            hetjob_2_--partition=standard
            hetjob_2_--nodes={{XIOS_NODES}}
            hetjob_2_--ntasks= {{XIOS_TASKS}}
            hetjob_2_--tasks-per-node={{XIOS_PPNU*NUMA}}
            hetjob_2_--cpus-per-task=1

where hetjob_0_ is associated with the atmosphere, hetjob_1_ with the ocean, and hetjob_2_ with the XIOS I/O servers.

The variables ROSE_LAUNCHER_PREOPTS_UM, ROSE_LAUNCHER_PREOPTS_NEMO, and ROSE_LAUNCHER_PREOPTS_XIOS also need modification to link the resource request to the job launcher command, for example:

            {% if OMPTHR_ATM > 1 %}
              ROSE_LAUNCHER_PREOPTS_UM  = --het-group=0 --hint=nomultithread --distribution=block:block --export=all,OMP_NUM_THREADS={{OMPTHR_ATM}},HYPERTHREADS={{HYPERTHREADS}},OMP_PLACES=cores
            {% else %}
              ROSE_LAUNCHER_PREOPTS_UM  = --het-group=0 --cpu-bind=cores --export=all,OMP_NUM_THREADS={{OMPTHR_ATM}},HYPERTHREADS={{HYPERTHREADS}}
            {% endif %}

where the flag --het-group=0 makes the connection to hetjob_0_.
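
To show how the directive prefixes and the --het-group flags line up, the Python sketch below generates both from one list of components. The function names, dictionary keys, and node and task counts are hypothetical. The output format follows the listings above, with the first component assumed to inherit --partition from its parent family. The --export part of the launcher options is omitted for brevity.

```python
def hetjob_directives(components):
    """Build hetjob_N_ directive lines, one group per coupled component.

    components -- list of dicts in het-group order (0 = atmosphere,
                  1 = ocean, 2 = XIOS), each with the keys nodes, ntasks,
                  tasks_per_node and omp_threads.
    """
    lines = []
    for index, comp in enumerate(components):
        prefix = f"hetjob_{index}_"
        if index > 0:
            # Assumption: component 0 inherits --partition from its family.
            lines.append(f"{prefix}--partition=standard")
        lines.append(f"{prefix}--nodes={comp['nodes']}")
        lines.append(f"{prefix}--ntasks={comp['ntasks']}")
        lines.append(f"{prefix}--tasks-per-node={comp['tasks_per_node']}")
        lines.append(f"{prefix}--cpus-per-task={comp['omp_threads']}")
    return lines


def launcher_preopts(index, omp_threads):
    """Return srun options tying a component to het group `index`."""
    if omp_threads > 1:
        binding = "--hint=nomultithread --distribution=block:block"
    else:
        binding = "--cpu-bind=cores"
    return f"--het-group={index} {binding}"


# Made-up resource figures for atmosphere, ocean, and XIOS.
components = [
    {"nodes": 8, "ntasks": 512, "tasks_per_node": 64, "omp_threads": 2},
    {"nodes": 2, "ntasks": 256, "tasks_per_node": 128, "omp_threads": 1},
    {"nodes": 1, "ntasks": 8, "tasks_per_node": 8, "omp_threads": 1},
]
for line in hetjob_directives(components):
    print(line)
for index, comp in enumerate(components):
    print(launcher_preopts(index, comp["omp_threads"]))
```

Generating both from the same list keeps the index in each hetjob_N_ prefix in step with the matching --het-group=N flag.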

Post Processing

UMUI

UM 8.4

Very few changes are required in order to run these jobs:

  • in Model Selection → Input/Output Control and Resources → Time Convention and …
    • set DATADIR (this must be on /work)
  • in Model Selection → User Information and Submit Method → Job submission method
    • select SLURM (Cray EX)
    • set the number of processors to be a multiple of 128
    • click the SLURM button to specify the Job time limit

Post Processing has not been tested.

UM 7.3

umshared

All n02 users will be granted read access to the umshared package account, as was the case on ARCHER. UM data and software are installed at /work/y07/shared/umshared. You may set UMDIR in your .bash_profile, but note that batch jobs cannot see /home and will not source scripts that reside there.


gotchas

Last modified on 02/12/20 14:44:40