ARCHER2

The ARCHER2 Service is a world-class advanced computing resource for UK researchers. ARCHER2 is provided by UKRI, EPCC, Cray (an HPE company) and the University of Edinburgh.

ARCHER2 is due to commence operation in 2020, replacing the current service ARCHER. Please visit the ARCHER2 website.

Further information on moving to ARCHER2 will be made available here.

Pilot System

Prior to installation of the complete ARCHER2, we have access to a 4-cabinet pilot machine that will run in parallel with ARCHER. ARCHER users will find the new machine very familiar in many respects but with some important differences - see https://www.youtube.com/channel/UCZi-oBdxoDV5CPEQnhmrCAg/videos for a comprehensive array of presentations, in particular the one titled Differences between ARCHER and ARCHER2.

CMS has installed and undertaken limited testing of several versions of the Unified Model and its auxiliary software. The process is ongoing - we encourage users where possible to migrate their workflows to use the latest versions of the UM.

Limitations of the pilot system may constrain the workflows it can accommodate.

Scheduler

ARCHER2 uses SLURM (ARCHER used PBS), so all ARCHER batch scripts need to be rewritten for use on ARCHER2.
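As an illustration only (account code, walltime, and resource values are placeholders, not recommendations), compare a minimal ARCHER PBS header with its approximate ARCHER2 SLURM equivalent:

#!/bin/bash
# ARCHER (PBS) - illustrative placeholder values
#PBS -N um_job
#PBS -l select=1
#PBS -l walltime=00:20:00
#PBS -q standard
#PBS -A n02-xxxx

#!/bin/bash
# ARCHER2 (SLURM) - illustrative placeholder values
#SBATCH --job-name=um_job
#SBATCH --nodes=1
#SBATCH --time=00:20:00
#SBATCH --partition=standard
#SBATCH --qos=standard
#SBATCH --account=n02-xxxx

For Rose/Cylc suites these directives are generated from the suite.rc, as described further down this page.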

login nodes

The login nodes do support persistent ssh agents, so data transfer to JASMIN through Rose/Cylc workflows is possible.
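A minimal sketch of setting up an agent on a login node (the key name is illustrative; the exact mechanism for keeping the agent alive between sessions is not covered here):

eval $(ssh-agent -s)            # start an ssh-agent in the current shell
ssh-add ~/.ssh/id_rsa_jasmin    # add the key used for JASMIN transfers (illustrative key name)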

compute nodes

Compute nodes cannot see /home. Unlike on ARCHER, batch scripts run on the compute nodes, so they must not reference /home.
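A quick way to check a suite for stray /home references before submitting (the suite path is illustrative):

grep -rn "/home" ~/roses/u-xxxxx/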

serial nodes

The pilot system does not have serial nodes. The full system will have serial nodes.

Access

Request access through the ARCHER2 SAFE.

File Systems

ARCHER2 has /home and /work file systems with a structure identical to that on ARCHER. The pilot system has only 325TB on /work and 1.7TB on /home; the full system will have substantially more.

Budgets

The ARCHER budget structure and membership will carry over to ARCHER2.

UM

Currently installed versions: 7.3, 8.4, and 11.1–11.7.

Example jobs
|| UM version || job/suite id || config_root_path || branches || note ||
|| 7.3 || xoxtb || || || CCMI ||
|| 8.4 || xoxta || || || GLOMAP; + CLASSIC: RJ4.0 ARCHER GA4.0 ||
|| 11.1 || u-be303 (deriv) || fcm:um.x_br/dev/simonwilson/vn11.1_archer2_compile || fcm:um.x_br/dev/jeffcole/vn11.1_archer2_fixes || UKESM AMIP ||
|| 11.1 || u-ca103 || fcm:um.x_br/dev/simonwilson/vn11.1_archer2_compile || fcm:um.x_br/dev/jeffcole/vn11.1_archer2_fixes || Nesting suite ||
|| 11.2 || u-be463 (deriv) || fcm:um.x_br/dev/simonwilson/vn11.2_archer2_compile || fcm:um.x_br/dev/jeffcole/vn11.2_archer2_fixes || AMIP ||
|| 11.2 || u-bz764 || fcm:um.x_br/dev/simonwilson/vn11.2_archer2_compile || fcm:um.x_br/dev/jeffcole/vn11.2_archer2_fixes || UKESM coupled ||
|| 11.4 || u-ca369 || fcm:um.x_br/dev/simonwilson/vn11.4_archer2_compile || || GA7.0 N96 AMIP ||
|| 11.5 || u-br938? || fcm:um.x_br/dev/simonwilson/vn11.5_archer2_compile || || GA7.1 N1280 UM11.5 AMIP ||
|| 11.6 || || fcm:um.x_br/dev/simonwilson/vn11.6_archer2_compile || || ||
|| 11.7 || || || || ||

Table 1.

Initial setup on ARCHER2

If submitting from puma, add

. /work/y07/shared/umshared/bin/rose-um-env

to your ~/.bash_profile on ARCHER2.

For people submitting from pumatest add

. /work/y07/shared/umshared/bin/rose-um-env
export FCM_VERSION=pumatest
export CYLC_VERSION=pumatest
export ROSE_VERSION=pumatest

to your ~/.bash_profile on ARCHER2.

Quick Start

The following should be appropriate for most UM vn11.x versions. Unfortunately, as UM suites can be configured in multiple ways and there is no "standard" UM job configuration, it cannot be guaranteed to work. If there are still issues, please refer to the more detailed instructions following this section.

There are 4 stages in converting an ARCHER job into an ARCHER2 job.

  • Copy a standard archer2.rc site file.
  • Edit the suite.rc to use this file.
  • Edit the metadata file to allow access to the ARCHER2 configuration.
  • Update the GUI to use ARCHER2 and change the processor decomposition.

First, copy the working suite you wish to run on ARCHER2, check it out, and cd into its rose directory; a minimal sketch of this step is shown below. Then take each of the above stages in turn.
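For example (u-xxxxx is a placeholder for your working suite id; rosie prints the id of the new copy, shown here as u-yyyyy):

rosie copy u-xxxxx        # make a copy of the working suite
rosie checkout u-yyyyy    # check out the new copy
cd ~/roses/u-yyyyy        # move into its rose directory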

archer2.rc

Atmosphere-only jobs

Firstly look at your suite.rc file.

If it has lines of the form UM_ATM_NPROCX = {{MAIN_ATM_PROCX}}, i.e. with MAIN_ prepended to the variable names, then copy /home/simon/archer2/archer2.rc_main to site/archer2.rc.

If it has lines of the form UM_ATM_NPROCX = {{ATM_PROCX}} then copy /home/simon/archer2/archer2.rc to site/archer2.rc.

If the UM_ATM_NPROCX = type lines use some other format, copy /home/simon/archer2/archer2.rc as a base and then edit each of the lines in

{% set APPN = ATM_PPN if ATM_PPN is defined else PPN %}
{% set TASKS_RCF = RCF_PROCX * RCF_PROCY %}
{% set TASKS_ATM = ATM_PROCX * ATM_PROCY + IOS_NPROC %}
{% set NODE_RCF = node(TASKS_RCF, OMPTHR_RCF, HYPTHR_RCF, APPN) %}
{% set NODE_ATM = node(TASKS_ATM, OMPTHR_ATM, HYPTHR_ATM, APPN) %}
{% set TPNUMA_RCF = tpnuma(OMPTHR_RCF, HYPTHR_RCF, APPN, NUMA) %}
{% set TPNUMA_ATM = tpnuma(OMPTHR_ATM, HYPTHR_ATM, APPN, NUMA) %}

so that the variables match the equivalents in the suite.rc file.
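For example, if your suite.rc uses the MAIN_ prefixed names, the edited block might look like the sketch below. This is illustrative only: names such as MAIN_RCF_PROCX, MAIN_HYPTHR_ATM, and MAIN_IOS_NPROC are assumptions and must be replaced with whatever your suite.rc actually defines.

{% set APPN = MAIN_ATM_PPN if MAIN_ATM_PPN is defined else PPN %}
{% set TASKS_RCF = MAIN_RCF_PROCX * MAIN_RCF_PROCY %}
{% set TASKS_ATM = MAIN_ATM_PROCX * MAIN_ATM_PROCY + MAIN_IOS_NPROC %}
{% set NODE_RCF = node(TASKS_RCF, MAIN_OMPTHR_RCF, MAIN_HYPTHR_RCF, APPN) %}
{% set NODE_ATM = node(TASKS_ATM, MAIN_OMPTHR_ATM, MAIN_HYPTHR_ATM, APPN) %}
{% set TPNUMA_RCF = tpnuma(MAIN_OMPTHR_RCF, MAIN_HYPTHR_RCF, APPN, NUMA) %}
{% set TPNUMA_ATM = tpnuma(MAIN_OMPTHR_ATM, MAIN_HYPTHR_ATM, APPN, NUMA) %}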

Edit the line

 --chdir=/work/n02/n02/<username>

to use your ARCHER2 username in the copied archer2.rc.

Coupled and UKESM jobs

Copy /home/simon/archer2/archer2.rc_ukesm to site/archer2.rc.

Edit the line

 --chdir=/work/n02/n02/<username>

to use your ARCHER2 username in the copied archer2.rc.

Note: as the coupled configuration is more complicated than the atmosphere-only configuration, there is a greater possibility that the archer2.rc file will need further modification. Please see the more detailed documentation below.

suite.rc

Search the suite.rc for any reference to archer. If there are none, go on to the next step. Otherwise, change each archer reference to archer2.
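A simple way to find the references (run from the suite directory):

grep -in archer suite.rc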

For example change

{% set KNOWN_SITE_CFGS = ['archer', 'meto_cray', 'monsoon', 'nci_raijin', 'niwa_cray'] %}

to

{% set KNOWN_SITE_CFGS = ['archer2', 'meto_cray', 'monsoon', 'nci_raijin', 'niwa_cray'] %}

Note: If the line

%include site/archer-tests.rc

is present, do not change it.

metadata

In meta/rose-meta.conf, locate and change all instances of archer to archer2 and ARCHER to ARCHER2 etc.

For example:

[jinja2:suite.rc=ARCHER_GROUP]
compulsory=true
description=
help=Account code under which to run HPC tasks (e.g. n02-ncas)
ns=host
sort-key=archer_2
title=Account group for HPC tasks

should be updated to

[jinja2:suite.rc=ARCHER2_GROUP]
compulsory=true
description=
help=Account code under which to run HPC tasks (e.g. n02-ncas)
ns=host
sort-key=archer2_2
title=Account group for HPC tasks

Also, in [jinja2:suite.rc=MAIN_ATM_PPN] or [jinja2:suite.rc=ATM_PPN] change range=1:36 to range=1:128

GUI

Now start the GUI. There will be a warning triangle next to suite conf. In suite conf→Host Machine select Archer2, then set the Queue to the standard queue and set the account group by clicking on the plus and changing the values. Remember to put the account group in single quotes. In suite conf→Domain Decomposition→Atmosphere set the Max number of processors/node to 128. If not using IO servers, it is also a good idea to change the decomposition so that NS x EW x (OpenMP threads) is a multiple of 128 for most efficient running; for example, a 32x18 decomposition with 2 OpenMP threads uses 1152 cores, which fills 9 nodes exactly. Keep the total number of cores roughly the same as for ARCHER.

In fcm_um_make→env→Configuration file set config_root_path to be

fcm:um.xm_br/dev/simonwilson/vn11.x_archer2_compile

where x is the UM version of the suite.

Remove the revision number from config_revision so that the field is clear.

Save the new config.

You should now be able to submit the job to ARCHER2.
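For example, from the suite directory on puma or pumatest (assuming the standard Rose/Cylc setup described above):

rose suite-run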

End of quick start.

Rose/Cylc

The multiplicity and diversity of Rose/Cylc suites prevent us from providing a simple, comprehensive guide to the suite modifications necessary for running on ARCHER2. However, the suites referred to in Table 1 should give hints on how to upgrade your suite. The suite changes required stem from the following differences between ARCHER and ARCHER2:

  • scheduler: ARCHER uses PBS, ARCHER2 uses SLURM
  • architecture: ARCHER has 24 cores per node, ARCHER2 has 128 cores per node

Changes to account for SLURM will typically be in the [[directives]] section of tasks in the suite.rc file or in an appropriate site/archer2.rc file (you may need to create one of these). The example below illustrates common SLURM features. Note: the SLURM directives --partition, --qos, and --reservation combine to provide a more flexible replacement for the PBS directive --queue. Additional partitions will become available with the full ARCHER2 system.

  [[HPC]]
        pre-script = """
             module restore /work/y07/shared/umshared/modulefiles/um/2020.12.14   <====== to load the environment
             module list 2>&1
             ulimit -s unlimited
                     """

        [[[directives]]]
            --export=none
            --chdir=/work/n02/n02/<your ARCHER2 user name>   <===== you must set this 
            --partition=standard
            --qos={{ARCHER2_QUEUE}}
{% if ARCHER2_QUEUE == 'short' %}
            --reservation=shortqos
{% endif %}
            --account={{ARCHER2_GROUP}}
        [[[environment]]]
            PLATFORM = cce
            UMDIR = /work/y07/shared/umshared
        [[[job]]]
            batch system = slurm                             <===== specify use of SLURM
        [[[remote]]]
            host = login.archer2.ac.uk                       <====== use ARCHER2
{% if HPC_USER is defined %}
            owner = {{HPC_USER}}
{% endif %}

    [[HPC_SERIAL]]
        inherit = HPC
        [[[directives]]]
            --nodes=1
            --tasks-per-node=128
            --cpus-per-task=1
        [[[environment]]]
            ROSE_TASK_N_JOBS = 32

    [[UMBUILD]]
        [[[environment]]]
            CONFIG = ncas-ex-cce                             <====== note name of config for ARCHER2
           

Setting SLURM options that specify the number of processors requires assigning values to --nodes, --ntasks, --tasks-per-node, and --cpus-per-task. These should be familiar from ARCHER modulo the precise names for the attributes. Your suite may use different names for the various parameters, such as TASKS_RCF, for example, but there should be a simple correspondence.

   [[RCF_RESOURCE]]
        inherit = UM_PARALLEL
        [[[directives]]]
            --nodes={{NODE_RCF}}
            --ntasks={{TASKS_RCF}}
            --tasks-per-node={{8*(TPNUMA_RCF|int)}}
            --cpus-per-task={{MAIN_OMPTHR_RCF}}
        [[[environment]]]
            OMP_NUM_THREADS={{MAIN_OMPTHR_RCF}}
            ROSE_LAUNCHER_PREOPTS = {{RCF_SLURM_FLAGS}}
        [[[job]]]
            execution time limit = PT20M

    [[ATMOS_RESOURCE]]
        inherit = UM_PARALLEL, SUBMIT_RETRIES
        [[[directives]]]
            --nodes={{NODE_ATM}}
            --ntasks={{TASKS_ATM}}
            --tasks-per-node={{8*(TPNUMA_ATM|int)}}
            --cpus-per-task={{MAIN_OMPTHR_ATM}}
        [[[environment]]]
            OMP_NUM_THREADS={{MAIN_OMPTHR_ATM}}
            ROSE_LAUNCHER_PREOPTS = {{ATM_SLURM_FLAGS}}
        [[[job]]]
            execution time limit = {{MAIN_CLOCK}}

Your suite should include a section to specify the flags that will be passed to the command used to launch the job (for ARCHER that command is aprun; for ARCHER2 it is srun). The flags differ for jobs running with or without OpenMP. Most suites will need some jinja like this:

{# set up slurm flags for OpenMP/non-OpenMP #}
{% if MAIN_OMPTHR_RCF > 1 %}
 {% set RCF_SLURM_FLAGS= "--hint=nomultithread --distribution=block:block" %}
{% else %}
 {% set RCF_SLURM_FLAGS = "--cpu-bind=cores" %}
{% endif %}
{% if MAIN_OMPTHR_ATM > 1 %}
 {% set ATM_SLURM_FLAGS= "--hint=nomultithread --distribution=block:block" %}
{% else %}
 {% set ATM_SLURM_FLAGS = "--cpu-bind=cores" %}
{% endif %}

Suites frequently contain macros to calculate the number of nodes and cores required; the only change needed is to set the number of NUMA regions per node to 8.
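As a sketch only (actual macro names, argument orders, and formulas vary between suites; the node() and tpnuma() definitions below are illustrative, not taken from a particular suite), such macros might look like this, consistent with the directives shown earlier:

{# Illustrative only: adapt to the macros your suite already defines #}
{% set NUMA = 8 %}  {# ARCHER2 nodes have 8 NUMA regions #}
{# nodes needed: tasks * OpenMP threads / (hyperthreads * cores per node), rounded up #}
{% macro node(tasks, ompthr, hypthr, appn) %}{{ ((tasks * ompthr / (hypthr * appn)) | round(0, 'ceil')) | int }}{% endmacro %}
{# MPI tasks per NUMA region; multiplied by 8 to give --tasks-per-node #}
{% macro tpnuma(ompthr, hypthr, appn, numa) %}{{ ((appn * hypthr / (numa * ompthr)) | round(0, 'ceil')) | int }}{% endmacro %}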

Coupled suites

We have adopted the SLURM heterogeneous jobs method of handling coupled suites where the atmosphere, NEMO, and XIOS are separate executables running under a common communicator. The basic SLURM ideas above carry over to heterogeneous jobs but rather than making an overarching job resource request (as is the case for PBS), each component of the coupled job specifies its own requirements.

For the coupled task (or in the resources it inherits):

        [[[directives]]]
            hetjob_0_--nodes={{ATMOS_NODES}}
            hetjob_0_--ntasks={{ATMOS_TASKS}}
            hetjob_0_--tasks-per-node={{ATMOS_PPNU*NUMA}}
            hetjob_0_--cpus-per-task={{OMPTHR_ATM}}
            hetjob_1_--partition=standard
            hetjob_1_--nodes={{OCEAN_NODES}}
            hetjob_1_--ntasks={{OCEAN_TASKS}}
            hetjob_1_--tasks-per-node={{OCEAN_PPNU*NUMA}}
            hetjob_1_--cpus-per-task={{OMPTHR_OCN}}
            hetjob_2_--partition=standard
            hetjob_2_--nodes={{XIOS_NODES}}
            hetjob_2_--ntasks={{XIOS_TASKS}}
            hetjob_2_--tasks-per-node={{XIOS_PPNU*NUMA}}
            hetjob_2_--cpus-per-task=1

where hetjob_0_ is associated with the atmosphere, hetjob_1_ with the ocean, and hetjob_2_ with the XIOS IO servers.

The variables ROSE_LAUNCHER_PREOPTS_UM, ROSE_LAUNCHER_PREOPTS_NEMO, and ROSE_LAUNCHER_PREOPTS_XIOS also need modification to link the resource request to the job launcher command, for example:

            {% if OMPTHR_ATM > 1 %}
              ROSE_LAUNCHER_PREOPTS_UM  = --het-group=0 --hint=nomultithread --distribution=block:block --export=all,OMP_NUM_THREADS={{OMPTHR_ATM}},HYPERTHREADS={{HYPERTHREADS}},OMP_PLACES=cores
            {% else %}
              ROSE_LAUNCHER_PREOPTS_UM  = --het-group=0 --cpu-bind=cores --export=all,OMP_NUM_THREADS={{OMPTHR_ATM}},HYPERTHREADS={{HYPERTHREADS}}
            {% endif %}

where the flag --het-group=0 makes the connection to hetjob_0_.

Post Processing

Please see wiki:Archer/Transition2020/PPTransfer

UMUI

UM 8.4

Very few changes are required in order to run these jobs:

  • in Model Selection → Input/Output Control and Resources → Time Convention and …
    • set DATADIR (this must be on /work)
  • in Model Selection → User Information and Submit Method → Job submission method
    • select SLURM (Cray EX)
    • Set the number of processors to be a multiple of 128
    • click the SLURM button to specify the Job time limit

Post Processing has not been tested.

UM 7.3

umshared

All n02 users will be granted read access to the umshared package account (as was the case on ARCHER). UM data and software are installed at /work/y07/shared/umshared. You may set UMDIR in your .bash_profile, but note that batch jobs cannot see /home and will not source scripts that reside there.
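For example, this line in your ~/.bash_profile on ARCHER2 sets UMDIR for interactive shells (batch jobs cannot read it there, given the /home restriction above):

export UMDIR=/work/y07/shared/umshared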


gotchas

Performance

|| Suite id || Description || UM Version || Date Run || Platform || Decomposition* || OpenMP Threads || Length of run (days) || Dump Frequency (days) || Wallclock (h:m) || Data output vol (GB) || Comment || Cost/model-yr ||
|| u-bz764 || UKESM || 11.2 || 11.12.20 || ARCHER2 || 32x18x2:12x9:8 (9:1:1) || 2 (atm) || 90 || 90 || 1:41 || 103 || || 73.3 CU (135kAU) ||
|| u-be303-archer2 || UKESM AMIP || 11.1 || 14.12.20 || ARCHER2 || 16x16x2 (4) || 2 || 90 || 90 || 2:42 || 80 || OOMs on 1 thread || 43.2 CU (82.9kAU) ||
|| u-be303-archer2 || UKESM AMIP || 11.1 || 15.12.20 || ARCHER2 || 32x18x2 (9) || 2 || 90 || 90 || 1:32 || 80 || || 54 CU (103kAU) ||

* Decomposition is given as atm:ocean:xios, with node counts in brackets.