ARCHER2

The ARCHER2 Service is a world-class advanced computing resource for UK researchers. ARCHER2 is provided by UKRI, EPCC, Cray (an HPE company) and the University of Edinburgh.

ARCHER2 is due to commence operation in 2020, replacing the current service ARCHER. Please visit the ARCHER2 website.

Further information on moving to ARCHER2 will be made available here.

Pilot System

Prior to installation of the complete ARCHER2, we have access to a 4-cabinet pilot machine that will run in parallel with ARCHER. ARCHER users will find the new machine very familiar in many respects but with some important differences - see https://www.youtube.com/channel/UCZi-oBdxoDV5CPEQnhmrCAg/videos for a comprehensive array of presentations, in particular the one titled Differences between ARCHER and ARCHER2.

CMS has installed, and undertaken limited testing of, several versions of the Unified Model and its auxiliary software. This work is ongoing; we encourage users, where possible, to migrate their workflows to the latest UM versions.

Limitations of the pilot system may constrain the kinds of workflow it can accommodate.

Scheduler

ARCHER2 uses SLURM (ARCHER used PBS), so all ARCHER batch scripts need to be rewritten for use on ARCHER2.
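For illustration, a minimal stand-alone SLURM batch script might look like the sketch below; the job name, budget code (n02-xxxx) and executable are placeholders, and --qos=standard is an assumption (the suite examples later on this page set the QoS via ARCHER2_QUEUE).

#!/bin/bash
# Minimal SLURM batch script sketch for ARCHER2: #SBATCH directives replace the
# #PBS directives used on ARCHER, and srun replaces aprun as the job launcher.
#SBATCH --job-name=um_test             # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128          # ARCHER2 nodes have 128 cores
#SBATCH --cpus-per-task=1
#SBATCH --time=00:20:00
#SBATCH --partition=standard
#SBATCH --qos=standard                 # assumption; see the suite examples below
#SBATCH --account=n02-xxxx             # placeholder budget code

srun ./my_executable                   # placeholder executable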

login nodes

The login nodes do support persistent ssh agents, so data transfer to JASMIN through Rose/Cylc workflows is possible.

compute nodes

Compute nodes cannot see /home. Unlike on ARCHER, batch scripts run on the compute nodes, so they must not reference /home.

serial nodes

The pilot system does not have serial nodes. The full system will have serial nodes.

Access

Request access through the ARCHER2 SAFE.

File Systems

ARCHER2 has /home and /work file systems with a structure identical to that on ARCHER. The pilot system has only 325TB on /work and 1.7TB on /home; the full system will have substantially more.

Budgets

The ARCHER budget structure and membership will carry over to ARCHER2.

UM

Currently installed versions: 7.3, 8.4, and 11.1 through 11.7.

Example jobs
UM version | job/suite id | config_root_path | branches | note
7.3 | xoxtb | | | CCMI
8.4 | xoxta | | | GLOMAP; + CLASSIC: RJ4.0 ARCHER GA4.0
11.1 | u-be303/archer2 | fcm:um.x_br/dev/simonwilson/vn11.1_archer2_compile | fcm:um.x_br/dev/jeffcole/vn11.1_archer2_fixes | UKESM AMIP
11.1 | u-ca103 | fcm:um.x_br/dev/simonwilson/vn11.1_archer2_compile | fcm:um.x_br/dev/jeffcole/vn11.1_archer2_fixes | Nesting suite
11.2 | u-be463 (deriv) | fcm:um.x_br/dev/simonwilson/vn11.2_archer2_compile | fcm:um.x_br/dev/jeffcole/vn11.2_archer2_fixes | AMIP
11.2 | u-bc964/archer2 | fcm:um.x_br/dev/simonwilson/vn11.2_archer2_compile | fcm:um.x_br/dev/jeffcole/vn11.2_archer2_fixes | UKESM coupled - cpmip analysis not currently working
11.4 | u-ca369 | fcm:um.x_br/dev/simonwilson/vn11.4_archer2_compile | | GA7.0 N96 AMIP
11.5 | u-ca370 | fcm:um.x_br/dev/simonwilson/vn11.5_archer2_compile | | GA7.1 N1280 UM11.5 AMIP
11.6 | u-bs251/archer2 | fcm:um.x_br/dev/simonwilson/vn11.6_archer2_compile | |
11.7 | | | |

Table 1.

Initial setup for running UM Rose/Cylc Suites on ARCHER2

If submitting from puma, add

. /work/y07/shared/umshared/bin/rose-um-env

to your ~/.bash_profile on ARCHER2.

If submitting from pumatest, add

. /work/y07/shared/umshared/bin/rose-um-env
export FCM_VERSION=pumatest
export CYLC_VERSION=pumatest
export ROSE_VERSION=pumatest

to your ~/.bash_profile on ARCHER2.

Quick Start

The following should be appropriate for most UM vn11.x versions. Unfortunately, since UM suites can be configured in many different ways and there is no "standard" UM job configuration, this recipe cannot be guaranteed to work. If there are still issues, please refer to the more detailed instructions following this section.

There are 4 stages in converting an ARCHER job into an ARCHER2 job.

  • Copy a standard archer2.rc site file.
  • Edit the suite.rc to use this file.
  • Edit the metadata file to allow access to the ARCHER2 configuration.
  • Update the GUI to use ARCHER2 and change the processor decomposition.

First, copy the working suite you wish to run on ARCHER2, check it out, and cd into its working copy under ~/roses (see the sketch below). Then take each of the above steps in turn:
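A minimal sketch of this step using rosie, where u-xxxxx is a placeholder for the suite being copied and u-yyyyy is the id rosie reports for the new copy:

rosie copy u-xxxxx        # create a new copy of the source suite
rosie checkout u-yyyyy    # check out the new suite if it is not already checked out
cd ~/roses/u-yyyyy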

archer2.rc

Atmosphere only jobs

First, look at your suite.rc file.
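A quick way to check which form your suite uses (a sketch, run from the top of the suite working copy):

grep -n "UM_ATM_NPROCX" suite.rc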

If it has lines of the form UM_ATM_NPROCX = {{MAIN_ATM_PROCX}}, with MAIN_ prepended to the variable names, then copy /home/simon/archer2/archer2.rc_main to site/archer2.rc.

If it has lines of the form UM_ATM_NPROCX = {{ATM_PROCX}} then copy /home/simon/archer2/archer2.rc to site/archer2.rc.

If the UM_ATM_NPROCX type lines use some other format, copy /home/simon/archer2/archer2.rc as a base and then edit it, changing each of the lines

{% set APPN = ATM_PPN if ATM_PPN is defined else PPN %}
{% set TASKS_RCF = RCF_PROCX * RCF_PROCY %}
{% set TASKS_ATM = ATM_PROCX * ATM_PROCY + IOS_NPROC %}
{% set NODE_RCF = node(TASKS_RCF, OMPTHR_RCF, HYPTHR_RCF, APPN) %}
{% set NODE_ATM = node(TASKS_ATM, OMPTHR_ATM, HYPTHR_ATM, APPN) %}
{% set TPNUMA_RCF = tpnuma(OMPTHR_RCF, HYPTHR_RCF, APPN, NUMA) %}
{% set TPNUMA_ATM = tpnuma(OMPTHR_ATM, HYPTHR_ATM, APPN, NUMA) %}

so that the variables match the equivalents in the suite.rc file.

Edit the line

 --chdir=/work/n02/n02/<username>

to use your ARCHER2 username in the copied archer2.rc.
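For example, a sketch using sed, where myuser is a placeholder for your ARCHER2 username; check the result afterwards:

sed -i 's|<username>|myuser|' site/archer2.rc   # substitute your ARCHER2 username
grep -n chdir site/archer2.rc                   # confirm the --chdir line now looks right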

Users of vn11.7 may have to change EXPT_HORIZ to EXPT_HORIZ_ATM in archer2.rc.

Coupled and UKESM jobs

Copy /home/simon/archer2/archer2.rc_ukesm to site/archer2.rc.

Edit the line

 --chdir=/work/n02/n02/<username>

to use your ARCHER2 username in the copied archer2.rc.

Note: as the coupled configuration is more complicated than the atmosphere-only configuration, there is a greater chance that the archer2.rc file will need further modification. Please see the more detailed documentation below.

suite.rc

Search the suite.rc for any reference to archer. If there are none, go on to the next step. Otherwise, change each reference to archer into a reference to archer2.
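A quick way to list the references (a sketch):

grep -n -i archer suite.rc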

For example change

{% set KNOWN_SITE_CFGS = ['archer', 'meto_cray', 'monsoon', 'nci_raijin', 'niwa_cray'] %}

to

{% set KNOWN_SITE_CFGS = ['archer2', 'meto_cray', 'monsoon', 'nci_raijin', 'niwa_cray'] %}

Note: If the line

%include site/archer-tests.rc

is present, do not change it.

metadata

In meta/rose-meta.conf, locate and change all instances of archer to archer2, ARCHER to ARCHER2, Archer to Archer2, etc.

For example:

[jinja2:suite.rc=ARCHER_GROUP]
compulsory=true
description=
help=Account code under which to run HPC tasks (e.g. n02-ncas)
ns=host
sort-key=archer_2
title=Account group for HPC tasks

should be updated to

[jinja2:suite.rc=ARCHER2_GROUP]
compulsory=true
description=
help=Account code under which to run HPC tasks (e.g. n02-ncas)
ns=host
sort-key=archer2_2
title=Account group for HPC tasks

Also, in [jinja2:suite.rc=MAIN_ATM_PPN] or [jinja2:suite.rc=ATM_PPN] change range=1:36 to range=1:128
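If you prefer to make the renames from the command line, a sketch along these lines works, assuming the file has not already been partially converted; review the changes before keeping them. The range=1:36 to range=1:128 change still has to be made by hand.

# Rename archer -> archer2 in all three capitalisations used by the metadata
sed -i -e 's/ARCHER/ARCHER2/g' -e 's/Archer/Archer2/g' -e 's/archer/archer2/g' meta/rose-meta.conf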

GUI

Now start the GUI with rose edit. There will be a warning triangle next to the suite conf section. In suite conf→Host Machine select Archer2, then set the Queue to the standard queue and set the account group, clicking on the plus to add them to the configuration and then changing the values. These settings may be in a subsection, selected by clicking the arrow. Remember to put the account group in single quotes. The aim is to remove all of the warning triangles.

In suite conf→Domain Decomposition set the Max number of processors/node to 128 (or, if you plan to depopulate the nodes, to some multiple of 8). If not using IO servers, it is also a good idea to change the Atmosphere decomposition so that the product NS x EW x (OpenMP threads) is a multiple of 128 for most efficient running. Keep the total number of cores roughly the same as for ARCHER.
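For example, an atmosphere decomposition of 32x18 with 2 OpenMP threads uses 32 x 18 x 2 = 1152 cores, which exactly fills 9 ARCHER2 nodes (1152 / 128 = 9); on ARCHER the same core count occupied 48 of the 24-core nodes.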

In fcm_um_make→env→Configuration file set config_root_path to be

fcm:um.xm_br/dev/simonwilson/vn11.x_archer2_compile

where x is the UM version of the suite. Remove the revision number from config_revision so that the field is clear.

For vn11.1 and vn11.2 jobs, in fcm_um_make→env→Sources add

fcm:um.x_br/dev/jeffcole/vn11.x_archer2_fixes

where x is the UM version of the suite.

Ensure that Run Development Tests under suite conf→Tasks is set to false as these don't currently work on ARCHER2.

Save the new config.

Note: if you have any bespoke ARCHER app conf files (possibly set in Model Configuration), these may have to be renamed and updated for ARCHER2.

You should now be able to submit the job to ARCHER2.

End of quick start.

Rose/Cylc

The multiplicity and diversity of Rose/Cylc suites prevent us from providing a simple, comprehensive guide to the suite modifications necessary for running on ARCHER2. However, the suites referred to in Table 1 should give hints on how to upgrade your suite. The required changes stem from the following differences between ARCHER and ARCHER2:

  • scheduler: ARCHER uses PBS, ARCHER2 uses SLURM
  • architecture: ARCHER has 24 cores per node, ARCHER2 has 128 cores per node

Changes to account for SLURM will typically be in the [[directives]] section of tasks in the suite.rc file or in an appropriate site/archer2.rc file (you may need to create one of these). The example below illustrates common SLURM features. Note: the SLURM directives --partition, --qos, and --reservation combine to provide a more flexible replacement for the PBS directive --queue. Additional partitions will become available with the full ARCHER2 system.

  [[HPC]]
        pre-script = """
             module restore /work/y07/shared/umshared/modulefiles/um/2020.12.14   <====== to load the environment
             module list 2>&1
             ulimit -s unlimited
                     """

        [[[directives]]]
            --export=none
            --chdir=/work/n02/n02/<your ARCHER2 user name>   <===== you must set this 
            --partition=standard
            --qos={{ARCHER2_QUEUE}}
{% if ARCHER2_QUEUE == 'short' %}
            --reservation=shortqos
{% endif %}
            --account={{ARCHER2_GROUP}}
        [[[environment]]]
            PLATFORM = cce
            UMDIR = /work/y07/shared/umshared
        [[[job]]]
            batch system = slurm                             <===== specify use of SLURM
        [[[remote]]]
            host = login.archer2.ac.uk                       <====== use ARCHER2
{% if HPC_USER is defined %}
            owner = {{HPC_USER}}
{% endif %}

    [[HPC_SERIAL]]
        inherit = HPC
        [[[directives]]]
            --nodes=1
            --tasks-per-node=128
            --cpus-per-task=1
        [[[environment]]]
            ROSE_TASK_N_JOBS = 32

    [[UMBUILD]]
        [[[environment]]]
            CONFIG = ncas-ex-cce                             <====== note name of config for ARCHER2
           

Setting SLURM options that specify the number of processors requires assigning values to --nodes, --ntasks, --tasks-per-node, and --cpus-per-task. These should be familiar from ARCHER, modulo the precise attribute names. Your suite may use different names for the various parameters (TASKS_RCF, for example), but there should be a simple correspondence.

   [[RCF_RESOURCE]]
        inherit = UM_PARALLEL
        [[[directives]]]
            --nodes={{NODE_RCF}}
            --ntasks= {{TASKS_RCF}}
            --tasks-per-node={{8*(TPNUMA_RCF|int)}}
            --cpus-per-task={{MAIN_OMPTHR_RCF}}
        [[[environment]]]
            OMP_NUM_THREADS={{MAIN_OMPTHR_RCF}}
            ROSE_LAUNCHER_PREOPTS = {{RCF_SLURM_FLAGS}}
        [[[job]]]
            execution time limit = PT20M

    [[ATMOS_RESOURCE]]
        inherit = UM_PARALLEL, SUBMIT_RETRIES
        [[[directives]]]
            --nodes={{NODE_ATM}}
            --ntasks= {{TASKS_ATM}}
            --tasks-per-node={{8*(TPNUMA_ATM|int)}}
            --cpus-per-task={{MAIN_OMPTHR_ATM}}
        [[[environment]]]
            OMP_NUM_THREADS={{MAIN_OMPTHR_ATM}}
            ROSE_LAUNCHER_PREOPTS = {{ATM_SLURM_FLAGS}}
        [[[job]]]
            execution time limit = {{MAIN_CLOCK}}

Your suite should include a section to specify the flags that will be passed to the command that launches the job (for ARCHER that command is aprun; for ARCHER2 it is srun). The flags differ for jobs running with or without OpenMP. Most suites will need some Jinja2 like this:

{# set up slurm flags for OpenMP/non-OpenMP #}
{% if MAIN_OMPTHR_RCF > 1 %}
 {% set RCF_SLURM_FLAGS= "--hint=nomultithread --distribution=block:block" %}
{% else %}
 {% set RCF_SLURM_FLAGS = "--cpu-bind=cores" %}
{% endif %}
{% if MAIN_OMPTHR_ATM > 1 %}
 {% set ATM_SLURM_FLAGS= "--hint=nomultithread --distribution=block:block" %}
{% else %}
 {% set ATM_SLURM_FLAGS = "--cpu-bind=cores" %}
{% endif %}

Suites frequently contain macros to calculate the number of nodes and cores required; the only change needed is to set the number of NUMA regions per node to 8.
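As an illustrative calculation: with 128 cores per node, 8 NUMA regions per node, 2 OpenMP threads and no hyperthreading, there are 128 / 8 = 16 cores per NUMA region and hence 16 / 2 = 8 MPI tasks per NUMA region, so the 8*(TPNUMA_ATM|int) directive above resolves to --tasks-per-node=64.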

Coupled suites

We have adopted the SLURM heterogeneous job method of handling coupled suites, in which the atmosphere, NEMO, and XIOS are separate executables running under a common communicator. The basic SLURM ideas above carry over to heterogeneous jobs, but rather than making a single overarching resource request for the whole job (as was the case with PBS), each component of the coupled job specifies its own requirements.

For the coupled task (or in the resources it inherits), set:

        [[[directives]]]
            hetjob_0_--nodes={{ATMOS_NODES}}
            hetjob_0_--ntasks={{ATMOS_TASKS}}
            hetjob_0_--tasks-per-node={{ATMOS_PPNU*NUMA}}
            hetjob_0_--cpus-per-task={{OMPTHR_ATM}}
            hetjob_1_--partition=standard
            hetjob_1_--nodes={{OCEAN_NODES}}
            hetjob_1_--ntasks= {{OCEAN_TASKS}}
            hetjob_1_--tasks-per-node={{OCEAN_PPNU*NUMA}}
            hetjob_1_--cpus-per-task={{OMPTHR_OCN}}
            hetjob_2_--partition=standard
            hetjob_2_--nodes={{XIOS_NODES}}
            hetjob_2_--ntasks= {{XIOS_TASKS}}
            hetjob_2_--tasks-per-node={{XIOS_PPNU*NUMA}}
            hetjob_2_--cpus-per-task=1

where hetjob_0_ is associated with the atmosphere, hetjob_1_ with the ocean, and hetjob_2_ with the XIOS IO servers.

The variables ROSE_LAUNCHER_PREOPTS_UM, ROSE_LAUNCHER_PREOPTS_NEMO, and ROSE_LAUNCHER_PREOPTS_XIOS also need modification to link the resource request to the job launcher command, for example:

            {% if OMPTHR_ATM > 1 %}
              ROSE_LAUNCHER_PREOPTS_UM  = --het-group=0 --hint=nomultithread --distribution=block:block --export=all,OMP_NUM_THREADS={{OMPTHR_ATM}},HYPERTHREADS={{HYPERTHREADS}},OMP_PLACES=cores
            {% else %}
              ROSE_LAUNCHER_PREOPTS_UM  = --het-group=0 --cpu-bind=cores --export=all,OMP_NUM_THREADS={{OMPTHR_ATM}},HYPERTHREADS={{HYPERTHREADS}}
            {% endif %}

where the flag --het-group=0 makes the connection to hetjob_0_.

Post Processing and Data Transfer to JASMIN

In ~/roses/<SUITEID>/site/archer2.rc ensure that [[POSTPROC_RESOURCE]] loads the correct module and sets the stack limit, thus:

    [[POSTPROC_RESOURCE]]
        inherit = HPC_SERIAL
        pre-script = """module restore /work/y07/shared/umshared/modulefiles/postproc/2020.12.11
                        module list 2>&1
                        ulimit -s unlimited
                     """

For guidance on configuring the data transfer app, see wiki:Archer2/PPTransfer.

UMUI

Important:
UMUI jobs will not be available to run on ARCHER2 until ARCHER is decommissioned.

NCAS CMS will support only UM versions 7.3 and 8.4 on ARCHER2. For each version, currently only cumf and pumf have been built to run on ARCHER2.

Initial setup for running UMUI jobs on ARCHER2

1) Add the following snippet to your ARCHER2 ~/.bash_profile:

# Setup UM Variables
VN=7.3 ## or 8.4 as appropriate
if test -f $HOME/.umsetvars_$VN; then
  . $HOME/.umsetvars_$VN
else
  . /work/y07/shared/umshared/vn$VN/cce/scripts/.umsetvars_$VN
fi

2) Set up the umui_runs directory:

archer2$ mkdir /work/n02/n02/<archer2_username>/umui_runs
archer2$ ln -s /work/n02/n02/<archer2_username>/umui_runs ~/umui_runs

Very few changes are required in order to run these jobs:

UM 8.4

  • In Model Selection → User Information and Submit method → Job submission method
    • Select submission method: SLURM Cray EX (ARCHER2)
    • Set Host name to login.archer2.ac.uk
    • Set the number of processors to be a multiple of 128
    • Click the SLURM button to specify the Job time limit
  • In Model Selection → FCM Configuration → FCM Extract directories and Output levels
    • Set Target machine root directory (UM_ROUTDIR) to a location on /work (e.g. /work/n02/n02/$USERID/um)
  • In Model Selection → Input/Output Control and Resources → Time Convention and SCRIPT Environment Variables
    • Set DATADIR in the Defined Environment Variables table. This must be on /work (e.g. /work/n02/n02/<username>)
    • Ensure DATAM and DATAW are set to a location on /work, e.g. $DATADIR/um/$RUNID

UM 7.3

  • In Model Selection → User Information and Target Machine → Target Machine
    • Set Machine name to login.archer2.ac.uk
    • Set the number of processors to be a multiple of 128
  • In Model Selection → Input/Output Control and Resources → Time Convention and SCRIPT Environment Variables
    • Set DATADIR in the Defined Environment Variables table. This must be on /work (e.g. /work/n02/n02/<username>)
    • Ensure DATAM and DATAW are set to a location on /work, e.g. $DATADIR/um/$RUNID
  • In Model Selection → Input/Output Control and Resources → Job submission, resources and re-submission pattern
    • Select submission method: SLURM Cray EX (ARCHER2)
    • Note - the model will not recognize a change to the default number of cores/node
  • In Model Selection → FCM Extract and Build directories and Output levels
    • Set Target machine root directory (UM_ROUTDIR) to a location on /work (e.g. /work/n02/n02/ros/um)
  • In Model Selection → Compilation and Modifications → UM User Override Files
    • The User machine overrides must use ~umui/overrides/archer2_cce_7.3_machine
    • The User file overrides must use ~umui/overrides/archer2_cce_7.3_file

Post Processing has not been tested.

umshared

All n02 users will be granted read access to the umshared package account (as was the case on ARCHER).

UM data and software are installed under /work/y07/shared/umshared.

You may set UMDIR in your ~/.bash_profile, but note that batch jobs cannot see /home and will not source scripts that reside there.
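For example, a sketch of the relevant line in your ARCHER2 ~/.bash_profile:

# Shared UM installation on ARCHER2; on /work, so it is visible to the compute nodes
export UMDIR=/work/y07/shared/umshared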


Gotchas

The ssh agent on PUMA/pumatest will need to be restarted from time to time, and you will need to add your ~/.ssh/id_rsa_archerum key for job submission. We also recommend that you attempt to log in to ARCHER2 to address a possible issue with the known_hosts file, thus:

grenville@pumatest.nerc.ac.uk:/home/grenville/roses/u-bc964/site$ ssh grenvill@login.archer2.ac.uk
Warning: the RSA host key for 'login.archer2.ac.uk' differs from the key for the IP address '193.62.216.2'
Offending key for IP in /home/grenville/.ssh/known_hosts:53
Matching host key in /home/grenville/.ssh/known_hosts:87

Deletion of the offending key should prevent problems with UM jobs; in the case above, delete the entry at line 53 of ~/.ssh/known_hosts.
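A sketch of both steps on PUMA/pumatest (use the line number reported by your own warning):

ssh-add ~/.ssh/id_rsa_archerum    # add the ARCHER2 key to the running ssh agent
sed -i '53d' ~/.ssh/known_hosts   # delete the offending entry, line 53 in the example above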

Performance

Suite id | Description | UM Version | Date Run | Platform | Decomposition* | OpenMP Threads | Length of run (days) | Dump Frequency (days) | Wallclock (h:m) | Data output vol (GB) | Comment | Cost/model-yr
u-bz764 | UKESM | 11.2 | 11.12.20 | ARCHER2 | 32x18x2:12x9:8 (9:1:1) | 2 (atm) | 90 | 90 | 1:41 | 103 | | 73.3 CU (135kAU)
u-be303-archer2 | UKESM AMIP | 11.1 | 14.12.20 | ARCHER2 | 16x16x2 (4) | 2 | 90 | 90 | 2:42 | 80 | OOMs on 1 thread | 43.2 CU (82.9kAU)
u-be303-archer2 | UKESM AMIP | 11.1 | 15.12.20 | ARCHER2 | 32x18x2 (9) | 2 | 90 | 90 | 1:32 | 80 | | 54 CU (103kAU)

* Decomposition given as atm:ocean:xios (nodes).