Posts by author Grenville Lister

ARCHER2 23-cabinet UM available

CMS is happy to announce that the UM infrastructure available on the 4-cabinet pilot ARCHER2 system is now ready for use on the full 23-cabinet system. UM suites that ran on the 4-cabinet system will require only minor modifications:

  • all suites will need slightly modified invocations of module load to set up the job environment
  • coupled suites will need to use a new moci module
  • coupled suites will need modifications to their Slurm hetjob configuration

CMS have re-written the suites listed at to incorporate the features listed above - these suites should serve as examples that you can follow to port your suites to the full system. has been updated to reflect use of the full ARCHER2 system as it stands - further information will be added as the system beds in.

Please note: /work on the 4-cabinet system is distinct from /work on the full system; it is your responsibility to move data between the two.

ARCHER2 23-cabinet system

The ARCHER2 23-cabinet system is now available. It will run in parallel with the 4-cabinet system that has been in use for the past year. CMS are working to migrate the UM and associated software to the new system. Users should take this opportunity to move data to the new machine.

We shall announce when the UM is ready for use on the 23-cabinet system.


  • ARCHER will end on Jan 27th - please note, this means no access whatsoever subsequently
  • NEXCS will now continue to late June - this represents a five month extension of the service in light of delays with ARCHER2

Update: ARCHER return to service - UM work flow

We continue to work with ARCHER to implement a robust solution to handle the 2FA access. Until we have that solution, this is a short-term alternative. In your PUMA or pumatest .ssh/config file (create one if you don't already have one), delete references to and then add:

User <your ARCHER username>
IdentityFile ~/.ssh/<your private key to ARCHER>
ControlMaster auto
ControlPath /tmp/ssh-socket-%r@%h-%p
ControlPersist yes

Login to ARCHER (with passphrase and password).

Rose/Cylc suites should not use rose host-select - it will not work; suites must specify in the appropriate .rc file (suite.rc or archer.rc), for example, host =

Submit UMUI and Rose/Cylc jobs as usual.

The connection to ARCHER will persist after logging out of PUMA/pumatest and will enable the Cylc engine to manage your suite. However, the connection will be terminated (each day at ~5AM in our experience) - logging in to ARCHER will re-establish the connection and Rose/Cylc suites will pick up automatically. If a suite times out (over a weekend possibly), a normal rose suite-restart should get it going again.
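Because the connection is dropped periodically, it can be useful to check whether the shared SSH master connection is still alive before restarting a suite. The following is a minimal sketch, not part of the CMS instructions. It assumes the ControlMaster settings above are in place and relies on the standard OpenSSH `ssh -O check` control command. The host alias `archer-host` is a hypothetical placeholder for whatever host entry you use in your .ssh/config.

```python
import subprocess


def control_check_command(host):
    """Build the OpenSSH command that asks the control master for `host` whether it is running."""
    return ["ssh", "-O", "check", host]


def master_is_alive(host, timeout=10):
    """Return True if `ssh -O check` exits with status 0 for `host`, False otherwise."""
    try:
        result = subprocess.run(
            control_check_command(host),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the check did not return in time
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # "archer-host" is a hypothetical alias, not a real hostname
    if not master_is_alive("archer-host"):
        print("No live master connection - log in to ARCHER again to re-establish it")
```

If the check fails, logging in to ARCHER again (with passphrase and password) recreates the socket, as described above.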

ARCHER return to service - UM work flow

When ARCHER returns to operation on May 21st, all users will be required to use two credentials to access the service: an SSH key with a passphrase and their ARCHER password.

Rose/Cylc suites and UMUI jobs will not run under this HPC access model. We are working closely with ARCHER to develop and implement a solution that will enable UM work flows to function with the new security scheme.

GA7 Optimizations

Work undertaken by NCAS-CMS to optimise Pier Luigi Vidale's N512 GA7 runs has resulted in a 15-20% speed-up of the model with no loss of bit-comparison. The resulting savings in ARCHER resource are very significant - at the ARCHER partner rate this is ~£60k, and twice that at the non-partner rate. Put another way, we save ~100M AU (for these experiments), which frees resource for a raft of ARCHER projects that might otherwise struggle.

Details follow:

In the standard GA7, stochastic physics is turned on. This holds a field in spectral space distributed over the PEs. Every timestep this field is gathered by PE0, converted to grid space and then distributed over the domain. This requires a gather and many scatters at every timestep. A month-long, high-resolution N512 job was examined with DrHook, and it was noted that FOR_PATTERN, the routine which does the gather/scatter and the spectral→grid space transformation, takes a significant proportion of the run time: 513s of a 3000s run.

FOR_PATTERN has been rewritten to remove all gathers and scatters of the spectral field. Instead of having the spectral rows distributed over all PEs, each PE holds the spectral rows equivalent to its own rows in grid space. It then does a local Fourier transform to get back to grid space, and extracts its own longitude domain from the resultant field.

This requires extra compute, as every PE over a latitude band does the same spectral→grid space transformation, but the savings in communication time can far outweigh this. Gathers and scatters when running on a large number (1000s) of PEs are best avoided, as these can take many milliseconds to perform.

With the new code FOR_PATTERN takes 23s, compared with 513s.
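The change of communication pattern can be illustrated with a toy sketch. This is not the UM code: it assumes a simplified Fourier-only transform along longitude (the real stochastic physics transform is more involved), a hypothetical 2x2 PE decomposition, and made-up coefficients. All function and variable names are illustrative. It shows only that each PE transforming its own latitude rows and keeping its own longitude range yields the same tiles as transforming the whole field in one place and scattering it.

```python
import cmath


def inverse_dft_row(coeffs, nlon):
    """Naive inverse Fourier transform of one latitude row onto nlon grid points."""
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * x / nlon) for k, c in enumerate(coeffs)).real
        for x in range(nlon)
    ]


def gather_scatter(spectral, nlon, row_blocks, col_blocks):
    """Old pattern: one PE transforms the whole field, then hands each PE its tile."""
    full = [inverse_dft_row(row, nlon) for row in spectral]  # work done in one place
    return {
        (r, c): [[full[i][x] for x in cols] for i in rows]
        for r, rows in enumerate(row_blocks)
        for c, cols in enumerate(col_blocks)
    }


def local_transform(spectral, nlon, row_blocks, col_blocks):
    """New pattern: every PE transforms its own latitude rows and keeps its longitudes."""
    tiles = {}
    for r, rows in enumerate(row_blocks):
        for c, cols in enumerate(col_blocks):
            # Redundant compute: each PE in the same latitude band repeats this transform
            band = [inverse_dft_row(spectral[i], nlon) for i in rows]
            tiles[(r, c)] = [[row[x] for x in cols] for row in band]
    return tiles


# Hypothetical toy sizes: 4 latitude rows, 8 longitudes, 3 wavenumbers per row
NLAT, NLON = 4, 8
spectral = [[complex(i + 1, k) for k in range(3)] for i in range(NLAT)]
row_blocks = [range(0, 2), range(2, 4)]  # latitude rows owned by each PE row
col_blocks = [range(0, 4), range(4, 8)]  # longitudes owned by each PE column

old_tiles = gather_scatter(spectral, NLON, row_blocks, col_blocks)
new_tiles = local_transform(spectral, NLON, row_blocks, col_blocks)
print(old_tiles == new_tiles)
```

In this sketch the two PEs in each latitude band repeat the same transform, which is the extra compute described above. In exchange, no tile has to be communicated once the spectral rows are held locally.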

Two test jobs have been run: a high resolution N512 and an AMIP N96 configuration. For both there is full bit comparability with the previous version.

The N512 GA7 is now being run on ARCHER with this branch. The speed increase, from Pier Luigi Vidale:

So, for two domain decompositions and for a 2-month dump at N512:

  • 48x48: ~7 hours, but a few times as short as 6hrs40mins (down from ~8 hours)
  • 48x72: 4hrs50m to 5hrs02m (down from ~6 hours)

So a ~15%-20% speed up
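As a rough cross-check of that figure, the quoted wall-clock times can be turned into percentage reductions. The sketch below is illustrative arithmetic only. It assumes the approximate times quoted above are taken at face value, and the helper name `pct_saving` is hypothetical. The last line applies the same formula to the FOR_PATTERN timings quoted earlier (513s reduced to 23s within a 3000s run).

```python
def pct_saving(before, after):
    """Percentage reduction in run time going from `before` to `after` (same units)."""
    return 100.0 * (before - after) / before


# Wall-clock times in minutes, taken from the figures quoted above
runs = {
    "48x48 typical (8h -> 7h)": (480, 420),
    "48x48 best (8h -> 6h40m)": (480, 400),
    "48x72 best (6h -> 4h50m)": (360, 290),
    "48x72 worst (6h -> 5h02m)": (360, 302),
}

for label, (before, after) in runs.items():
    print(f"{label}: {pct_saving(before, after):.1f}% less wall-clock time")

# FOR_PATTERN alone: 513s -> 23s inside a 3000s profiled run
profiled_before = 3000
profiled_after = 3000 - 513 + 23
print(f"FOR_PATTERN change: {pct_saving(profiled_before, profiled_after):.1f}% of the profiled run")
```

The reductions range from about 12.5% to about 19.4%, and the FOR_PATTERN saving alone is about 16% of the profiled run, which is broadly consistent with the quoted 15-20%.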

For the AMIP GA7 runs, there was no discernible speed change, but this isn't surprising as the gather/scatter would be much faster when running on a low number of PEs.

UM Training April 2016

We recently delivered the second of our bi-annual 3-day UM Training courses. This time we hosted 16 attendees from a wide range of institutions across the UK. Three days of hard work running and breaking UM jobs on ARCHER, with specialist presentations from MO and NCAS experts. This was our final UMUI-centric training course; next time Rose/Cylc will be the main focus. Many thanks to those attending and presenting.

Course details and presentations are available here.

The course photo sees us hard at work:

UM Training December 2015

December saw the successful delivery of one of our bi-annual 3-day UM Training courses. This time we hosted 25 attendees from a wide range of institutions across the UK. Three days of hard work running and breaking UM jobs on ARCHER, with specialist presentations from EPCC, MO and NCAS experts, left us all ready for the Christmas break. Many thanks to those attending and presenting.

Course details and presentations are available here.

Those of us feeling photogenic appear in the course photo:

UM Training attendees

Performance analysis and Optimisation of the Met Unified Model on a Cray XC30

Here we present results from the optimisation work carried out by the UK National Centre for Atmospheric Science (NCAS) for a high resolution configuration (N512) on the UK ARCHER supercomputer, a Cray XC30. On ARCHER, we use Cray Performance Analysis Tools (CrayPAT) to analyse the performance of the UM and then Cray Reveal to identify and parallelise serial loops using OpenMP directives. We compare performance of the optimised version at a range of scales, and with a range of optimisations, including altered MPI rank placement and the addition of OpenMP directives.

Article available at

[resolved] Network problems between Reading and ARCHER

The JANET engineers have identified a network issue and put a fix in place. Please let us know if you see any further network stalling between PUMA and ARCHER.