Posts by author Grenville Lister

GA7 Optimizations

Work undertaken by NCAS-CMS to optimise Pier Luigi Vidale's N512 GA7 runs has resulted in a 15-20% speed-up of the model while preserving bit-comparability. The resulting savings in ARCHER resource are very significant: about £60k at the ARCHER partner rate, and twice that at the non-partner rate. Put another way, we save ~100M AU for these experiments, enough to enable a raft of ARCHER projects that might otherwise struggle.

Details follow:

In the standard GA7, stochastic physics is turned on. It holds a field in spectral space, distributed over the PEs. Every timestep this field is gathered onto PE0, converted to grid space and then distributed over the domain, which requires a gather and many scatters at every timestep. Profiling a month-long high-resolution N512 job with DrHook showed that FOR_PATTERN, the routine that performs the gather/scatter and the spectral→grid-space transformation, takes a significant proportion of the run time: 513 s of a 3000 s run.
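To make the communication pattern concrete, here is a minimal serial sketch that emulates the original scheme on a toy PE grid. It is not the UM code: `block_ranges`, `row_to_grid` and `old_for_pattern` are hypothetical names, the PEs are simulated as list entries, and the spectral→grid step is simplified to a truncated Fourier synthesis along each latitude row (the real transform also has a latitudinal, Legendre, part). The sketch counts the point-to-point messages PE0 has to handle per call.

```python
import cmath
import math
import random


def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) blocks, as evenly as possible."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def row_to_grid(coeffs, nlon):
    """Truncated Fourier synthesis of one latitude row (stand-in for spectral->grid)."""
    out = []
    for j in range(nlon):
        theta = 2 * math.pi * j / nlon
        value = coeffs[0].real  # wavenumber 0: the zonal mean
        for k in range(1, len(coeffs)):
            value += 2 * (coeffs[k] * cmath.exp(1j * k * theta)).real
        out.append(value)
    return out


def old_for_pattern(spec_pieces, nlon, npx, npy):
    """Gather spectral rows on PE0, transform there, then scatter grid tiles back out."""
    messages = 0
    # 1. Gather: every PE other than PE0 sends its share of spectral rows to PE0.
    spec = []
    for rank, piece in enumerate(spec_pieces):
        spec.extend(piece)
        if rank != 0:
            messages += 1
    # 2. Transform: PE0 alone converts all rows from spectral to grid space.
    grid = [row_to_grid(row, nlon) for row in spec]
    # 3. Scatter: PE0 sends each other PE its (latitude, longitude) tile.
    tiles = {}
    for py, (y0, y1) in enumerate(block_ranges(len(grid), npy)):
        for px, (x0, x1) in enumerate(block_ranges(nlon, npx)):
            tiles[(py, px)] = [row[x0:x1] for row in grid[y0:y1]]
            if (py, px) != (0, 0):
                messages += 1
    return tiles, messages


random.seed(0)
nlat, nlon, nwave, npx, npy = 8, 12, 4, 3, 2
n_pes = npx * npy
spec = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(nwave)]
        for _ in range(nlat)]
# The spectral rows start spread over all PEs, independently of the grid layout.
pieces = [spec[a:b] for a, b in block_ranges(nlat, n_pes)]
tiles, messages = old_for_pattern(pieces, nlon, npx, npy)
print(messages)  # 2 * (n_pes - 1) = 10, repeated every timestep
```

The message count grows linearly with the number of PEs, and all of it funnels through PE0.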

FOR_PATTERN has been rewritten to remove all gathers and scatters of the spectral field. Instead of the spectral rows being distributed over all PEs, each PE holds the spectral rows corresponding to its own rows in grid space. It then performs a local Fourier transform back to grid space and extracts its own longitude domain from the resulting field.
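A matching sketch of the rewritten scheme, under the same simplifying assumptions (hypothetical helper names, PEs emulated serially, spectral→grid reduced to a per-row truncated Fourier synthesis). Each emulated PE transforms only the spectral rows of its own latitude band over all longitudes and keeps its own longitude slice; the check compares every tile with a single global transform and counts the total number of row transforms.

```python
import cmath
import math
import random


def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) blocks, as evenly as possible."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def row_to_grid(coeffs, nlon):
    """Truncated Fourier synthesis of one latitude row (stand-in for spectral->grid)."""
    out = []
    for j in range(nlon):
        theta = 2 * math.pi * j / nlon
        value = coeffs[0].real  # wavenumber 0: the zonal mean
        for k in range(1, len(coeffs)):
            value += 2 * (coeffs[k] * cmath.exp(1j * k * theta)).real
        out.append(value)
    return out


def new_for_pattern(my_spec_rows, nlon, lon_range):
    """Local-only FOR_PATTERN sketch: no gather or scatter.

    The PE already holds the spectral rows for its own latitude band, so it
    transforms those rows over every longitude and keeps only its own slice.
    Every PE in the same band repeats this work (the extra compute).
    """
    x0, x1 = lon_range
    return [row_to_grid(row, nlon)[x0:x1] for row in my_spec_rows]


random.seed(1)
nlat, nlon, nwave, npx, npy = 8, 12, 4, 3, 2
spec = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(nwave)]
        for _ in range(nlat)]
# Reference: one global transform, i.e. what the gather/transform/scatter path delivers.
reference = [row_to_grid(row, nlon) for row in spec]

all_match = True
row_transforms = 0
for y0, y1 in block_ranges(nlat, npy):
    for x0, x1 in block_ranges(nlon, npx):
        # This PE holds spec[y0:y1]: the spectral rows matching its own grid rows.
        tile = new_for_pattern(spec[y0:y1], nlon, (x0, x1))
        row_transforms += y1 - y0
        all_match = all_match and tile == [row[x0:x1] for row in reference[y0:y1]]
print(all_match, row_transforms)  # True 24 (npx * nlat), versus nlat = 8 on PE0 alone
```

In this toy setting the tiles are bitwise identical because each row goes through exactly the same arithmetic in both schemes; the price is that each row is transformed `npx` times instead of once.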

This requires extra computation, because every PE in a latitude band performs the same spectral→grid-space transformation, but the savings in communication time can far outweigh this. Gathers and scatters are best avoided when running on a large number (thousands) of PEs, as each can take many milliseconds.
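One rough way to see why the trade-off favours redundant compute at scale is a simple latency-bandwidth model of a root-serialised collective, in which PE0 exchanges one message with each other rank in turn. This is an illustrative toy model, not a measurement: the latency, message size and bandwidth below are assumed values, and only the 48x48 and 48x72 PE counts come from this post. Real MPI libraries often use tree-based algorithms that shrink the latency term, while the original scheme needs a gather plus several scatters rather than one of each, so the numbers are only indicative.

```python
def root_collective_time(n_pes, latency_s, bytes_per_pe, bandwidth_bps):
    """Toy cost of a root-serialised gather or scatter.

    The root exchanges one message with each of the other n_pes - 1 ranks
    in turn, and each message costs latency + size / bandwidth.
    """
    return (n_pes - 1) * (latency_s + bytes_per_pe / bandwidth_bps)


# Illustrative values only (assumptions), not ARCHER measurements.
LATENCY_S = 2e-6      # per-message latency in seconds
BYTES_PER_PE = 8192   # data exchanged with each rank
BANDWIDTH_BPS = 5e9   # bytes per second

for n_pes in (64, 48 * 48, 48 * 72):
    # One gather plus one scatter per timestep.
    per_step = 2 * root_collective_time(n_pes, LATENCY_S, BYTES_PER_PE, BANDWIDTH_BPS)
    print(f"{n_pes:5d} PEs: {per_step * 1e3:.2f} ms of gather+scatter per timestep")
```

Under these assumptions the cost is well under a millisecond per timestep at tens of PEs but exceeds ten milliseconds at a few thousand, and it recurs every timestep, whereas the redundant transform work is spread across PEs and needs no communication at all.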

With the new code, FOR_PATTERN takes 23 s, compared with 513 s.
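As a back-of-envelope check, assume the 490 s removed from FOR_PATTERN comes straight off the 3000 s run and nothing else changes (an assumption: DrHook timings need not map one-to-one onto elapsed wall-clock time). The profile alone then predicts a saving in line with the measured speed-up.

```python
run_s = 3000.0         # total run time of the profiled N512 job
old_pattern_s = 513.0  # FOR_PATTERN before the rewrite
new_pattern_s = 23.0   # FOR_PATTERN after the rewrite

saved_s = old_pattern_s - new_pattern_s
fraction_saved = saved_s / run_s
new_run_s = run_s - saved_s  # assumes the rest of the run is unaffected
speedup = run_s / new_run_s

print(f"FOR_PATTERN share before: {old_pattern_s / run_s:.1%}")   # 17.1%
print(f"run time saved: {saved_s:.0f} s ({fraction_saved:.1%})")  # 490 s (16.3%)
print(f"speed-up factor: {speedup:.3f}")                          # 1.195
```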

Two test jobs have been run: a high-resolution N512 configuration and an AMIP N96 configuration. Both show full bit-comparability with the previous version.

The N512 GA7 is now being run on ARCHER with this branch. The speed increase, as reported by Pier Luigi Vidale:

So, for two domain decompositions and for a 2-month dump at N512:
48x48: ~7 hours, and a few times as short as 6 hrs 40 mins (down from ~8 hours)
48x72: 4 hrs 50 mins to 5 hrs 02 mins (down from ~6 hours)

So, a ~15-20% speed-up.
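The percentages can be recovered from the quoted timings. This sketch converts them to minutes and computes the fractional reduction in wall-clock time; the baselines (~8 and ~6 hours) are approximate, so the results are only indicative.

```python
def minutes(hours, mins=0):
    """Convert hours and minutes to minutes."""
    return 60 * hours + mins


# (before, after) wall-clock times for a 2-month N512 dump, as quoted above.
timings = {
    "48x48 typical": (minutes(8), minutes(7)),
    "48x48 best": (minutes(8), minutes(6, 40)),
    "48x72 fastest": (minutes(6), minutes(4, 50)),
    "48x72 slowest": (minutes(6), minutes(5, 2)),
}

reductions = {}
for name, (before, after) in timings.items():
    reductions[name] = 1 - after / before  # fraction of wall-clock time removed
    print(f"{name}: {reductions[name]:.1%} less wall-clock time")
```

This gives about 12.5% for a typical 48x48 run, 16.7% for the best 48x48 run, and 16.1% to 19.4% for 48x72, so the headline figure is best read as a rough range.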

For the AMIP GA7 runs there was no discernible change in speed, but this is not surprising: on a small number of PEs the gather/scatter is much faster.

UM Training April 2016

We recently delivered the second of our bi-annual 3-day UM Training courses. This time we hosted 16 attendees from a wide range of institutions across the UK, who spent three days of hard work running and breaking UM jobs on ARCHER, with specialist presentations from MO and NCAS experts. This was our final UMUI-centric training course; next time, Rose/Cylc will be the main focus. Many thanks to those attending and presenting.

Course details and presentations are available here.

The course photo sees us hard at work:

UM Training December 2015

December saw the successful delivery of one of our bi-annual 3-day UM Training courses. This time we hosted 25 attendees from a wide range of institutions across the UK. Three days of hard work running and breaking UM jobs on ARCHER, with specialist presentations from EPCC, MO and NCAS experts, left us all ready for the Christmas break. Many thanks to those attending and presenting.

Course details and presentations are available here.

Those of us feeling photogenic appear in the course photo:

UM Training attendees

Performance analysis and Optimisation of the Met Unified Model on a Cray XC30

Here we present results from the optimisation work carried out by the UK National Centre for Atmospheric Science (NCAS) on a high-resolution configuration (N512) on the UK ARCHER supercomputer, a Cray XC30. On ARCHER, we use Cray Performance Analysis Tools (CrayPAT) to analyse the performance of the UM, and then Cray Reveal to identify serial loops and parallelise them using OpenMP directives. We compare the performance of the optimised version at a range of scales and with a range of optimisations, including altered MPI rank placement and the addition of OpenMP directives.

Article available at

http://arxiv.org/abs/1511.03885

[resolved] Network problems between Reading and ARCHER

The JANET engineers have identified a network issue and put a fix in place. Please let us know if you see any further network stalling between puma and ARCHER.