wiki:Docs/Polaris

Polaris

Polaris, one of the N8 HPCs, is an SGI HPC cluster with a total of 332 compute nodes.

CMS have installed several UM versions on Polaris. Details of how to set up and run on Polaris can be found below:

  • UM 6.6.3
  • UM 7.3
  • UM 8.4

Setting up UM 6.6.3 to run on Polaris

1. Central Installation Directory Structure

The central UM directory is /home/polaris_lds1/earhum. FCM (code management system), gcom (UM communications software), and UM vn 6.6.3 are held in this directory. Other versions of the UM and related software will also be installed here.

The central installation of UM vn 6.6.3 is in /home/polaris_lds1/earhum/hg6.6.3

Directory         Description of Contents
HG2ES_ancils      Ancillary files for the HadGEM2-ES model
HG2CCL60_ancils   Ancillary files for the HadGEM2-CC L60 model and example start files
HG2AO_ancils      Ancillary files for the HadGEM2 coupled Atmosphere-Ocean model and example start files
HG2AMIP_ancils    Ancillary files for the HadGEM2 AMIP model and an example start file
dumps             Example start dumps for the HadGEM2-ES model
ctldata           STASHmaster, ANCILmaster, spectral data, and vertical-levels files
sgi               Small executables and UM utilities, and installation information

2. Environment Variables and Your Files

Standard UM environment variables are set thus

Environment Variable   Value
UMDIR                  /home/polaris_lds1/earhum
WORKDIR                /nobackup/$USER
DEVTDIR                $WORKDIR
DATADIR                $WORKDIR
TMPDIR                 $WORKDIR/tmp/tmp.polaris.$$

Dumps, diagnostics, LBCs, history files, and intermediate output will therefore be sent by default to /nobackup/$USER/$RUNID. The final leave file for a run will be saved in /home/polaris_lds1/$USER/output.
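With these defaults, a run with RUNID xidee, for example, writes its working output under /nobackup/$USER/xidee. As a quick sanity check once your setup (see section 3) is in place, you can confirm the variables from a Polaris login shell; a minimal sketch:

  # check the UM environment variables are set (requires the setup in section 3)
  echo $UMDIR      # expect /home/polaris_lds1/earhum
  echo $DATADIR    # expect /nobackup/$USER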

Start files. Example start files for the four HadGEM2 jobs are provided. We have mimicked the file structure as it exists on HECToR, so start files for HadGEM2-ES are in $UMDIR/hg6.6.3/dumps; the other models' example start files are held alongside their ancillary files.

3. Your Setup on Polaris

Copy the UM Setup section from /home/polaris_lds1/ldsgl/.bashrc to your own .bashrc. (Note: if you wish to use utilities for a UM version other than 6.6.3, you will need to source the appropriate .umsetvars file.)
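As a rough illustration of the shape of that section (copy the genuine block rather than retyping this; the .umsetvars location below is an assumption, so check under $UMDIR for the actual file):

  # UM Setup -- illustrative shape only; copy the real section from
  # /home/polaris_lds1/ldsgl/.bashrc
  export UMDIR=/home/polaris_lds1/earhum
  # source the variables for the UM version in use (path is assumed)
  . $UMDIR/hg6.6.3/.umsetvars_6.6.3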

4. Model Build Information and UMUI Settings

The model build information is kept in configuration files which are read by the FCM build system; these specify the compiler flags used to build the model and the libraries to link against. We have used the following flags with the Intel compiler on Polaris for both the model and communications builds:

-i8 -r8 -fp-model precise -O1
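For reference, flags of this kind are declared in an FCM build configuration as tool settings. The fragment below is only a sketch of that mechanism, not the contents of the central Polaris configs (the compiler name and choice of declarations are assumptions):

  # illustrative FCM bld.cfg fragment -- a sketch, not the Polaris bindings
  tool::fc       ifort
  tool::fflags   -i8 -r8 -fp-model precise -O1
  tool::ldflags  -i8 -r8 -fp-model precise -O1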

To ensure that these and all other Polaris build settings are picked up by FCM, navigate to model selection → sub-model independent → FCM configuration → FCM configuration variables and set the variable UM_SVN_BIND to

fcm:um_br/dev/ros/hg6.6.3_polaris_machine_cfg/src/configs/bindings

FCM Branches

We have created a branch with code changes to account for slight differences in the way the Intel compiler handles some aspects of the UM code (mostly its intolerance of multiple declarations of the same variable). Navigate to model selection → sub-model independent → FCM configuration → FCM configuration optional modifications and include the following branch

fcm:um_br/dev/grenville/hg6.6.3_polaris_fixes

in the User Modifications table.
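If you wish to inspect the branch before adding it, FCM's Subversion wrappers can be used from a machine with the FCM keywords configured; for example:

  # view branch metadata and recent history (assumes fcm keywords are set up)
  fcm info fcm:um_br/dev/grenville/hg6.6.3_polaris_fixes
  fcm log fcm:um_br/dev/grenville/hg6.6.3_polaris_fixes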

General UMUI Settings

Navigate to model selection → sub-model independent → FCM configuration → FCM configuration variables. Experience will help determine the most convenient places where the extracted model should reside. Explicitly set UM_ROUTDIR to be your user directory on /nobackup; note, however, that files left in this directory may be marked for deletion in accordance with the Polaris data policy.

Navigate to model selection → user information and target machine → general details. Set User-id to be your Polaris id. The Tic Code is not relevant to Polaris. We have not tested end-of-run email notification.

Navigate to model selection → user information and target machine → target machine. Choose 'other' for the Compile, Link and Run the job option, and set the Other machine name to polaris.leeds.ac.uk.

Navigate to model selection → sub-model independent → job submission, resources and resubmission pattern. Choose the option 'qsub' for SGE(SGI) for the submission method. The Job memory limit is not relevant (all jobs currently request the default memory per core of 4GB).
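For orientation, jobs submitted this way end up as SGE batch scripts. The header below is a hypothetical sketch of the relevant directives (the parallel environment name and run-time limit are assumptions; the UMUI generates the real script for you):

  #!/bin/ksh
  # illustrative SGE header only -- the UMUI generates the real script
  #$ -cwd -V
  #$ -l h_rt=06:00:00    # wall-clock limit (assumed value)
  #$ -pe ib 96           # MPI parallel environment (name assumed)
  # no memory request is needed: jobs get the default 4GB per core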

Gotchas for HadGEM2-AMIP:
At two places in the UMUI, files need to be specified by full paths: navigate to model selection → atmosphere → ancillary and input data files → climatologies and potential climatologies → natural climate forcing and specify /home/polaris_lds1/earhum/hg6.6.3/HG2AMIP_ancils as the path for the solar forcing and volcanic forcing files.

Limited testing of Climate Meaning has found that inclusion of STASH item 262 (section 0), BOUNDARY LAYER CONVECTION FLAG, causes a checksum failure. Switch off this STASH item when running with Climate Meaning.

5. Example UMUI Jobs

The following jobs are currently under user grenville in the UMUI

Job Id   Model
xidee    HadGEM2-ES
xidex    HadGEM2-AMIP
xidey    HadGEM2-CC L60
xidez    HadGEM2-AO

6. Performance and Scaling

The following figure summarizes results from test runs of the four HadGEM2 jobs at increasing processor counts. The jobs were run for 3 model days, except HadGEM2-CC, which was run for 1 model day. Total wall-clock times for HadGEM2-ES running on HECToR and on CURIE (a Bull PRACE machine) are also shown. Red data points show the compute time for the Polaris jobs. All models display a lack of scaling when run on 96 processors.

Figure: Summary of HadGEM2 job scaling on Polaris


Setting up UM 7.3 to run on Polaris

1. Central Installation Directory Structure

The vn 7.3 installation is under /home/polaris_lds1/earhum/vn7.3. The subdirectory HGPKG2 holds all the data files necessary to run the standard HadGEM3 job xidef (the same job on HECToR is xeozb). The various other subdirectories hold control data for the model, and data to run xideh (a copy of xfvgc).

2. Your Setup on Polaris

Copy the UM Setup section from /home/polaris_lds1/ldsgl/.bashrc to your own .bashrc. (Note: if you wish to use utilities for a UM version other than 6.6.3, you will need to source the appropriate .umsetvars file.)

3. Model Build Information and UMUI Settings

The model build information is kept in configuration files which are read by the FCM build system; these specify the compiler flags used to build the model and the libraries to link against. We have used the following flags with the Intel compiler on Polaris for both the model and communications builds:

-i8 -r8 -fp-model precise -O1

UM vn 7.3 requires the inclusion of override files to correctly set compiler and linker options. Navigate to model selection → compile and modifications → UM user override files and include /home/grenville/umui_jobs/overrides/polaris_7.3_machine and /home/grenville/umui_jobs/overrides/polaris_7.3_file in the tables User machine overrides and User file overrides respectively.
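The override files themselves are plain-text FCM configuration fragments. Purely as an illustration of the mechanism (the variable names below are assumptions, not the contents of the files listed above), a machine override might contain lines such as:

  # hypothetical machine override fragment -- variable names are assumptions
  %fflags  -i8 -r8 -fp-model precise -O1
  %ldflags -i8 -r8 -fp-model precise -O1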

FCM Branches

Navigate to model selection → sub-model independent → FCM configuration → FCM configuration optional modifications and include the following branch

fcm:um_br/dev/grenville/vn7.3_polaris_ukca/src

in the User Modifications table. The naming of this necessary branch may change in future. It contains minor code changes to point to the appropriate UKCA data files on Polaris and a script modification to ensure CRUNs behave correctly, but it was based on Luke's UKCA-CheM branch; make sure, therefore, that the following branch does not also appear in the list, otherwise a build error will result:

fcm:um_br/dev/luke/VN7.3_UKCA_CheM/src

General UMUI Settings

Navigate to model selection → sub-model independent → FCM configuration → FCM configuration variables. Experience will help determine the most convenient places where the extracted model should reside. Explicitly set UM_ROUTDIR to be your user directory on /nobackup; note, however, that files left in this directory may be marked for deletion in accordance with the Polaris data policy.

Navigate to model selection → user information and target machine → general details. Set User-id to be your Polaris id. The Tic Code is not relevant to Polaris. We have not tested end-of-run email notification.

Navigate to model selection → user information and target machine → target machine. Choose 'other' for the Compile, Link and Run the job option, and set the Other machine name to polaris.leeds.ac.uk.

Navigate to model selection → input/output control and resources → job submission, resources and resubmission pattern. Choose the option 'qsub' for SGE(SGI) for the submission method. The Job memory limit is not relevant (all jobs currently request the default memory per core of 4GB).

Navigate to model selection → compile and modifications → UM user override files and include /home/grenville/umui_jobs/overrides/polaris_7.3_machine and /home/grenville/umui_jobs/overrides/polaris_7.3_file in the tables User machine overrides and User file overrides respectively.

Gotchas for UKCA @ vn 7.3
There are several places in the code where hard-wired absolute paths are used, namely in the routines ukca_read_aerosol.F90 and ukca_phot2d.F90. We have merely patched the code to point to files on Polaris.

4. Example UMUI Jobs

The following jobs are currently under user grenville in the UMUI

Job Id   Model
xidef    HadGEM3-A r2.0 N96 L85
xideh    UKCA-StratChem N48 L60 QESM-A

Setting up UM 8.4 to run on Polaris

1. Central Installation Directory Structure

The vn 8.4 installation is under /home/polaris_lds1/earhum/vn8.4.

2. Your Setup on Polaris

Copy the UM Setup section from /home/polaris_lds1/ldsgl/.bashrc to your .bashrc. In order to use vn 8.4 utilities, make sure you source .umsetvars_8.4.

3. Model Build Information and UMUI Settings

The model build information is kept in configuration files which are read by the FCM build system; these specify the compiler flags used to build the model and the libraries to link against. We have used the following flags with the Intel compiler on Polaris for both the model and communications builds:

-i8 -r8 -fp-model precise -O1 -openmp

FCM Branches

Navigate to model selection → FCM configuration → FCM options for atmosphere and reconfiguration.

Until we introduce keywords, specify revision number 11979 for the UM code base and include the following branch

fcm:um_br/dev/grenville/vn8.4_polaris/src

in the User Modifications table. The naming of this necessary branch may change in future; it contains a script change needed to ensure CRUNs behave correctly.

General UMUI Settings

Navigate to model selection → FCM configuration → FCM extract directories and output levels. Experience will help determine the most convenient places where the extracted model should reside. Explicitly set UM_ROUTDIR to be your user directory on /nobackup; note, however, that files left in this directory may be marked for deletion in accordance with the Polaris data policy.

Navigate to model selection → user information and submit method → general details. Set User-id to be your Polaris id. The Tic Code is not relevant to Polaris. We have not tested end-of-run email notification.

Navigate to model selection → user information and target machine → job submission method. Choose the option 'qsub' for SGE(SGI) for the submission method, enter polaris.leeds.ac.uk for the Host name, select Change machine config file, and enter sgi-intel-polaris.

Navigate to model selection → input/output control and resources → user hand edit files and include /home/grenville/umui_jobs/hand_edits/polaris_8.4.1 in the Hand edits list. This is currently necessary to manage the way Polaris handles the module command in the Korn shell, and it selects the mvapich2 module to ensure that the I/O servers function correctly. Future work should remove the need for this.
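For context, a hand edit is a small script applied to the job before submission. The sketch below only illustrates the sort of thing this one does (initialising the module command for ksh and selecting mvapich2); it is not the contents of polaris_8.4.1, and the paths and module names are assumptions:

  #!/bin/ksh
  # illustrative sketch only -- not the real polaris_8.4.1 hand edit
  # initialise the module command for ksh (path is an assumed location)
  . /etc/profile.d/modules.sh
  # select mvapich2 so the I/O servers can run multi-threaded MPI
  module switch openmpi mvapich2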

4. Example UMUI Job

The following job is currently under user grenville in the UMUI

Job Id   Model
xidem    HadGEM3-GA4.0 Polaris

This job uses the UM I/O servers, which require the model to run with at least two OpenMP threads. We have successfully run the model using mvapich2 as the MPI implementation, which supports multi-threaded MPI. The model runs single-threaded with both the OpenMPI and Intel MPI implementations; in these cases OpenMP code directives are obeyed, but the model cannot support the I/O servers.
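The threading requirement itself is controlled by the standard OpenMP environment variable; a minimal sketch (where best to set this within a Polaris job is left to the job scripts):

  # give each MPI task two OpenMP threads so the I/O servers can run
  export OMP_NUM_THREADS=2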
