
Version 8 (modified by ros, 23 months ago)

MONSooN XC40

CMS are installing UM versions on the new Cray XC40. Not all UM versions currently available on the IBM MONSooN will be ported to the new machine. The versions being installed are 6.6.3 (as 6.6.6), 7.3, 8.2, 8.4, 8.5, 8.6, 9.x and 10.x.

For each UM version this involves (but is not limited to):

  • Setting up the $UMDIR/vnx.y directory on MONSooN
  • Installing the appropriate version of GCOM
  • Modifying the UMUI to add appropriate options to windows for submission to the XC40, as well as processing logic to produce the correct qsub submission scripts.
  • Adding the FCM configuration files
  • Testing a range of standard job configurations: NRUNs, CRUNs, checks for bit reproducibility, etc.
  • Modifying the archiving scripts
  • Installing the appropriate version of OASIS for coupled jobs
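
The bit-reproducibility check in the testing step amounts to confirming that two output files (for example, a dump from an NRUN and the matching dump from a CRUN) are identical. The following is a minimal sketch, assuming a plain byte-level comparison is good enough; real comparisons may need a tool that understands the UM dump format. The function names are hypothetical and not part of any UM tooling.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large dumps need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bit_compare(path_a, path_b):
    """Return True if the two files have the same size and contents."""
    a, b = Path(path_a), Path(path_b)
    # Different sizes can never bit compare, so skip hashing in that case.
    if a.stat().st_size != b.stat().st_size:
        return False
    return file_digest(a) == file_digest(b)
```

A call such as `bit_compare("nrun_dump", "crun_dump")` (both paths are placeholders) returns `True` only when the files are byte-for-byte identical.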

Details of the UM installations available on the XC40 and information on how to convert a UM job from the MONSooN IBM to run on the XC40 are available on the Collaboration Twiki: CrayUMInstall

[Effort: 2PM Install + 1PM Initial Support]


Test Jobs

UM Version | Job Description | Job/Suite Id | EWxNS | NRUN | CRUN | Archiving | Bit comparison | Performance | Other Notes
6.6.6 | HadGEM2-AO | xlqja | | YES | YES | | CRUNs b/c | 2hrs for 30 day run; 1 node | copy of our standard job run with 6.6.6 trunk code only, ie no branches at all - OK after L_TRACER_MASS fix - need to remove stash 3 293 (problem w/partial sum files)
6.6.6 | HadGEM2-ES | xlqjy | 4x8 | YES | | | | 30 day run 3hrs wallclock | copy of our standard job run; only needed to fix the hard-wired path in ukca_phot2d.F90 - also switched off reading binary data - need to fix path for data needed in UKCA_STRATF; these are UMUI changes in the UKCA section only; include fcm:um_br/dev/grenville/hg6.6.6_UKCA-path-fix/src
6.6.6 | HadGEM2 AMIP | xlpoi | 12x8 | YES | YES | YES | CRUNs b/c | 00:08:14 10day run. dump freq: 1 day; 00:18:21 30day run. dump freq: 10days |
6.6.3 | HadGEM2-AO | xlqjz | | | | | | 1hr for 30 day run (1 node, 32 cores) cf 1hr 16min for ARCHER equivalent (1 node, 24 cores) | 6.6.3 job (our standard) run through the 6.6.6 umui; need to switch off stash Section 3 item 293 for climate meaning to work (building acumps at -O0 doesn't help); MO Cray configs used; need to remove L_TRACER_MASS from namelists
7.3 | HadGEM3-A r2.0 | xlqvb | 4x8, 8x4, 8x8 | YES | YES | YES (see xlrda) | CRUNs b/c; proc decomp no b/c same as archer | monsoon: 284 sec/day 1 node, 150 sec/day 2 nodes; archer: 350 sec/day 1 node (4x6); daily dumps | For archiving use fcm:um-br/dev/annette/vn7.3_HadGEM3-A_r2.0_monsoon_archiving_new and ~annette/hadgem3/hand_edits/moose_archiving_new.ed
7.3 | QESM-OA N48-ORCA2 | xlrdb | | YES | YES | YES | Atmos decomp no b/c; CRUNs b/c | 8x8(um) + 4x4(nemo) 8x2(cice) [4 nodes] = 11m 28s; 8x12(um) + 4x4(nemo) 8x2(cice) [5 nodes] = 8m 39s | Ported from an IBM MONSooN job as coupled 7.3 not run on Archer. Timing for 1 month run with 10 day atmos dumps and 1 month NEMO and CICE dumps. For archiving use branch as above and hand-edit: ~annette/hadgem3/hand_edits/moose_archiving_new_cpl.ed
8.2 | HG3A | xlpoj | 8x16 | YES | YES | YES | CRUNs b/c | 00:43:18 30day run. dump freq: 1day | Bindings: fcm:um-br/dev/um/vn8.2_machine_cfg with container@vn8.2_cfg; Scripts branch: fcm:um-br/pkg/Config/vn8.2_ncas; Archiving branch: fcm:um-br/dev/ros/vn8.2_MetoCray_arch (This can be merged into Jeff's branch once sys_info installed.)
8.2 | HG3A GA4.0 AMIP OpenMP + Hyperthreads | xlpoe | 8x12 | YES | YES | | Does not bit compare | 00:16:31 10day run. dump freq: 10days | Bindings: fcm:um-br/dev/um/vn8.2_MetoCray with container@19474; Scripts branch: fcm:um-br/pkg/Config/vn8.2_ncas
8.4 | HG3A GA4.0 OpenMP + Hyperthreads | xlpof | 8x12 | YES | YES | | CRUNs b/c | 00:03:19 10day run. dump freq: 10days; 00:16:36 10day run. dump freq 1day | Bindings: fcm:um-br/dev/um/vn8.4_machine_cfg with container@vn8.4_cfg; Scripts branch: fcm:um-br/pkg/Config/vn8.4_ncas
8.5 | GA6.0 N96 (antia) OpenMP + Hyperthreads | xlpoh | 16x8 | YES | YES | | CRUNs b/c | | Bindings: fcm:um-br/dev/um/vn8.5_machine_cfg with container@vn8.5_cfg; Scripts branch: fcm:um-br/pkg/Config/vn8.5_ncas
8.6 | GA6.0 N96 (No OpenMP) | xlpod | 16x8 | YES | YES | YES | CRUNs b/c | 00:52:35 30day run. dump freq: 1day | Bindings: fcm:um-br/dev/um/vn8.6_machine_cfg with container@vn8.6_cfg; Scripts branch: fcm:um-br/pkg/Config/vn8.6_ncas
8.6 | N96 AO GC2.0 (No OpenMP) | xlsdd | 8x8+16x16+1x8 | YES | YES | | | | Still needs fixes to cfg file
10.2 | N48 Endgame GA6 | u-aa345 | N/A | | | | 1x4 v 4x8 9 hr dumps b/c | 9 hr run (=27 ts) w. dumps every 3 hrs; 00:02:31 1x4 shared; 00:01:17 4x8 parallel | Port of Met Office XC40 standard job. Need to increase memory requirements for build and recon in shared queue.
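
The timings recorded for the test jobs can be turned into rough speed-up figures with a little arithmetic. The following is a minimal sketch using only numbers quoted in the table; the helper names are illustrative, and "efficiency" here is simply speed-up divided by the ratio of resources, which is an assumption about how one might summarise the figures. Note that the 10.2 timings come from different queues (shared and parallel), so that comparison is only indicative.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' wallclock string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup(t_base, t_new):
    """Ratio of the baseline time to the new time."""
    return t_base / t_new


def efficiency(t_base, t_new, resource_ratio):
    """Speed-up divided by the factor by which resources increased."""
    return speedup(t_base, t_new) / resource_ratio


# 7.3 HadGEM3-A r2.0 (xlqvb): 284 sec/day on 1 node, 150 sec/day on 2 nodes.
print(f"xlqvb speed-up:   {speedup(284, 150):.2f}")        # 1.89
print(f"xlqvb efficiency: {efficiency(284, 150, 2):.2f}")  # 0.95

# 10.2 N48 Endgame GA6 (u-aa345): 00:02:31 at 1x4, 00:01:17 at 4x8.
t_1x4 = to_seconds("00:02:31")  # 151 seconds
t_4x8 = to_seconds("00:01:17")  # 77 seconds
print(f"u-aa345 speed-up: {speedup(t_1x4, t_4x8):.2f}")    # 1.96
```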