Opened 9 months ago

Closed 9 months ago

#2603 closed help (fixed)

Errors porting job from XCS-C to ARCHER

Reported by: luke
Owned by: um_support
Component: UM Model
Keywords: STASH,mpich
Cc:
Platform: ARCHER
UM Version: 11.0

Description

Hello,

I'm having problems porting a working XCS-C vn11.0 GA7.1+StratTrop suite (u-ba915) to ARCHER. The recon job.err file is:

--------------------------------------------------------------------------------
This is a private computing facility. Access to this service is limited to those
who have been granted access by the operating service provider on behalf of the
contracting authority and use is restricted to the purposes for which access was
granted. All access and usage are governed by the terms and conditions of access
agreed to by all registered users and are thus subject to the provisions of the
Computer Misuse Act, 1990 under which unauthorised use is a criminal offence.

If you are not authorised to use this service you must disconnect immediately.
--------------------------------------------------------------------------------

cray-mpich/7.5.5(34):ERROR:150: Module 'cray-mpich/7.5.5' conflicts with the currently loaded module(s) 'cray-mpich/7.2.6'
cray-mpich/7.5.5(34):ERROR:102: Tcl command execution failed: conflict cray-mpich

cray-mpich/7.5.5(34):ERROR:150: Module 'cray-mpich/7.5.5' conflicts with the currently loaded module(s) 'cray-mpich/7.2.6'
cray-mpich/7.5.5(34):ERROR:102: Tcl command execution failed: conflict cray-mpich

[WARN] file:STASHC: skip missing optional source: namelist:exclude_package(:)
[WARN] file:RECONA: skip missing optional source: namelist:trans(:)
[WARN] file:IDEALISE: skip missing optional source: namelist:idealised
[FAIL] file:STASHmaster=source=fcm:um.xm_br/dev/mohitdalvi/vn11.0_ukca_ageair_and_stashm/rose-meta/um-atmos/HEAD/etc/stash/STASHmaster@51334: bad or missing value
Received signal ERR
cray-mpich/7.5.5(34):ERROR:150: Module 'cray-mpich/7.5.5' conflicts with the currently loaded module(s) 'cray-mpich/7.2.6'
cray-mpich/7.5.5(34):ERROR:102: Tcl command execution failed: conflict cray-mpich

cylc (scheduler - 2018-09-06T13:44:00Z): CRITICAL Task job script received signal ERR at 2018-09-06T13:44:00Z
cylc (scheduler - 2018-09-06T13:44:00Z): CRITICAL failed at 2018-09-06T13:44:00Z

There seem to be two things labelled as errors:

  1. cray-mpich/7.5.5(34):ERROR:150: Module 'cray-mpich/7.5.5' conflicts with the currently loaded module(s) 'cray-mpich/7.2.6'

Similar errors are mentioned in tickets #2477, #2478, and #2479, and from the context I believe this one is not actually important (Ros said "The mpich errors here can be ignored" in #2479).

This leaves

  2. [FAIL] file:STASHmaster=source=fcm:um.xm_br/dev/mohitdalvi/vn11.0_ukca_ageair_and_stashm/rose-meta/um-atmos/HEAD/etc/stash/STASHmaster@51334: bad or missing value

A similar error was also reported in #2156 and #2158 on Monsoon2. The version of Cylc being used here is 6.11.4 - is this correct for ARCHER? Could it be an issue with the MOM nodes?

This definition of the user-defined STASHmaster file works on Monsoon2 without problems. I also tried moving this from the app/um/rose-app.conf file to the rose-suite.conf file as per

http://www.ukca.ac.uk/wiki/index.php/UKCA_Chemistry_and_Aerosol_vn10.9_Tutorial_4#Use_your_new_STASHmaster_file_in_Rose

which did work on ARCHER previously, but this still resulted in the same error.

I've tried diffing this suite with both u-av674 and Ros' u-ax053, and the ARCHER-specific files are identical, although there are other differences in the suite.rc file that do not mention MPICH, STASH, or Cylc.

Any and all advice as to how to proceed further would be greatly appreciated.

Many thanks,
Luke

Change History (2)

comment:1 Changed 9 months ago by ros

Hi Luke,

I confirm the mpich error can be ignored.

The STASHmaster file must be extracted from within the rose-suite.conf (i.e. on PUMA) as the repositories cannot be seen from ARCHER. So you definitely need to move that out of the app/um/rose-app.conf file.

In rose-suite.conf add:

[file:app/um/file/STASHmaster]
source=fcm:um.xm_br/dev/mohitdalvi/vn11.0_ukca_ageair_and_stashm/rose-meta/um-atmos/HEAD/etc/stash/STASHmaster@51334

Then remove the whole [file:STASHmaster] section from app/um/rose-app.conf
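
For reference, the section to remove can be inferred from the [FAIL] line in job.err. This is a sketch and an assumption, not a copy of the suite: the real section in app/um/rose-app.conf may carry additional settings, all of which should go with it.

```ini
# app/um/rose-app.conf: delete this whole section.
# Reconstructed from the [FAIL] message; the real section may differ.
[file:STASHmaster]
source=fcm:um.xm_br/dev/mohitdalvi/vn11.0_ukca_ageair_and_stashm/rose-meta/um-atmos/HEAD/etc/stash/STASHmaster@51334
```

With the source declared only in rose-suite.conf, the file is extracted on PUMA, where the repositories are visible, rather than on ARCHER.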

Cheers,
Ros.

comment:2 Changed 9 months ago by luke

  • Resolution set to fixed
  • Status changed from new to closed

Hi Ros,

Success!

Many thanks. I must have messed up in some way when I tried this before. It's good to know that defining things in app/um/rose-app.conf is incorrect for ARCHER, as that is generally how I've seen Monsoon2 suites from the Met Office configured. I'll request that rose-suite.conf is used instead.

I'm now getting different errors, but I was expecting these ones!

Many thanks and best wishes,
Luke
