Opened 3 years ago

Closed 3 years ago

#2156 closed help (fixed)

STASHmaster problems?

Reported by: mattjbr123
Owned by: ros
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version: 10.3

Description

Hi,

Getting the following error when running recon or atmos_main tasks in suite u-af404 on MONSooN:

[FAIL] file:STASHmaster=source=fcm:um.xm_br/dev/colinjohnson/vn10.3.1_radaer_intf2/rose-meta/um-atmos/HEAD/etc/stash/STASHmaster@15625: bad or missing value

There are no pe_output files appearing which doesn't help.

The strange thing is, it was working yesterday, and I haven't touched anything to do with STASH or the STASHmaster files…

I have so far tried the following:

  • Changing the STASHmaster source to a newer version of the one originally producing the error above, to the vn10.3 file from the UM trunk, and to the one from the HEAD of the UM trunk. All gave the same error.
  • Running
    rose suite-run --new
    
    for the original and UM trunk STASHmaster files. This time the same error came up but for the recon task.
  • Using different ENSMEMBER numbers, in case that was for some reason affecting anything, as I had changed them before running the suite today. No luck here either though, the error remained the same.

Other information that might be useful:
Yesterday I initially couldn't get any suites to run because of a cylc error: it did not recognise the pyro communication method, a result of my global.rc file being out of date after the recent cylc update. After commenting out the line
task communication method = pyro
under the localhost and xc* entries of [hosts] in ~mabro/.cylc/global.rc, there was no longer a problem. I was later advised by AJ Watling, who had spoken to the developers, that there is no longer any need for anything in ~mabro/.cylc/global.rc, as cylc should now work 'out of the box' without additional user settings, and that I could therefore remove everything in the file.
Reinstating the file (with the pyro line still commented out) has not fixed the errors.

Any ideas greatly appreciated, probably something simple…

Cheers,
Matt

Change History (10)

comment:1 Changed 3 years ago by mattjbr123

Have now also tried with a different suite - u-ak617, which is now also suffering from the same error. Confuzzled much.

comment:2 Changed 3 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Matt,

Have you followed the advice recently posted on the Monsoon Yammer newsgroup? I haven't looked at both suites but u-ak617 is definitely running with cylc 6.11.4 so likely to have problems.

In general, if you have a suite running under cylc 6.X, you will need to ensure you talk to it using a cylc-6 client. E.g. to stop a suite running at cylc-6.11.4:

CYLC_VERSION=6.11.4 cylc stop SUITE
# or
CYLC_VERSION=cylc6 cylc stop SUITE
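
If it is unclear which cylc version a suite was started with, one option is to read it back from the suite's run configuration. The sketch below assumes that ~/cylc-run/SUITE/log/rose-suite-run.conf records a line of the form CYLC_VERSION=<value>; suite_cylc_version is a hypothetical helper name, not a rose or cylc command.

```shell
# Hypothetical helper: print the CYLC_VERSION recorded in a suite's
# rose-suite-run.conf. Assumes the file contains a line of the form
# "CYLC_VERSION=<value>" starting in the first column.
suite_cylc_version() {
    conf=$1
    # Print whatever follows "CYLC_VERSION=" on the first matching line.
    sed -n 's/^CYLC_VERSION=//p' "$conf" | head -n 1
}
```

The result could then be passed to the client, for example: CYLC_VERSION=$(suite_cylc_version ~/cylc-run/SUITE/log/rose-suite-run.conf) cylc stop SUITE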

Unless there is a strong requirement, you should start new suites with the new site default versions only.

To upgrade a suite running at cylc-6.11.4 (or similar) to cylc-7.3.0 (or similar):

0. Ensure that your suite is cylc 7 ready. 
E.g. run "CYLC_VERSION=7.3.0 cylc validate ~/cylc-run/SUITE/suite.rc.processed".

1. Stop the suite with "CYLC_VERSION=6.11.4 cylc stop --now SUITE".

2. Wait for the suite to stop cleanly.

3. Edit "~/cylc-run/SUITE/log/rose-suite-run.conf", modify "CYLC_VERSION" and "ROSE_VERSION" under the "[env]" section.

[env]
# ...
CYLC_VERSION=7.3.0
ROSE_VERSION=2017.02.0

4. Run "rose suite-restart --name=SUITE".
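
Step 3 above can also be scripted rather than done in an editor. This is only a minimal sketch: set_suite_versions is a hypothetical helper (not part of rose or cylc), and it assumes the CYLC_VERSION and ROSE_VERSION lines already exist in the file, start in the first column, and appear only under [env]. It keeps a .bak copy of the original file.

```shell
# Hypothetical helper: rewrite CYLC_VERSION and ROSE_VERSION in a
# rose-suite-run.conf file, keeping a backup of the original as <file>.bak.
# Assumes both settings already exist, one per line, in the first column.
set_suite_versions() {
    conf=$1
    cylc_version=$2
    rose_version=$3
    # Save the original first; stop if the copy fails.
    cp "$conf" "$conf.bak" || return 1
    # Read from the backup and write the edited result over the original.
    sed -e "s/^CYLC_VERSION=.*/CYLC_VERSION=$cylc_version/" \
        -e "s/^ROSE_VERSION=.*/ROSE_VERSION=$rose_version/" \
        "$conf.bak" > "$conf"
}
```

For the versions quoted above, the call would be: set_suite_versions ~/cylc-run/SUITE/log/rose-suite-run.conf 7.3.0 2017.02.0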

Cheers,
Ros.

comment:3 Changed 3 years ago by ros

P.S. I confirm you don't need a ~/.cylc/global.rc file. I notice you have lots of stuff in there and it is liable to cause problems when the system versions are upgraded. Please move it out of the way.
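
One way to move the file out of the way without losing its contents is to rename it. The sketch below is an illustration only: park_global_rc and the .disabled suffix are hypothetical choices, and any name other than global.rc should do.

```shell
# Hypothetical helper: rename a user global.rc so cylc no longer picks it up.
# With no argument it acts on ~/.cylc/global.rc.
park_global_rc() {
    rc=${1:-$HOME/.cylc/global.rc}
    if [ -f "$rc" ]; then
        # Keep the old settings under a different name rather than deleting.
        mv "$rc" "$rc.disabled" && echo "moved $rc to $rc.disabled"
    else
        echo "no file at $rc, nothing to do"
    fi
}
```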

comment:4 Changed 3 years ago by mattjbr123

I'm afraid I'm unable to access the Yammer group at the moment (something to do with my ox.ac.uk email address…). Would you be able to post any pertinent advice below?

I will try your suggestions above.

Should I be upgrading my suites every time there is a cylc upgrade?

Thanks,
Matt

comment:5 Changed 3 years ago by ros

  • Status changed from accepted to pending

Hi Matt,

If there is a cylc upgrade that requires user action, an announcement should be made on the newsgroups. The upgrade from cylc-6.x to cylc-7.x was a major one, hence the advice posted above, which came from the Yammer group. I've not seen anything else relevant posted.

I've tried running one of my suites on xcs too and am also getting the problem accessing the STASHmaster file. I'm in conversation with the Monsoon team and will get back to you as soon as I have any news or further suggestions to try out.

Cheers,
Ros.

comment:6 Changed 3 years ago by mattjbr123

Ok great, thanks.

Matt

comment:7 Changed 3 years ago by ggxmy

I'm having exactly the same problem. I raised a ticket #2158 for this.

Masaru

comment:8 Changed 3 years ago by ros

Hi Matt,

The Met Office have now resolved the problem with the MOM nodes. Please try submitting your suite again.

Regards,
Ros.

comment:9 Changed 3 years ago by mattjbr123

Seems to be working now - the recon task has finished successfully - thanks for sorting!

Matt

comment:10 Changed 3 years ago by ros

  • Resolution set to fixed
  • Status changed from pending to closed