Opened 4 months ago

Closed 3 months ago

#2750 closed help (fixed)

STASHmaster addition: Error "No stashmaster record".

Reported by: cbellisario Owned by: um_support
Component: UM Model Keywords: stashmaster, um
Cc: Platform: NEXCS
UM Version: 11.0

Description

Dear NCAS team,

I am struggling with the addition of a STASHmaster item in suite be515.

I added the item to STASHmaster_A and STASHmaster-meta.conf.
I then linked the path to them in:

  • um/ meta
  • um/env/Runtime Controls (add latent variable)
  • um/namelist/Reconfiguration and Ancillary controls/Configure ancils and initialise dump fields (with the addition of the new section)
  • um/namelist/Model Input and Output/STASH Requests and Profiles/Stash requests (with the addition of the STASH request and a run of the corresponding transform/tidy/validate macros)

And yet, I get the following error message:

????????????????????????????????????????????????????????????????????????????????
?????????????????????????????? WARNING ??????????????????????????????
? Warning code: -10
? Warning from routine: PRELIM
? Warning message:
? Field - Section:0, Item:512 discarded.
? No stashmaster record.
? Warning from processor: 0
? Warning number: 24
????????????????????????????????????????????????????????????????????????????????

I have checked jlw_down_band (the new index added to read the variable lw_down_band): it does not get a valid value, as it starts at 0.
So I think I am missing a step (or more) in the configuration in rosie go, but I cannot figure out which one.
I did something similar in suite bd497, where the index starts at 40000-something, so I tried to do the same here, but without success.

Any help is more than welcome,

Best regards,

Christophe

Change History (15)

comment:1 Changed 4 months ago by grenville

Christophe

In STASH requests, you request section 0, item 512, but there is no such entry in the STASHmaster file. There are entries for item 512 in sections 1, 2, 3, and 38 only:

1| 1 | 1 | 512 |CLEAR DOWN SW FLUX ON LEVS AND BANDS|
1| 1 | 2 | 512 |CLEAR DOWN LW FLUX ON LEVS AND BANDS|
1| 1 | 3 | 512 |STABILITY FUNCTION FOR MOMENTUM |
1| 1 | 38 | 512 |H2O Aitken-sol mode (kgm-3) |

It looks like you have mistyped some metadata.

Grenville
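
The check described above can be sketched as a short script. This is only a minimal illustration, assuming that record lines in a STASHmaster file start with `1|` and carry model, section, item, and name in `|`-separated fields, as in the four lines quoted in this comment. The function name `parse_stashmaster_keys` is hypothetical and is not part of the UM or Rose.

```python
def parse_stashmaster_keys(text):
    """Return {(section, item): name} for every '1|' record line in text."""
    records = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Assumed layout: 1| model | section | item | name |
        if len(fields) < 5 or fields[0] != "1":
            continue
        try:
            section, item = int(fields[2]), int(fields[3])
        except ValueError:
            continue  # not a numeric section/item, so not a record line
        if section < 0 or item < 0:
            continue  # skip negative placeholder entries such as an end marker
        records[(section, item)] = fields[4]
    return records


# The four item-512 lines quoted in the comment above.
sample = """\
1| 1 | 1 | 512 |CLEAR DOWN SW FLUX ON LEVS AND BANDS|
1| 1 | 2 | 512 |CLEAR DOWN LW FLUX ON LEVS AND BANDS|
1| 1 | 3 | 512 |STABILITY FUNCTION FOR MOMENTUM |
1| 1 | 38 | 512 |H2O Aitken-sol mode (kgm-3) |
"""

records = parse_stashmaster_keys(sample)
requested = (0, 512)  # section 0, item 512, as in the PRELIM warning
if requested not in records:
    sections = sorted(s for (s, i) in records if i == requested[1])
    print(f"No record for section {requested[0]}, item {requested[1]}; "
          f"item {requested[1]} exists in sections {sections}")
```

Running the same function on the contents of the STASHmaster_A file that the suite actually reads would show whether the requested section and item are present there.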

comment:2 Changed 4 months ago by cbellisario

Dear Grenville,

Thanks for your help,

So I guess there is one place where I do not refer to the correct STASHmaster file.
In STASHmaster_A, located in ~/GA7.1_UM11.0_AMIP/Branch_Seq03_check/vn11.0_check/rose-meta/um-atmos/vn11.0_HEAD/etc/stash/STASHmaster/, I do have the line:
1| 1 | 0 | 512 |SURFACE DOWNWARD LW RADIATIONbdsW/M2|

And I do refer to this STASHMaster directory in rosie go & um/meta:
/home/d04/chrbe/GA7.1_UM11.0_AMIP/Branch_Seq03_check/vn11.0_check/rose-meta/um-atmos/vn11.0_HEAD/

And I could find it in the STASH requests in rosie go.

Is there another place where I have to point to the correct STASHmaster folder?

Thank you for your help,

Best regards,

Christophe

comment:3 Changed 4 months ago by grenville

But your suite points to:

/home/d04/chrbe/GA7.1_UM11.0_AMIP/Branch_Seq03_check/vn11.0_check/rose-meta/um-atmos/HEAD/etc/stash/STASHmaster

Grenville
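
A mismatch like this is easy to miss by eye. As a minimal sketch, the two directories quoted in comments 2 and 3 can be compared component by component. The helper name `first_path_difference` is hypothetical, and the "edited" path is assembled from the home directory and relative path given in comment 2.

```python
from pathlib import PurePosixPath


def first_path_difference(expected, actual):
    """Return (index, expected_part, actual_part) for the first differing
    path component, or None if the two paths are identical."""
    a = PurePosixPath(expected).parts
    b = PurePosixPath(actual).parts
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index, x, y
    if len(a) != len(b):
        # One path is a prefix of the other; report the first extra component.
        index = min(len(a), len(b))
        return (index,
                a[index] if index < len(a) else None,
                b[index] if index < len(b) else None)
    return None


base = "/home/d04/chrbe/GA7.1_UM11.0_AMIP/Branch_Seq03_check/vn11.0_check/rose-meta/um-atmos"
edited = base + "/vn11.0_HEAD/etc/stash/STASHmaster"  # where STASHmaster_A was edited
in_suite = base + "/HEAD/etc/stash/STASHmaster"       # what the suite points to

print(first_path_difference(edited, in_suite))
```

Here the first differing component is `vn11.0_HEAD` versus `HEAD`, which is the discrepancy pointed out above.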

comment:4 Changed 4 months ago by cbellisario

Dear Grenville,

Thank you for your help. I traced the error back to the path set in um/env/Runtime Controls.

However, I still have trouble connecting to NEXCS, and Rosie Go sometimes crashes on launch because of memory issues. Today the connection to NEXCS takes a very long time before giving up. Yesterday, cylc runs were not visible although they were still running, and "Connect Now" was not working either. Could this be due to something in my suite configuration that somehow broke the memory allocation of my runs? Or is it only related to NEXCS?

Thank you in advance,

Best regards,

Christophe

comment:5 Changed 4 months ago by grenville

Christophe

I don't know what you mean by connection to NEXCS - from where? Where are you running Rosie go? Have you moved away from running from exvmsrose?

Grenville

comment:6 Changed 4 months ago by cbellisario

Dear Grenville,

Sorry for not being very clear:

  • To run on NEXCS, I connect to Puma, from there to exvmslander, and from there to exvmsrose. From Puma to exvmslander I have no problem, but today I cannot get from exvmslander to exvmsrose.
  • A problem I had yesterday: on exvmsrose, I launched rosie go &. However, once it opened, the cylc button looked a bit different. When I launched the suite directly from the suite list, the cylc display appeared but with nothing in it (blank), even though the suite was reported to be running.
  • A problem I had in the days before: on exvmsrose, after opening the suite with rosie go &, the suite crashed (about one time in two) when I tried to run it. I could, however, run it directly from the suite list without any problem.

Now I am wondering whether these are temporary HPC resource issues, or whether they are related.

Christophe

comment:7 Changed 4 months ago by grenville

Christophe

The xcs is down today:

"Here is a reminder for tomorrow's extended Monsoon and NEXCS outage, starting from 04:00 on 6th February through to 11:00 on 7th February 2019, details on Yammer."

Contact Monsoon to arrange access to the Yammer group.

It is no longer necessary to use exvmsrose - see https://collab.metoffice.gov.uk/twiki/bin/view/Support/MONSooN. The new system is much faster and hopefully the answer to your problems.

Grenville

comment:8 Changed 4 months ago by cbellisario

Dear Grenville,

Following the end of the exvmsrose era and the move to xcslc0 / xcslc1, I now face a new problem:
When trying to access xcslc* from exvmslander, I get the following error:

[chrbe@exvmslander:~]$ ssh -Y xcslc1
Last login: Thu Feb  7 15:53:18 2019 from 10.168.5.6

    This computer is provided for the processing of Official Information.
    Unauthorized access may constitute a criminal offence. All activity
    on the system is liable to monitoring.


-bash: mosrs-cache-password: command not found
Met Office Science Repository Service password:
gpg-preset-passphrase: problem with the agent
gpg-preset-passphrase: caching passphrase failed: Invalid response
gpg-preset-passphrase: problem with the agent
gpg-preset-passphrase: caching passphrase failed: Invalid response
svn: E215004: Authentication failed and interactive prompting is disabled; see the --force-interactive option
svn: E215004: Unable to connect to a repository at URL 'https://code.metoffice.gov.uk/svn/test'
svn: E215004: No more credentials or we tried too many times.
Authentication failed
Error: Unable to access Subversion with given password
Run "mosrs-cache-password" to try caching your password again
Met Office Science Repository Service password:

I do get access to xcslc1 at some point, but without being able to run anything.
I followed https://collab.metoffice.gov.uk/twiki/bin/view/Support/RetirementOfRoseCylcVMs and the associated https://code.metoffice.gov.uk/trac/home/wiki/AuthenticationCaching#Monsoon.
I changed the gpg-preset-passphrase path in mosrs-cache-password from

#!/bin/bash
set -u
gpgpresetpassphrase="/usr/libexec/gpg-preset-passphrase"

to

#!/bin/bash
set -u
gpgpresetpassphrase="/usr/lib64/gpg-preset-passphrase"

It does not work either.
When I try to run mosrs-cache-password, it logs me out of xcslc1.
Needless to say, rosie does not run while I am still on xcslc1.

On the other hand, I can get back to exvmsrose as I used to.
But rosie go there still behaves strangely (the same troubles as described in http://cms.ncas.ac.uk/ticket/2758).
So I still cannot run/see/do anything on that side.

It is becoming as annoying as it is depressing, in the sense that wherever I try to run the UM, nothing works (and I have not even made progress on the STASHmaster changes, which crash the UM at some point).

So any ideas about how to solve these problems are more than welcome.
I did contact the Monsoon team about the first part of the problem; I will post their answer here when it arrives, as it could help other people.

Best regards,

Christophe
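
As a diagnostic aside on the two gpg-preset-passphrase locations tried in the comment above, the following is a minimal sketch that reports which candidate path, if any, exists and is executable on the current host. The helper name `first_executable` is hypothetical, and the candidate list contains only the two paths quoted in this ticket. It does not say anything about whether the gpg agent itself is set up correctly.

```python
import os


def first_executable(candidates):
    """Return the first path in candidates that is an executable file, else None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


# The two locations mentioned in this ticket; other systems may use others.
CANDIDATES = [
    "/usr/libexec/gpg-preset-passphrase",
    "/usr/lib64/gpg-preset-passphrase",
]

found = first_executable(CANDIDATES)
print(found if found else "gpg-preset-passphrase not found in the listed locations")
```

Note that later in this thread the problem was resolved by removing the hand-copied scripts and following the linked instructions, not by editing this path.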

comment:9 Changed 4 months ago by grenville

Christophe

Please delete /home/d04/chrbe/mosrs-cache-password and /home/d04/chrbe/mosrs-setup-gpg-agent, then follow the instructions again.

Grenville

comment:10 Changed 4 months ago by cbellisario

Thank you for your answer.

I removed both mosrs-cache-password and mosrs-setup-gpg-agent, downloaded them again from https://code.metoffice.gov.uk/trac/home/wiki/AuthenticationCaching/GpgAgent, copied them to xcslc1 with scp, and tried to run mosrs-cache-password, but without success:

chrbe@xcslc1:~> . mosrs-cache-password
Met Office Science Repository Service password:
gpg-preset-passphrase: problem with the agent
gpg-preset-passphrase: caching passphrase failed: Invalid response
gpg-preset-passphrase: problem with the agent
gpg-preset-passphrase: caching passphrase failed: Invalid response
svn: E215004: Authentication failed and interactive prompting is disabled; see the --force-interactive option
svn: E215004: Unable to connect to a repository at URL 'https://code.metoffice.gov.uk/svn/test'
svn: E215004: No more credentials or we tried too many times.
Authentication failed
Error: Unable to access Subversion with given password
basename: invalid option -- 'b'
Try `basename --help' for more information.
Run "" to try caching your password again
Connection to xcslc1 closed.
[chrbe@exvmslander:~]$

comment:11 Changed 4 months ago by grenville

Christophe

Those aren't the instructions.

These are the instructions: https://collab.metoffice.gov.uk/twiki/bin/view/Support/RetirementOfRoseCylcVMs and the associated https://code.metoffice.gov.uk/trac/home/wiki/AuthenticationCaching#Monsoon.

Grenville

comment:12 Changed 4 months ago by cbellisario

Yes, I followed these instructions:

So when I try to connect to xcslc1, I get:

[chrbe@exvmslander:~]$ ssh -Y xcslc1
Last login: Thu Feb  7 17:22:04 2019 from 10.168.5.6

    This computer is provided for the processing of Official Information.
    Unauthorized access may constitute a criminal offence. All activity
    on the system is liable to monitoring.


-bash: mosrs-cache-password: command not found
Met Office Science Repository Service password:
gpg-preset-passphrase: problem with the agent
gpg-preset-passphrase: caching passphrase failed: Invalid response
gpg-preset-passphrase: problem with the agent
gpg-preset-passphrase: caching passphrase failed: Invalid response
svn: E215004: Authentication failed and interactive prompting is disabled; see the --force-interactive option
svn: E215004: Unable to connect to a repository at URL 'https://code.metoffice.gov.uk/svn/test'
svn: E215004: No more credentials or we tried too many times.
Authentication failed
Error: Unable to access Subversion with given password
Run "mosrs-cache-password" to try caching your password again
Met Office Science Repository Service password:

comment:13 Changed 3 months ago by grenville

Christophe

Please delete /home/d04/chrbe/mosrs-cache-password and /home/d04/chrbe/mosrs-setup-gpg-agent (again), then follow the instructions: https://collab.metoffice.gov.uk/twiki/bin/view/Support/RetirementOfRoseCylcVMs and the associated https://code.metoffice.gov.uk/trac/home/wiki/AuthenticationCaching#Monsoon.

Don't follow https://code.metoffice.gov.uk/trac/home/wiki/AuthenticationCaching/GpgAgent; the instructions do not direct you to follow this link.

Grenville

comment:14 Changed 3 months ago by cbellisario

Dear Grenville,

Thank you! It works now. My mistake was to put mosrs-cache-password and mosrs-setup-gpg-agent back in /home/d04/chrbe/ after deleting them. I still get the message

-bash: mosrs-setup-gpg-agent: No such file or directory

but it does not impact the following steps.

When opening Rose, I now get the display of the run, unlike what was happening on exvmsrose, so I guess that problem is solved too.

I am now back to my segmentation fault problem, which I attribute to the STASHmaster modifications. But at least I can work on the code now.

Thank you for your help!

Best regards,

Christophe

comment:15 Changed 3 months ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed