#2669 closed help (answered)

Duplicate Stashcode

Reported by: amenon Owned by: um_support
Component: UM Model Keywords: duplicate stashcode, READ_NML_ITEMS
Cc: Platform: ARCHER
UM Version: 10.9

Description

Hi,

My suite u-bc555 is failing at the LAM reconfiguration with the following error:

Error from routine: READ_NML_ITEMS
Error message: Duplicate Stashcode entries for Stashcode    5 in items namelists.

From job.out I think that this error comes from the land ancil /work/n02/n02/amenon/suite/km17/qrparm.orog
I have used these ancils for another suite in MONSooN without any issue. Could you please have a look into this? Many thanks.

Change History (18)

comment:1 Changed 11 months ago by willie

Hi Arathy,

Is this STASH something you've added? You can delete the duplicates from the list in the GUI. You need to run the STASH macros after any changes, especially tidySTASH transform.

Regards
Willie

comment:2 Changed 11 months ago by amenon

Hi Willie,

If this error is coming from the ancils, then I haven't added any stashes in that. I have added stashes only to the diagnostic output from the LAM, for which I did tidy stash transform. Is there a place where I could do the same for the ancils?

Arathy

comment:3 Changed 11 months ago by willie

Hi Arathy,

I took a copy of u-bc555 at revision 95373 - i.e. before your changes. It had one STASH error which I corrected and ran. It fails at cycle 20160701 in the INCOMPASS_km17_ga6_um_recon task with the duplicate STASH code error. So this has never worked.

Did the ancestor u-ay368/trunk@94811 work? You should check this out and see if the same error appears.

It is a good idea only to commit working suites to the trunk. Sometimes this is not possible and then you should make a comment indicating the problems in the commit message.

Regards,
Willie

comment:4 Changed 11 months ago by amenon

Hi Willie,

u-ay368 worked for several cycles and then failed with NaNs in the LAM forecast. The recon task succeeded for every cycle, though. u-ay368 was a 4.4km suite to which I added all the diagnostic output required for this case study. u-bc555 is a 17km suite for the same case study. So I copied u-ay368, changed the resolution to 17km, changed the science configuration from RA1T to ga6, and am using the 17km land ancillaries instead of the 4km ones. These are the changes in this suite (u-bc555) compared to the ancestor u-ay368.

Sorry about the commit. I will put sufficient comments in the commit message in future. I committed this so that Stu could have a look at the setup before I start.

Regards,
Arathy

comment:5 Changed 11 months ago by amenon

Hi Willie,

Just to add, the 17 km suite with the same set up, but with almost half the number of diagnostic outputs, ran for a season (152 days) in MONSooN sometime back. That suite id is u-ar437.

Regards,
Arathy

comment:6 Changed 11 months ago by willie

Hi Arathy,

The GA6 configuration in rose-app-ga6.conf contains

[namelist:items(823c37a9)]
ancilfilename='$UM_ANCIL_OROG_DIR/$UM_ANCIL_OROG_FILE'
domain=1
!!interval=1
!!netcdf_varname=9*'unset'
!!period=1
source=2
stash_req=5,6,17,18,34,35,36,37
update_anc=.false.
!!user_prog_ancil_stash_req=
!!user_prog_rconst=0.0

while the RA1T doesn't. I think this is responsible for the clash. I don't know what the differences between GA6 and RA1T mean, but it looks like there is more to do than just switching between them.

Willie
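
The clash described above can be checked for mechanically. The following is a minimal Python sketch, not part of the Rose or UM tooling: it scans configuration text for namelist:items sections and reports any STASH code requested by more than one active section. The function name find_duplicate_stashcodes, the section indices 0000aaaa and 0000bbbb, and their contents are hypothetical; only the 823c37a9 section comes from this ticket. It assumes that sections prefixed with ! or !! are ignored, as the !! prefix marks ignored settings in the listing above.

```python
import re
from collections import defaultdict

# Matches e.g. [namelist:items(823c37a9)], optionally prefixed with ! or !!
SECTION_RE = re.compile(r"^\[(!{0,2})namelist:items\(([^)]+)\)\]\s*$")


def find_duplicate_stashcodes(conf_text):
    """Map each STASH code requested by more than one active items
    section to the list of section indices that request it."""
    owners = defaultdict(list)
    current = None  # index of the active items section being read, if any
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            match = SECTION_RE.match(line)
            # Assumption: sections marked ! or !! are ignored, so skip them.
            current = match.group(2) if match and not match.group(1) else None
            continue
        if current and line.startswith("stash_req="):
            for code in line.split("=", 1)[1].split(","):
                if code.strip():
                    owners[int(code)].append(current)
    return {code: secs for code, secs in owners.items() if len(secs) > 1}


# Hypothetical merged view of the main and optional configuration.
sample = """\
[namelist:items(823c37a9)]
source=2
stash_req=5,6,17,18,34,35,36,37

[namelist:items(0000aaaa)]
source=2
stash_req=5,33

[!namelist:items(0000bbbb)]
stash_req=6
"""

duplicates = find_duplicate_stashcodes(sample)
print(duplicates)
```

In practice the text passed in would need to combine the main app configuration with whichever optional configuration file is active, since the duplicate may only appear once the two are merged.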

comment:7 Changed 11 months ago by amenon

Hi Willie,

I checked the rose-app-ga6.conf in the successful 17 km suite on Monsoon, and it also contains the lines quoted above. The only change in this suite compared to the 17km suite on Monsoon is the additional diagnostics (more than double the number of stashes compared to the Monsoon suite, including some tracers). I am currently trying to rerun the Monsoon suite for a single cycle with the same setup to see if that succeeds.

Regards,
Arathy

comment:8 Changed 11 months ago by amenon

Hi Willie,

I took a copy of my 17 km suite in monsoon and was trying to do a test as mentioned above. That suite id is u-bd046. However, I am not able to submit that suite. I get the following error:

[FAIL] bash -ec H=$(rose\ host-select\ postproc);\ echo\ $H # return-code=1, stderr=
[FAIL] [WARN] postproc: (ssh failed)
[FAIL] [FAIL] No hosts selected.

I found this ticket http://cms.ncas.ac.uk/ticket/2192. However, I can't find EXTRACT_HOST in the conf file.

I can submit other suites in Monsoon without any trouble.

Regards,
Arathy

comment:9 Changed 11 months ago by willie

Hi Arathy,
The Monsoon post processor is down at the moment - see the Met Office conversation on Yammer.

Willie


comment:10 Changed 11 months ago by amenon

Thanks Willie.

I have been trying different ways to get past this error, without success. I found that the index 823c37a9 in rose-app-ga6.conf in the failing suite on ARCHER (shown below) is different from the index in the successful suite on Monsoon, which was ada02aed.

[namelist:items(823c37a9)]
ancilfilename='$UM_ANCIL_OROG_DIR/$UM_ANCIL_OROG_FILE'
domain=1
!!interval=1
!!netcdf_varname=9*'unset'
!!period=1
source=2
stash_req=5,6,17,18,34,35,36,37
update_anc=.false.
!!user_prog_ancil_stash_req=
!!user_prog_rconst=0.0

So I tried changing this value from 823c37a9 to ada02aed. The suite then appears to get past the previous failure point, the 'duplicate stash code' error that appeared around line 4678 of the job.out file (i.e. at the qrparm.orog land ancil). But now I have a new error in the LAM forecast:

?  Error code: 2
?  Error from routine: WGDOS Packing (f_shum_wgdos_pack)
?  Error message: Problem packing field...
?        STASH:   593
?        Accuracy: -12
?        Minimum:   -0.5022556018E+05
?        Maximum:    0.6438254249E+06
?        Message:  Unable to WGDOS pack to this accuracy
?  Error from processor: 30
?  Error number: 84

How should I proceed now? Should I reinstate the value of the namelist to 823c37a9 and try some other way to get over this error? Please help.

Regards,
Arathy
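
The comparison of section indices between the two suites can be done programmatically rather than by eye. This is a minimal Python sketch under the assumption that each copy of rose-app-ga6.conf has been read into a string; the function names items_indices and index_differences are hypothetical, and the two sample texts are cut down to the section header and one setting.

```python
import re

# Matches items section headers, including ones marked ignored with ! or !!
ITEMS_RE = re.compile(r"^\[!{0,2}namelist:items\(([^)]+)\)\]\s*$", re.MULTILINE)


def items_indices(conf_text):
    """Return the set of indices used by namelist:items sections."""
    return set(ITEMS_RE.findall(conf_text))


def index_differences(text_a, text_b):
    """Return (only_in_a, only_in_b) as sorted lists of section indices."""
    a, b = items_indices(text_a), items_indices(text_b)
    return sorted(a - b), sorted(b - a)


# Cut-down stand-ins for the ARCHER and Monsoon copies of the file.
archer_conf = "[namelist:items(823c37a9)]\nsource=2\n"
monsoon_conf = "[namelist:items(ada02aed)]\nsource=2\n"

only_archer, only_monsoon = index_differences(archer_conf, monsoon_conf)
print(only_archer, only_monsoon)
```

A difference in index alone does not say which copy is right; it only shows where the two configurations have diverged.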

comment:11 Changed 11 months ago by willie

Hi Arathy,

You have fallen foul of the Rose/Cylc STASH indexing problem. The STASH is only indexed if it appears in the main tables. But in your suite some STASH is buried in the optional configuration files. This is fine so long as you do not change anything. But if you are changing STASH then the safest way to proceed is to take the STASH out of the optional configuration file and add it to the main tables and re-index using tidy STASH transform. So delete the STASH item in rose-app-ga6.conf and put it in the Configure Ancils and Dump Fields table. You can then properly index it. What you have done is essentially manually re-index, which is risky.

This issue is discussed in https://cylc.github.io/cylc/doc/suite-design-guide.pdf.

The packing issue is a separate error. Sometimes it is caused by having NaNs in the dump or instability of the model.

Willie
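
The numbers in the packing error can be put into perspective with some rough arithmetic. This is only a sketch under two assumptions that are not confirmed in this ticket: that the reported accuracy is a power of two (so -12 means steps of 2**-12), and that the spread of the field measured in those steps has to fit in a signed 32-bit integer. The function name scaled_range and the INT32_MAX limit are illustrative, not taken from the packing library.

```python
def scaled_range(minimum, maximum, accuracy):
    """Number of steps of size 2**accuracy needed to span the field."""
    return (maximum - minimum) / 2.0 ** accuracy


# Assumed limit for illustration only; not taken from the packing library.
INT32_MAX = 2**31 - 1

# Minimum, maximum and accuracy as reported for STASH 593 in the error above.
steps = scaled_range(-0.5022556018e05, 0.6438254249e06, -12)
fits = steps <= INT32_MAX

print(f"steps needed: {steps:.4g}, assumed limit: {INT32_MAX:.4g}, fits: {fits}")
```

Under these assumptions the field's range is too wide to be represented at the requested accuracy, which would be consistent with unphysical values from an unstable model rather than with a problem in the packing settings themselves.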

comment:12 Changed 11 months ago by amenon

Hi Willie,

I deleted the STASH item from rose-app-ga6.conf and added it in the Configure Ancils and dump table in the GUI. But now I am back to the duplicate stash code error.

Regards,
Arathy

comment:13 Changed 11 months ago by willie

Hi Arathy,

Did you remember to run the tidy STASH transform?

Willie

comment:14 Changed 11 months ago by amenon

Hi Willie,

Yes, I did tidy stash transform. I ran all those macros to be safe.

Arathy

comment:15 Changed 11 months ago by willie

Arathy,

The run in

/home/amenon/cylc-run/u-bc555/log/job/20160701T0000Z/INCOMPASS_km17_ga6_um_recon/06

seems to have worked?

Willie

comment:16 Changed 11 months ago by amenon

Hi Willie,

I just noticed. Reconfiguration job succeeded, but now LAM forecast is the one that's failing with the same duplicate stashcode error.

Arathy

comment:17 Changed 11 months ago by willie

Hi Arathy,

I took a copy of your modified suite (with the change I suggested) and ran it. It fails in exactly the same place with the same error. If you go to the Configure ancils and initialise dump fields page, select the orography line you added, right-click and select "ignore this section", you can get past this error.

So this field was in GA6 but not in RA1T. We moved it from rose-app-ga6.conf to the main STASH table and got the same error. Then we removed it and the error disappeared.

You will then get another error,

?  Error code: 1
?  Error from routine: RCF_CALC_LEN_ANCIL
?  Error message: Attempted to process ancillary with STASHcode:     7 however
   the prognostic will not be present in the output dump. Either remove ancil
   request or check that science selected in namelists is compatible with this
   ancil field.

STASH 7 is the unfiltered orography.

I think there is a danger here of gradually modifying GA6 until it becomes RA1T. I also note the warning on the Config 1 setup page that only RA1T and RA1M have been tested at 10.9. It may be better to get advice from Stu Webster.

Willie

comment:18 Changed 10 months ago by willie

  • Resolution set to answered
  • Status changed from new to closed