Opened 7 years ago

Closed 7 years ago

#888 closed help (fixed)

HadGEM3 UKCA model crash: ACUMPS: Data corruption during I/O

Reported by: ata27 Owned by: um_support
Component: UM Model Keywords: STASH, ACUMPS
Cc: Platform:
UM Version: 7.3

Description

Hi, I get a rather annoying crash every time I try to run a job to which I have added some new diagnostics.
The job is an N48 L60 UM-UKCA job on MONSooN (xgywo). I (stupidly) stopped the model run, added some new diagnostics (which required some code changes), then recompiled and reran the model. I now get this error:

*
UM ERROR (Model aborting) :
Routine generating error: U_MODEL
Error code: 4
Error message:

ACUMPS: Data corruption during I/O

I have looked through the archive and seen some pointers for identifying the problem but I have tried changing start dumps and have had no success!

Any pointers on what I should do? The run is based on xgywn, which is running. I recently copied xgywn to perform another experiment (making only UMUI changes), and that experiment (xgywr) is running fine.

Many thanks,

Alex

Change History (9)

comment:1 Changed 7 years ago by willie

  • Keywords STASH, ACUMPS added
  • UM Version changed from <select version> to 7.3

Hi Alex,

Your job runs for 1440 time steps before crashing. The problem is likely to be due to STASH. Try switching on Output Choices > STASH messages. Also, if you go to the STASH panel and verify the diagnostics (Ctl-v), you will see that there are some errors that need to be corrected.

Regards,

Willie

comment:2 Changed 7 years ago by ata27

Hi Willie,

Thanks, I will switch on the STASH messages now. As for correcting those errors, I understand that they should be corrected, but the job I based this on "works" OK with them. To be honest, I don't quite know why those errors are there, but they seem to be common in UKCA jobs that I have run before with no problems.

Cheers,

Alex

comment:3 Changed 7 years ago by ata27

Hi Willie,

I have a new .leave file on MONSooN. Would you be able to have a look at it and let me know if anything in it makes sense? I made the UMUI change to include STASH messages.

xgywo000.xgywo.d12221.t133804.leave

Thanks,

Alex

comment:4 Changed 7 years ago by willie

Hi Alex,

If you compare the latest dump, xgywoa.dak0c10, with itself using cumf (it's in $UMDIR/vn7.3/ibm/utils on MONSooN), it complains that the STASH field 34,322 (Ox Budget: NOy etc) is corrupted.

One possible way forward is to resubmit the run using a dump from a few periods back to allow climate meaning to complete. (ACUMPS - Accumulate UM Partial Sums - does the climate meaning - see document C5).

Regards,

Willie
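
Editorial sketch: the self-comparison check in comment 4 can look odd, since a file should always match itself. One plausible reason it can still fail, assuming the corruption shows up as NaN values, is that a NaN never compares equal to itself. The snippet below illustrates only that idea. It is not how cumf works internally, and the function names and field values are made up for the example.

```python
def find_corrupt_points(values):
    """Return the indices of values that do not equal themselves (i.e. NaN)."""
    return [i for i, v in enumerate(values) if v != v]


def self_compare(fields):
    """Compare every field with itself and report the ones that differ."""
    report = {}
    for name, values in fields.items():
        bad = find_corrupt_points(values)
        if bad:
            report[name] = bad
    return report


# Hypothetical stand-in for fields read from a dump; the numbers are invented.
dump = {
    "34,001": [1.0e-7, 2.5e-7, 3.0e-7],
    "34,322": [4.2e-9, float("nan"), 1.1e-9],
}

report = self_compare(dump)
for name, bad in report.items():
    print(f"STASH field {name}: {len(bad)} point(s) differ from themselves")
```

A field that passes this check is not necessarily sound; the check only catches values that fail an equality test against themselves.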

comment:5 Changed 7 years ago by ata27

Hi Willie,

I was very silly and managed to delete all but the two dumps that I have tried to run with (which both give the same ACUMPS error).

Thanks for looking. I will now try starting from a non-xgywo start dump. Do you think that will work?

Cheers,

Alex

comment:6 Changed 7 years ago by ata27

Hi Willie et al.,

I just tried resubmitting an NRUN using another start dump from a previous run and encountered the same ACUMPS error. Can you suggest anything else I can try?

Thanks,

Alex

comment:7 Changed 7 years ago by luke

Hi Alex,

You're using a start-dump from a CheST (stratosphere+troposphere chemistry) run to initialise a CheS (stratosphere-only chemistry) run, which means that there will be fewer prognostic and diagnostic fields in section 34. This can cause problems if not managed carefully. I believe this is a general problem with the way UKCA has some prognostics and diagnostics in the same section, and is mostly an issue with the reconfiguration, although you don't seem to have had issues with that here.

You haven't changed the pre-STASHmaster (PSM) files going into the job, so the STASH items reserved in the Tropospheric Ox and CO budgets, which will still be hidden in the dump, won't be causing the issue. I'm wondering if it could be because of the tropospheric chemical tracers, which exist in CheST but not in CheS. Have you tried using a different start-dump to see if this is the issue?

You could take a dump from any atmosphere-only run and initialise your chemical tracers to a CheS initial condition (e.g.

/projects/ukca/inputs/ancil/QESM/CheS_init.anc

initialising tracer 4 to

/projects/ukca/inputs/ancil/QESM/CheS_init_NOy.anc

).

If the tropospheric tracers are the issue, you can remove these fields from the dump by setting the SPACE code in the PSM to 10 for all fields in section 34 that exist in xgywn (prognostic or diagnostic) that you don't want in xgywo, and see if this improves matters.

Thanks,

Luke

comment:8 Changed 7 years ago by ata27

Hi Luke and Willie,

I think I have got to the bottom of this. I had a typo/bug in a UKCA module (asad_flux_dat.F90) where the incorrect number of reactants/products was being passed into the "find reaction" function. I was not asking for this actual flux to be output, but correcting the bug seems to solve the problem. (It was picked up by Paul Telford, who found that the HECToR compiler complained about it. Odd that the MONSooN one didn't?) I have set off a new run on IBM02 and will let you know if it's OK.

Thanks,
Alex
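
Editorial sketch: the bug described in comment 8 is a mismatch between the number of reactants/products supplied and the number the lookup expects, which one compiler caught and another did not. An explicit length check at the start of the lookup makes this kind of mistake fail loudly whichever toolchain is used. The Python below is only an illustration of that pattern. The function name, the slot counts (2 reactants, 4 products) and the reaction table are all assumptions made for the example, not the contents of asad_flux_dat.F90.

```python
# Hypothetical slot counts; the real module may use different sizes.
N_REACTANTS = 2
N_PRODUCTS = 4

# Illustrative reaction table; unused product slots are padded with "".
TABLE = [
    (("NO", "O3"), ("NO2", "O2", "", "")),
    (("HO2", "NO"), ("OH", "NO2", "", "")),
]


def find_reaction(reactants, products, table):
    """Return the index of the matching reaction in table, or -1 if absent."""
    # Reject wrongly sized arguments instead of silently mismatching.
    if len(reactants) != N_REACTANTS:
        raise ValueError(
            f"expected {N_REACTANTS} reactants, got {len(reactants)}")
    if len(products) != N_PRODUCTS:
        raise ValueError(
            f"expected {N_PRODUCTS} products, got {len(products)}")
    # Sort the names so that the order of species does not matter.
    key = (sorted(reactants), sorted(products))
    for index, (table_reactants, table_products) in enumerate(table):
        if (sorted(table_reactants), sorted(table_products)) == key:
            return index
    return -1


found = find_reaction(("NO", "HO2"), ("OH", "NO2", "", ""), TABLE)
print("reaction index:", found)
```

Calling `find_reaction(("HO2", "NO"), ("OH", "NO2"), TABLE)` with only two product slots raises a `ValueError` rather than returning a misleading result.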

comment:9 Changed 7 years ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed