Opened 2 months ago

Last modified 2 months ago

#2879 new help

"Negative mass in set_thermodynamic" error

Reported by: charlie
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: NEXCS
UM Version: 10.7

Description

Hi,

Sorry to bother you, but I have an error in two of my runs, u-bh301 and u-bh604, which are identical apart from speed/number of processes. The error is as follows:

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 100
? Error from routine: set_thermodynamic
? Error message: A total of 1 points had negative mass in set_thermodynamic. This indicates the pressure fields are inconsistent between different levels and the model is about to fail.
? Error from processor: 263
? Error number: 13
????????????????????????????????????????????????????????????????????????????????
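For context, here is a rough sketch of the kind of check this message describes. It assumes (my assumption, not taken from the UM source) that the mass of each layer is diagnosed hydrostatically from the pressure difference between adjacent levels, so that a layer gets negative mass whenever pressure increases with height. The function name `negative_mass_points` and the sample values are hypothetical and only for illustration; this is not the actual set_thermodynamic code.

```python
G = 9.80665  # standard gravity in m s^-2


def negative_mass_points(columns):
    """Return (point, layer) pairs whose hydrostatic layer mass is negative.

    columns: one list of pressures (Pa) per grid point, ordered from the
    lowest model level upwards. This layout is an assumption for the sketch.
    """
    bad = []
    for point, column in enumerate(columns):
        for layer in range(len(column) - 1):
            # Mass per unit area of the layer between two adjacent levels.
            mass = (column[layer] - column[layer + 1]) / G
            if mass < 0.0:
                bad.append((point, layer))
    return bad


# Hypothetical data: the second column has pressure rising between
# its second and third levels, which is physically inconsistent.
columns = [
    [100000.0, 85000.0, 70000.0, 50000.0],
    [100000.0, 85000.0, 86000.0, 50000.0],
]
bad = negative_mass_points(columns)
print(f"A total of {len(bad)} points had negative mass")
```

Under that assumption, a single grid point with pressures that do not decrease monotonically with height is enough to trigger the error.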

I found this error on the website of known failure points and followed the first instruction, i.e. simply retriggering the coupled task in case it was a hardware problem. Unfortunately, that did not help: the same error occurred again, roughly 3.5 hours into the task.

Please can you advise what this means? The instructions on the website say that, if it fails again, I should try restarting from an earlier restart dump in case one of them has become corrupted. I haven't done this yet, but if a corrupted dump were the problem, I would expect the failure to happen on the first timestep of the re-run, not 3.5 hours into it.

Many thanks,

Charlie

Change History (1)

comment:1 Changed 2 months ago by charlie

Hi again,

Further to this, I tried running my suite again, restarting from the year before the failure occurred (which is what https://code.metoffice.gov.uk/trac/um/wiki/KnownUMFailurePoints says to do), and this time it ran successfully past the problem year. It ran for a further three years, getting to year 11 (i.e. 1860), before failing again with the same error as above. So I restarted again from the year before, i.e. 1859, but it failed again roughly 3.5 hours in.

The website above suggests this might be a hardware failure, but the fact that it keeps happening doesn't make me optimistic. It is also happening in different locations: the error above was on processor 263, whereas my latest is on processor 305. Please can you help?
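In case it helps with comparing the failures, below is a minimal sketch of how the error reports could be pulled out of the job output to list which processor raised each one. It assumes the reports keep the `? Field: value` layout shown in the description. The function name `summarise_errors` and the sample text are hypothetical; the sample is reconstructed from the two failures described in this ticket and is not a real log.

```python
import re

# Matches lines such as "? Error from processor: 263".
FIELD_RE = re.compile(
    r"^\?\s*(Error code|Error from routine|Error from processor|Error number):"
    r"\s*(.+?)\s*$",
    re.MULTILINE,
)


def summarise_errors(log_text):
    """Group the matched fields into one dict per error report.

    A new report is assumed to start at each "Error code" line.
    """
    reports = []
    current = {}
    for key, value in FIELD_RE.findall(log_text):
        if key == "Error code" and current:
            reports.append(current)
            current = {}
        current[key] = value
    if current:
        reports.append(current)
    return reports


# Sample text reconstructed from the failures described in this ticket.
sample = """\
? Error code: 100
? Error from routine: set_thermodynamic
? Error from processor: 263
? Error code: 100
? Error from routine: set_thermodynamic
? Error from processor: 305
"""
reports = summarise_errors(sample)
for report in reports:
    print(report["Error from routine"], report["Error from processor"])
```

Listing the routine and processor for each failure side by side makes it easier to see whether the problem stays in one part of the domain or moves between runs.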

Charlie
