Opened 4 weeks ago

Last modified 36 hours ago

#2409 new help

Model crashing with no obvious error

Reported by: ajd
Owned by: um_support
Priority: normal
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version: 10.7



A few of my suites have crashed with no obvious error. In most cases, restarting the run as an NRUN works, so I have been doing that. However, there is at least one suite (u-at673) where that hasn't worked. For that suite I changed the cycling frequency mid-run via a reload, so something may have gone wrong there, although that was >20 model years earlier.

I have tried switching on extra diagnostics and saving output from all processors (under um→env→Runtime Controls→Atmosphere only), but that doesn't seem to work: I still only get output from pe0. Any ideas why that wouldn't work?

Many thanks,

Change History (6)

comment:1 Changed 4 weeks ago by ajd

Hello again,

Further to this, the suites I am trying to restart now fail in the validate_suite_info task, before the model even runs:

Exception: Retrieval of controlled vocabulary failed. Please check carefully that the section ("experiment_id") and revision ("3.2.5") are both valid.
command: curl -s -f
return code: 22

I don't really understand why this is happening now (it never has before with the same setup), or why I don't see any useful error messages for the failed coupled tasks (before the restarts).

Have you seen this before?

Many thanks,

comment:2 Changed 4 weeks ago by grenville

Just set the failed task to succeeded - do you need to keep validating the suite?


comment:3 Changed 4 weeks ago by ajd

Thanks Grenville - I guess I was just wondering why something that worked before didn't anymore, but setting to succeeded works.

Do you know why the switch to save output from all processors doesn't work though?

Thanks, Andrea

comment:4 Changed 4 weeks ago by grenville


The suite validation retrieves data from an external source, so it relies on the network and on the source itself being available.
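
For reference, the "return code: 22" in the exception above is curl's exit status. With -f, curl exits with 22 when the server answers with an HTTP status of 400 or above, which is consistent with a problem at the external source rather than in the suite. A minimal sketch for decoding a few common curl exit codes follows; the function name and the choice of codes are illustrative only, not part of the suite.

```python
# Illustrative helper (not part of validate_suite_info): map a few
# well-known curl exit codes to a short explanation.
CURL_EXIT_MEANINGS = {
    6: "could not resolve host",
    7: "failed to connect to host",
    22: "HTTP error (status 400 or above) when -f/--fail is used",
    28: "operation timed out",
}


def describe_curl_exit(code: int) -> str:
    """Return a short description of a curl exit code."""
    if code == 0:
        return "success"
    return CURL_EXIT_MEANINGS.get(
        code, f"not listed here (code {code}); see the curl man page"
    )


print(describe_curl_exit(22))
```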

I'm still working on pe output


comment:5 Changed 4 weeks ago by ajd

Makes sense, thanks!

comment:6 Changed 36 hours ago by grenville

Hi Andrea

The error is with NEMO (see /home/d02/andit/cylc-run/u-at673/work/19131001T0000Z/coupled/ocean.output).

===>>> : E R R O R


stpctl: the zonal velocity is larger than 20 m/s

kt= 22180 max abs(U): 4.6520E+05, i j k: 118 289 10

output of last fields in numwso

===>>> : E R R O R


There is scant documentation about why this happens, other than to say that it is sensitive to neither the compiler nor the time step.

The developers of your model may know more - are you in contact with them?
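
Since the coupled task itself gave no useful error, it may help to scan ocean.output directly in other failed suites. The sketch below is a rough example that assumes the stpctl line always has the same layout as the excerpt above; the names VelocityBlowup and find_velocity_blowups are made up for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class VelocityBlowup(NamedTuple):
    """One stpctl velocity report (hypothetical container for this sketch)."""
    kt: int            # model time step at which the check failed
    max_abs_u: float   # maximum absolute zonal velocity reported
    i: int
    j: int
    k: int


# Pattern assumed from the single excerpt in this ticket, e.g.
# "kt= 22180 max abs(U): 4.6520E+05, i j k: 118 289 10"
STPCTL_LINE = re.compile(
    r"kt=\s*(\d+)\s+max abs\(U\):\s*([0-9.Ee+-]+),"
    r"\s*i j k:\s*(\d+)\s+(\d+)\s+(\d+)"
)


def find_velocity_blowups(lines: Iterable[str]) -> Iterator[VelocityBlowup]:
    """Yield one record for every line that matches the stpctl report."""
    for line in lines:
        match = STPCTL_LINE.search(line)
        if match:
            yield VelocityBlowup(
                int(match[1]), float(match[2]),
                int(match[3]), int(match[4]), int(match[5]),
            )


sample = [
    "stpctl: the zonal velocity is larger than 20 m/s",
    "kt= 22180 max abs(U): 4.6520E+05, i j k: 118 289 10",
]
for hit in find_velocity_blowups(sample):
    print(hit)
```

To use it on a real run, pass an open file instead of the sample list, for example `find_velocity_blowups(open(path))` with `path` pointing at the ocean.output file in the failed cycle's coupled work directory.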

