Opened 3 years ago

Closed 3 years ago

#2409 closed help (fixed)

Model crashing with no obvious error

Reported by: ajd
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version: 10.7



A few of my suites have crashed with no obvious error. In most cases, restarting the run by setting up an NRUN seems to work, so I'm doing that. However, there is at least one suite (u-at673) where that hasn't worked. For that one I had changed the cycling frequency mid-run via a reload, so something may have gone wrong there, but that was more than 20 model years earlier.

I have also tried switching on extra diagnostics and saving output from all processors (under um→env→Runtime Controls→Atmosphere only), but that doesn't seem to work: I still only get output from pe0. Any ideas why that wouldn't work?

Many thanks,

Change History (7)

comment:1 Changed 3 years ago by ajd

Hello again,

Further to this, the suites I am trying to restart now fail before running the model in the validate_suite_info task:

Exception: Retrieval of controlled vocabulary failed. Please check carefully that the section ("experiment_id") and revision ("3.2.5") are both valid.
command: curl -s -f
return code: 22

I don't really understand why this is happening now (it never has before with the same setup), or why I don't see any useful error messages for the failed coupled tasks (before the restarts).

Have you seen this before?

Many thanks,

comment:2 Changed 3 years ago by grenville

Just set the failed task to succeeded - do you need to keep validating the suite?


comment:3 Changed 3 years ago by ajd

Thanks Grenville - I guess I was just wondering why something that worked before didn't anymore, but setting to succeeded works.

Do you know why the switch to save output from all processors doesn't work though?

Thanks, Andrea

comment:4 Changed 3 years ago by grenville


The suite validation gets its data from an external source, so it relies on the network and on the source itself being available.

I'm still working on pe output
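For reference, the "return code: 22" in the validation failure is curl's own exit status: with the -f flag, curl exits with 22 when the server answers with an HTTP error (status 400 or above), which fits a problem on the remote side rather than in the suite. A minimal sketch of a lookup for a few common curl exit codes follows; the function and dictionary names are made up for illustration and are not part of the suite or of Rose/Cylc.

```python
# Hypothetical helper (not part of the suite): explain a curl exit status.
# The meanings below are taken from the EXIT CODES section of the curl manual.
CURL_EXIT_MEANINGS = {
    6: "could not resolve host (DNS or network problem)",
    7: "failed to connect to host",
    22: "server returned an HTTP error (status 400 or above); only reported with -f/--fail",
    28: "operation timed out",
}


def explain_curl_exit(code):
    """Return a short human-readable explanation for a curl exit code."""
    if code == 0:
        return "success"
    return CURL_EXIT_MEANINGS.get(code, "see the EXIT CODES section of the curl manual")


if __name__ == "__main__":
    # The validate_suite_info failure in this ticket reported return code 22.
    print(explain_curl_exit(22))
```

Codes 6, 7 and 28 would point to a local network problem instead, so the exit status gives a quick hint about where to look.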


comment:5 Changed 3 years ago by ajd

Makes sense, thanks!

comment:6 Changed 3 years ago by grenville

Hi Andrea

The error is with NEMO (see /home/d02/andit/cylc-run/u-at673/work/19131001T0000Z/coupled/ocean.output).

===>>> : E R R O R


stpctl: the zonal velocity is larger than 20 m/s

kt= 22180 max abs(U): 4.6520E+05, i j k: 118 289 10

output of last fields in numwso

===>>> : E R R O R


There is scant documentation about why this happens, other than to say it's not sensitive to compiler or time step.

The developers of your model may know more - are you in contact with them?
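Since the coupled task itself gave no obvious error, it can help to scan ocean.output directly for the NEMO error banner. The sketch below is one way to do that in Python, assuming the banner and the stpctl line look like the excerpt above; the function names and the number of context lines are illustrative choices, not part of NEMO or the UM.

```python
import re

# Matches the NEMO error banner, e.g. "===>>> : E R R O R".
ERROR_BANNER = re.compile(r"E\s*R\s*R\s*O\s*R")

# Matches e.g. "kt= 22180 max abs(U): 4.6520E+05, i j k: 118 289 10".
STPCTL_LINE = re.compile(
    r"kt=\s*(\d+)\s+max abs\(U\):\s*([0-9.Ee+-]+),\s*i j k:\s*(\d+)\s+(\d+)\s+(\d+)"
)


def find_nemo_errors(text, context=8):
    """Return one block of lines per error banner, starting at the banner."""
    lines = text.splitlines()
    return [
        lines[idx: idx + context]
        for idx, line in enumerate(lines)
        if ERROR_BANNER.search(line)
    ]


def parse_stpctl(text):
    """Extract time step, max |U| and grid indices from a stpctl message."""
    match = STPCTL_LINE.search(text)
    if match is None:
        return None
    kt, umax, i, j, k = match.groups()
    return {"kt": int(kt), "max_abs_u": float(umax),
            "i": int(i), "j": int(j), "k": int(k)}


if __name__ == "__main__":
    import sys
    # Usage: python scan_ocean_output.py <path to ocean.output>
    with open(sys.argv[1]) as handle:
        contents = handle.read()
    for block in find_nemo_errors(contents):
        print("\n".join(block))
        print("-" * 40)
    print(parse_stpctl(contents))
```

The parsed time step (kt) and grid indices (i, j, k) show when and where the velocity blew up, which is useful information to pass on to the model developers.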


comment:7 Changed 3 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed