Opened 3 years ago

Closed 3 years ago

#1868 closed help (answered)

Model failure

Reported by: simon.tett
Owned by: um_support
Component: UM Model
Keywords:
Cc: b.dong@…
Platform: ARCHER
UM Version: 7.8

Description

Hi,

I got a model failure along the lines of:

==============================================

Atm_Step: Timestep 6149

==============================================
initial Absolute Norm : 6747.1556435749953
GCR( 2 ) failed to converge in 100 iterations.
Final Absolute Norm : 5.77897678748359178E-2
==============================================

Atm_Step: Timestep 6150

==============================================
initial Absolute Norm : 3551.8800091137286
GCR( 2 ) converged in 6 iterations.
Final Absolute Norm : NaN
==============================================

The experiment is HadGEM3-GA3 (xlwts), and the output can be found at ~stett2/output/xlwts000.xlwts.d16110.t212431.leave.

Some output is in /work/n02/n02/stett2/um/xlwts

However, I had switched archiving off and did not keep the files.

I'm not familiar with the ways in which HadGEM3-GA goes down, so the error message may not be especially informative.

thanks

Simon

Change History (5)

comment:1 Changed 3 years ago by grenville

Simon

Not much help in the leave file - I don't know why you didn't get ATP output (we'll enquire - I have seen the ATP problem before, but I doubt that would be very helpful in this case).

There's not enough to go on in the output. I can only suggest dumping at time steps 6147, 6148, 6149.

Grenville

comment:2 Changed 3 years ago by simon.tett

Hi Grenville,

Thanks. I don't have much experience with HadGEM3 failures. HadAM3 used to fail with a negative pressure error. There was not much point in working out where this had gone wrong, as it usually meant the model had gone unstable and the point of failure was random! So is this convergence failure plus NaN how HadGEM3-A fails?

Simon

comment:3 Changed 3 years ago by grenville

Simon

Yes, this is typical of a failure resulting from model instability. We try to find out where it goes wrong (by writing multiple dumps), but reconfiguring the last start dump and/or tweaking the time stepping is often our first approach.

Grenville
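The failure signature discussed above (a GCR solver that fails to converge, followed by a NaN final norm) can be picked out of a leave file automatically, which helps when choosing the timesteps at which to write extra dumps. The following is a minimal sketch only: it assumes the log lines look exactly like the excerpt quoted in the ticket description, and the function name scan_leave_text and the sample text are illustrative, not part of the UM.

```python
import re

# Patterns assume the output format quoted in the ticket description.
TIMESTEP_RE = re.compile(r"Atm_Step:\s*Timestep\s+(\d+)")
FAILED_RE = re.compile(r"GCR\(\s*\d+\s*\)\s+failed to converge")
FINAL_NORM_RE = re.compile(r"Final\s+Absolute Norm\s*:\s*(\S+)")


def scan_leave_text(text):
    """Return (timestep, problem) tuples in the order they appear."""
    problems = []
    timestep = None  # most recent "Atm_Step: Timestep N" seen so far
    for line in text.splitlines():
        match = TIMESTEP_RE.search(line)
        if match:
            timestep = int(match.group(1))
            continue
        if FAILED_RE.search(line):
            problems.append((timestep, "GCR failed to converge"))
            continue
        match = FINAL_NORM_RE.search(line)
        if match and match.group(1).lower() == "nan":
            problems.append((timestep, "final norm is NaN"))
    return problems


# Sample text copied from the excerpt in the ticket description.
SAMPLE = """\
Atm_Step: Timestep 6149
initial Absolute Norm : 6747.1556435749953
GCR( 2 ) failed to converge in 100 iterations.
Final Absolute Norm : 5.77897678748359178E-2
Atm_Step: Timestep 6150
initial Absolute Norm : 3551.8800091137286
GCR( 2 ) converged in 6 iterations.
Final Absolute Norm : NaN
"""

if __name__ == "__main__":
    for step, problem in scan_leave_text(SAMPLE):
        print(f"timestep {step}: {problem}")
```

To scan a real leave file, read it into a string and pass it to scan_leave_text; the first timestep reported is a reasonable upper bound for where to start writing dumps.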

comment:4 Changed 3 years ago by simon.tett

Hi Grenville,

Thanks. So when I see this in future I'll just treat it as "model crashed". HadA/CM3 were quite forgiving. I remember that HadGEM-2 tended to fail with an interpolation error in the semi-Lagrangian scheme quite frequently. Is HadGEM3-GA6 better?

Simon

comment:5 Changed 3 years ago by ros

  • Resolution set to answered
  • Status changed from new to closed