I got a model failure along the lines of:


Atm_Step: Timestep 6149

initial Absolute Norm : 6747.1556435749953
GCR( 2 ) failed to converge in 100 iterations.
Final Absolute Norm : 5.77897678748359178E-2

Atm_Step: Timestep 6150

initial Absolute Norm : 3551.8800091137286
GCR( 2 ) converged in 6 iterations.
Final Absolute Norm : NaN

The experiment is xlwts (HadGEM3-GA3?) and the output can be found at ~stett2/output/xlwts000.xlwts.d16110.t212431.leave.

Some output is in /work/n02/n02/stett2/um/xlwts

But I had switched archiving off and not kept the files…

I'm not familiar with the ways in which HadGEM3-GA fails, so this error message may not be especially informative.



comment:1 Changed 3 years ago by grenville


Not much help in the leave file - I don't know why you didn't get ATP output (we'll enquire - I have seen the ATP problem before, but I doubt that would be very helpful in this case).

There's not enough to go on in the output. I can only suggest dumping at time steps 6147, 6148, 6149.


comment:2 Changed 3 years ago by simon.tett

Hi Grenville,

Thanks. I've not got much experience with HadGEM3 failures. HadAM3 used to fail with a negative-pressure error. There was not much point in working out where it had gone wrong, as that usually meant the model had gone unstable and the point of failure was random! So is this convergence failure followed by a NaN how HadGEM3-A fails?


comment:3 Changed 3 years ago by grenville


Yes, this is typical of failure resulting from model instability - we try to find out where (through writing multiple dumps), but reconfiguring the last start dump and/or tweaking the time stepping are often our first approach.
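To find where in time the run went wrong before choosing which timesteps to dump, a minimal sketch like the one below could scan a leave file. It assumes the solver lines appear in exactly the format quoted at the top of this ticket; the function name scan_solver_log and the regular expressions are hypothetical and not part of any UM tooling. It lists the timesteps where GCR failed to converge and the first timestep whose final norm is NaN.

```python
import math
import re
import sys

# Hypothetical patterns matching the solver output quoted in this ticket;
# other UM versions may word these lines differently.
STEP_RE = re.compile(r"Atm_Step:\s*Timestep\s+(\d+)")
FAIL_RE = re.compile(r"GCR\(\s*\d+\s*\)\s+failed to converge")
NORM_RE = re.compile(r"Final Absolute Norm\s*:\s*(\S+)")


def scan_solver_log(lines):
    """Return (timesteps where GCR failed to converge, first timestep with a NaN final norm)."""
    step = None
    failed = []
    first_nan = None
    for line in lines:
        if m := STEP_RE.search(line):
            # Remember which timestep the following solver lines belong to.
            step = int(m.group(1))
        elif FAIL_RE.search(line):
            failed.append(step)
        elif m := NORM_RE.search(line):
            try:
                norm = float(m.group(1))  # float("NaN") parses to nan
            except ValueError:
                continue
            if math.isnan(norm) and first_nan is None:
                first_nan = step
    return failed, first_nan


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_leave.py <leave file>
    with open(sys.argv[1]) as fh:
        failed, first_nan = scan_solver_log(fh)
    print("GCR failed to converge at timesteps:", failed)
    print("First NaN final norm at timestep:", first_nan)
```

On the output above this would flag timestep 6149 as the convergence failure and 6150 as the first NaN, which is consistent with dumping at the few timesteps just before them.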


comment:4 Changed 3 years ago by simon.tett

Hi Grenville,

Thanks. So when I see this in future I'll just treat it as "model crashed". HadA/CM3 were quite forgiving. I remember that HadGEM2 quite frequently failed with an interpolation error in the semi-Lagrangian scheme. Is HadGEM3-GA6 better?


comment:5 Changed 3 years ago by ros

