Opened 2 years ago

Closed 21 months ago

#2668 closed help (answered)

NaNs in error

Reported by: amenon Owned by: um_support
Component: UM Model Keywords: NaNs in error
Cc: Platform: ARCHER
UM Version: 10.9



My suite u-bc643 on ARCHER is failing in the LAM forecast job with the error: NaNs in error term in BiCGstab

The suite succeeded for two cycles, and this error appears at the first time step of the third cycle. The job.out file shows that NaNs are present in slow physics and fast physics.

I tried reducing the time step several times, based on previous tickets and my own past experience in handling this error. This time, however, reducing the time step does not get past the error. I also tried increasing the number of processors. Neither approach worked. Please let me know what other options I should try.
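
Since job.out reports NaNs in the physics before the solver fails, the solver error may be a symptom rather than the cause. The sketch below is a minimal illustration in plain Python, not UM code: it shows how a single NaN in a right-hand side contaminates the residual norm that an iterative solver checks. The names `residual_norm` and `check_error_term`, the 2x2 matrix, and the values are all hypothetical.

```python
import math


def residual_norm(matrix, x, rhs):
    """Return the 2-norm of rhs - matrix @ x, using plain Python lists."""
    total = 0.0
    for row, b_i in zip(matrix, rhs):
        r_i = b_i - sum(a_ij * x_j for a_ij, x_j in zip(row, x))
        total += r_i * r_i
    return math.sqrt(total)


def check_error_term(norm, iteration):
    """Hypothetical guard, in the spirit of the check that raised the error."""
    if math.isnan(norm):
        raise FloatingPointError(
            f"NaNs in error term after {iteration} iterations")
    return norm


# Illustrative values only: a small well-conditioned system.
matrix = [[4.0, 1.0], [1.0, 3.0]]
x0 = [0.0, 0.0]
clean_rhs = [1.0, 2.0]
bad_rhs = [1.0, float("nan")]  # one NaN, as if produced by an upstream step

clean_norm = residual_norm(matrix, x0, clean_rhs)  # finite
bad_norm = residual_norm(matrix, x0, bad_rhs)      # NaN: one bad entry poisons the sum
```

If the inputs to the solver already contain NaNs, no solver setting can recover, which would be consistent with the failure appearing after a single iteration regardless of the time step.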


Change History (10)

comment:1 Changed 2 years ago by grenville


Can you try extending the length of the cycle which ran successfully - if that is easy to do.


comment:2 Changed 2 years ago by amenon


I couldn't get past this error last time with the 4 km suite u-bc643, but a fresh copy of that suite did get past it.

Currently, I have a 1.5 km suite on ARCHER that fails with the same error in the LAM forecast job of the first cycle. This time, copying the suite into a new one and starting it fresh didn't solve the issue. The new suite is also stuck at the LAM forecast job of the first cycle with the same error:

? Error from routine: EG_BICGSTAB
? Error message: NaNs in error term in BiCGstab after      1 iterations

Could you please look into this? I think the error appears in the fast physics. The suite ID is u-bd605.


comment:3 Changed 23 months ago by grenville

Hi Arathy

I am working on this at a snail's pace - I'm assuming you haven't fixed it meanwhile


comment:4 Changed 23 months ago by amenon

Hi Grenville,

I haven't fixed this issue yet. Thanks for looking into this.


comment:5 Changed 23 months ago by grenville


I can no longer reproduce the error - the model is running OK for me. See /work/n02/n02/grenvill/cylc-run/u-bc643/work/20160701T0000Z/INCOMPASS_km4p4_RA1T_um_fcst_000/pe_output for example


comment:6 Changed 23 months ago by amenon

Thanks Grenville. I will try to run it then. While you were looking into this suite, I made a copy of it and tried to run it without changing anything. That copy also failed with the same error at the same job. I will give it a try with u-bc643 now.


comment:7 Changed 23 months ago by amenon

Oh sorry Grenville. It's not u-bc643. With u-bc643, which is a 4 km suite, I could get past this error by making a copy of the suite and running it afresh. The current suite is u-bd605, a 1.5 km suite that has not succeeded at all. I guess these instability errors become harder to overcome as the resolution increases.

comment:8 Changed 23 months ago by grenville


I got this for u-bd605 in glm_um_fcst_000:

???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 4324
? Error from routine: CHECK_IOSTAT
? Error message:
? Error reading namelist NLSTCGEN
? IoMsg: The variable name 'SP_SW_GA3_0',' is unrecognized in namelist input.
? Please check input list against code.
? Error from processor: 0
? Error number: 0

but your global run succeeded so I am not sure how to proceed
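
When comparing failures across several suite copies, it can help to pull the code, routine and message out of each error banner automatically. The following is a rough sketch in Python that assumes the "?"-prefixed layout shown in the banner above; `parse_um_error` is a hypothetical helper, not part of the UM or Rose/cylc tooling.

```python
def parse_um_error(text):
    """Extract code, routine and message lines from a '?'-prefixed error banner."""
    report = {"code": None, "routine": None, "message": []}
    in_message = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip ordinary output and the decorative '???!!!' banner lines.
        if not line.startswith("?") or line.startswith("??"):
            continue
        body = line.lstrip("?").strip()
        if body.startswith("Error code:"):
            report["code"] = int(body.split(":", 1)[1])
            in_message = False
        elif body.startswith("Error from routine:"):
            report["routine"] = body.split(":", 1)[1].strip()
            in_message = False
        elif body.startswith("Error message:"):
            rest = body.split(":", 1)[1].strip()
            if rest:  # message given on the same line
                report["message"].append(rest)
            in_message = True
        elif body.startswith(("Error from processor:", "Error number:")):
            in_message = False
        elif in_message and body:
            report["message"].append(body)  # continuation of the message
    return report


# Sample copied from the banner quoted in this comment.
sample = """
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 4324
? Error from routine: CHECK_IOSTAT
? Error message:
? Error reading namelist NLSTCGEN
? IoMsg: The variable name 'SP_SW_GA3_0',' is unrecognized in namelist input.
? Please check input list against code.
? Error from processor: 0
? Error number: 0
"""

report = parse_um_error(sample)
```

Here `report["routine"]` identifies the namelist read (CHECK_IOSTAT) rather than the solver, which makes it easy to see that this failure differs from the BiCGstab one.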


comment:9 Changed 23 months ago by amenon

Hi Grenville,

I also encountered this error once, but restarting the suite got past it. I have another copy of the same suite which is stuck with the NaNs error in the LAM forecast. I made this copy to see whether running a new copy would overcome the NaNs error, but it failed too. That suite ID is u-bd731. You could work with that if it helps.


comment:10 Changed 21 months ago by grenville

  • Resolution set to answered
  • Status changed from new to closed


I believe other tickets address the same problem - we'll close this now.

