Opened 4 months ago

Closed 7 weeks ago

#2668 closed help (answered)

NaNs in error

Reported by: amenon Owned by: um_support
Priority: normal Component: UM Model
Keywords: NaNs in error Cc:
Platform: ARCHER UM Version: 10.9

Description

Hi,

My suite u-bc643 on ARCHER is failing in the LAM forecast job with the error: NaNs in error term in BiCGstab

The suite succeeded for two cycles, and the error appears at the first time step of the third cycle. The job.out file shows that NaNs are present in slow physics and fast physics.

I tried reducing the time step several times, based on previous tickets and my own past experience in handling this error. But this time, reducing the time step is not helping to get past the error. I also tried increasing the number of processors. Neither approach worked. Please let me know what other options I should try.

Cheers,
Arathy

Change History (10)

comment:1 Changed 4 months ago by grenville

Arathy

Can you try extending the length of the cycle which ran successfully - if that is easy to do.

Grenville

comment:2 Changed 4 months ago by amenon

Hi,

I couldn't get past this error last time in the 4 km suite u-bc643, but I then made a copy of the suite, and the copy ran past the error.

Currently, I have a 1.5 km suite in Archer that fails with the same error at the LAM forecast job of the first cycle. This time copying the suite into a new one and starting it fresh didn't solve the issue. The new suite is also stuck at the LAM forecast job of the first cycle with the same error:

Error from routine: EG_BICGSTAB
?  Error message: NaNs in error term in BiCGstab after      1 iterations 
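
For anyone triaging similar failures, here is a minimal sketch of how a job output file could be scanned for this message. The pattern is taken from the error text quoted above; the function name and the idea of scanning line by line are my own assumptions, and other UM versions may word the message differently.

```python
import re

# Pattern copied from the message quoted in this ticket (assumption: the
# wording is the same in the log being scanned).
NAN_PATTERN = re.compile(r"NaNs in error term in BiCGstab after\s+(\d+)\s+iterations")


def find_bicgstab_nans(log_text):
    """Return a list of (line_number, iteration_count) for each matching line."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = NAN_PATTERN.search(line)
        if match:
            hits.append((lineno, int(match.group(1))))
    return hits


# The two lines quoted in this comment, used as sample input.
sample = (
    "Error from routine: EG_BICGSTAB\n"
    "?  Error message: NaNs in error term in BiCGstab after      1 iterations \n"
)
print(find_bicgstab_nans(sample))  # [(2, 1)]
```

The line number of the first hit gives a rough idea of where to start reading backwards in the output for the first field that went bad.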


Could you please have a look into this? I think the error appears in the fast physics. The suite id is u-bd605.

Thanks,
Arathy

comment:3 Changed 3 months ago by grenville

Hi Arathy

I am working on this at a snail's pace - I'm assuming you haven't fixed it meanwhile

Grenville

comment:4 Changed 3 months ago by amenon

Hi Grenville,

I haven't fixed this issue yet. Thanks for looking into this.

Arathy

comment:5 Changed 3 months ago by grenville

Arathy

I can no longer reproduce the error - the model is running OK for me. See /work/n02/n02/grenvill/cylc-run/u-bc643/work/20160701T0000Z/INCOMPASS_km4p4_RA1T_um_fcst_000/pe_output for example

Grenville

comment:6 Changed 3 months ago by amenon

Thanks Grenville. I will try to run it then. While you were looking into this suite, I made a copy of it and tried to run it without changing anything. That suite also failed with the same error at the same job. I will give it a try with u-bc643 now.

Arathy

comment:7 Changed 3 months ago by amenon

Oh sorry Grenville. It's not u-bc643. With u-bc643, which is a 4 km suite, I could get past this error by making a copy of the suite and running it fresh. The current suite is u-bd605, a 1.5 km suite that hasn't succeeded at all. I guess these instability errors become harder to overcome as the resolution increases.

comment:8 Changed 3 months ago by grenville

Arathy

I got this for u-bd605 in glm_um_fcst_000

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 4324
? Error from routine: CHECK_IOSTAT
? Error message:
? Error reading namelist NLSTCGEN
? IoMsg: The variable name 'SP_SW_GA3_0',' is unrecognized in namelist input.
? Please check input list against code.
? Error from processor: 0
? Error number: 0
????????????????????????????????????????????????????????????????????????????????

but your global run succeeded so I am not sure how to proceed
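
When comparing failures across suites, it can help to pull the single-line fields out of a banner like the one above. This is a rough sketch only: the field names are the ones visible in the banner quoted in this comment, the function name is hypothetical, and the multi-line "Error message" body is deliberately not parsed.

```python
import re

# Single-line fields as they appear in the banner quoted in this ticket.
FIELD = re.compile(
    r"^\?\s*(Error code|Error from routine|Error from processor|Error number):\s*(.*)$"
)


def parse_banner(text):
    """Collect the single-line 'key: value' fields from an error banner."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


# Banner lines copied from this comment.
sample = """? Error code: 4324
? Error from routine: CHECK_IOSTAT
? Error message:
? Error reading namelist NLSTCGEN
? Please check input list against code.
? Error from processor: 0
? Error number: 0
"""
print(parse_banner(sample))
```

Grouping failed tasks by the "Error from routine" value would, for instance, separate the namelist failures (CHECK_IOSTAT) from the solver failures (EG_BICGSTAB) reported earlier in this ticket.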

Grenville

comment:9 Changed 3 months ago by amenon

Hi Grenville,

I also encountered this error once, but restarting the suite got me past it. I have another copy of the same suite which is stuck with the NaNs error in the LAM forecast. I made this copy to see whether running a fresh copy would get past the NaNs error, but it did not. That suite id is u-bd731. You could work with that if it helps.

Regards,
Arathy

comment:10 Changed 7 weeks ago by grenville

  • Resolution set to answered
  • Status changed from new to closed

Arathy

I believe other tickets address the same problem - we'll close this now.

Grenville
