Opened 20 months ago
Closed 20 months ago
#2345 closed help (fixed)
Error in routine: eg_sl_helmholtz
Reported by: | nfreychet | Owned by: | um_support |
---|---|---|---|
Component: | UM Model | Keywords: | |
Cc: | Platform: | ARCHER | |
UM Version: | 8.5 |
Description
Hello,
I am running two nudged runs on ARCHER (nudged to ERA-Interim winds; the only difference between the two is the aerosol emissions). They were fine for the first 30 years, but then both runs stopped with the same error message:
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: eg_sl_helmholtz
? Error Code: 1
? Error Message: Convergence failure in BiCGstab
? Error generated from processor: 0
? This run generated 54 warnings
????????????????????????????????????????????????????????????????????????????????
I read on other tickets that this could be due to numerical instability. But as both runs crash on exactly the same day, I'm wondering what could create this instability.
Also, one suggestion seems to be to change the time step of the model, but I don't know where to do that in the UMUI.
Cheers,
Nicolas
Change History (9)
comment:1 Changed 20 months ago by nfreychet
comment:2 Changed 20 months ago by grenville
Nicolas
In the UMUI model selection→atmosphere→scientific parameter…→time-stepping.
You could try to diagnose the problem by writing out startdumps close to the point of failure, and/or writing out increments to help to isolate where the instability is developing.
Grenville
comment:3 Changed 20 months ago by nfreychet
Hi Grenville,
Thanks, I will try to do that.
Also, I forgot to say that the runs stop at the very beginning of a new year (2013 01 01), so I suspect it might also be due to an error or something with the calendar in the forcing data.
Nico
comment:4 Changed 20 months ago by grenville
Nico
I did look at the forcing data - I didn't see anything pathological. I did notice that Jan 2013 appears to have a warm north pole (and Jun 2004, for example, had a warm south pole).
Grenville
comment:5 Changed 20 months ago by nfreychet
Hi Grenville,
I tried changing the time step of the model, but I still have the same problem.
I looked at the last increments for some variables (u, v, theta…) but I didn't notice anything unusual in any of them. (The last dump can be accessed at /work/n02/n02/nfreyche/xnbrd/xnbrda.da20130101_00 )
Also, in the output, just before the time step where it crashes, everything looks fine:
EG_SISL_Resetcon: calculate reference profile
********************************************
* Linear solve for Helmholtz problem *
* ==================================== *
* Inner iter 1 *
* No. Of linear solver iterations 12 *
* Initial error 0.100000E+01 *
* Final error 0.870522E-03 *
* Min exner prime -0.304858E-02 *
* Max exner prime 0.181341E-02 *
* Inner iter 2 *
* No. Of linear solver iterations 5 *
* Initial error 0.122262E+00 *
* Final error 0.958504E-03 *
* Min exner prime -0.413351E-02 *
* Max exner prime 0.143016E-02 *
********************************************
********************************************
* Linear solve for Helmholtz problem *
* ==================================== *
* Inner iter 1 *
* No. Of linear solver iterations 5 *
* Initial error 0.762387E+00 *
* Final error 0.588338E-03 *
* Min exner prime -0.333058E-02 *
* Max exner prime 0.990215E-02 *
* Inner iter 2 *
* No. Of linear solver iterations 2 *
* Initial error 0.332354E+00 *
* Final error 0.812022E-03 *
* Min exner prime -0.320480E-02 *
* Max exner prime 0.501388E-02 *
********************************************
Atm_Step_4A: L_USE_CARIOLLE = F
Atm_Step_4A: Cariolle scheme not called
NUDGING_MAIN: Entering routine
Leaving NUDGING_MAIN
Minimum theta level 1 for timestep 824041
This timestep This run
Min theta1 proc position Min theta1 timestep
230.12 93 82.5deg W 77.5deg N 230.12824041
Largest negative delta theta1 at minimum theta1
This timestep = -8.34K. At min for run = -8.34K
Maximum vertical velocity at timestep 824041 Max w this run
w_max level proc position run w_max level timestep
0.263E+01 84 89 172.5deg E 75.6deg N 0.263E+01 84824041
********************************************************************************
Atm_Step: Timestep 824042 Model time: 2013-01-01 00:40:00
EG_SISL_Resetcon: calculate reference profile
********************************************
* Linear solve for Helmholtz problem *
* ==================================== *
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: eg_sl_helmholtz
? Error Code: 1
? Error Message: Convergence failure in BiCGstab
? Error generated from processor: 0
? This run generated 53 warnings
????????????????????????????????????????????????????????????????????????????????
Yet as both runs stop at the same time, it must have something to do with either the nudging or forcing.
I will try to kill the nudging for this day and see if there is any change.
Nico
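For anyone wanting to check whether the solver was degrading over many time steps rather than only in the last one, the "Linear solve for Helmholtz problem" records can be pulled out of the job output and tabulated. The following is only a minimal sketch: the function name `solver_records` is made up for illustration, and the field labels are assumed to appear exactly as in the output quoted above.

```python
import re

# Separator between fields: any mix of whitespace and the '*' box characters,
# so the pattern works on line-broken output and on output flattened to one line.
SEP = r"[\s*]+"

# One "Inner iter" record of the Helmholtz solver diagnostics.
# Labels are assumed to match the output quoted in this ticket.
RECORD = re.compile(
    r"Inner iter\s+(?P<inner>\d+)" + SEP
    + r"No\. Of linear solver iterations\s+(?P<iters>\d+)" + SEP
    + r"Initial error\s+(?P<initial>[-+0-9.E]+)" + SEP
    + r"Final error\s+(?P<final>[-+0-9.E]+)"
)


def solver_records(log_text):
    """Return one dict per 'Inner iter' record found in the log text."""
    return [
        {
            "inner": int(m["inner"]),
            "iterations": int(m["iters"]),
            "initial_error": float(m["initial"]),
            "final_error": float(m["final"]),
        }
        for m in RECORD.finditer(log_text)
    ]


# Excerpt taken from the output quoted above (first solve, both inner iterations).
sample = (
    "* Inner iter 1 * * No. Of linear solver iterations 12 * "
    "* Initial error 0.100000E+01 * * Final error 0.870522E-03 * "
    "* Min exner prime -0.304858E-02 * * Max exner prime 0.181341E-02 * "
    "* Inner iter 2 * * No. Of linear solver iterations 5 * "
    "* Initial error 0.122262E+00 * * Final error 0.958504E-03 * "
)
records = solver_records(sample)
for rec in records:
    print(rec["inner"], rec["iterations"], rec["final_error"])
```

A steadily rising iteration count or final error in the steps before the crash would point to a slowly growing instability; values that look normal right up to the failing step, as here, suggest something changed abruptly at that time step.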
comment:6 Changed 20 months ago by nfreychet
Hi Grenville,
So I tried turning off the nudging, and also changed the time step to 24. I recompiled everything and tried to restart from the last dump, but I still get exactly the same error.
comment:7 Changed 20 months ago by grenville
Nico
I think the problem is with Specification of trace gases (model selection→atmosphere→scientific parameters..→Spec of trace gases) for which the input data is only specified to 2013.
Grenville
comment:8 Changed 20 months ago by nfreychet
Hi Grenville,
Indeed, I extended the input data beyond 2013 and that solved the problem. Thanks a lot!
Nico
comment:9 Changed 20 months ago by grenville
- Resolution set to fixed
- Status changed from new to closed
Nico
Great - it's a pity there is no checking for this in the job setup.
Grenville
PS: the runs are xnbre and xnbrd.
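The kind of check that is missing from the job setup could be sketched outside the UMUI along the following lines. This is an illustration only: the function name `series_ending_early`, the dictionary layout and the example years are all made up, and it does not read any real UMUI or namelist format. It simply compares the years for which each time-varying input is specified against the period the run is meant to cover.

```python
def series_ending_early(series_years, run_start_year, run_end_year):
    """Return {name: (first, last)} for inputs that do not span the whole run.

    series_years maps a (hypothetical) input name to the list of years
    for which values have been specified.
    """
    problems = {}
    for name, years in series_years.items():
        first, last = min(years), max(years)
        if first > run_start_year or last < run_end_year:
            problems[name] = (first, last)
    return problems


# Made-up example values, not the actual trace-gas settings of xnbre/xnbrd.
example = {
    "co2": [1980, 1990, 2000, 2013],
    "ch4": [1980, 2000, 2020],
}
problems = series_ending_early(example, 1982, 2015)
print(problems)  # {'co2': (1980, 2013)}
```

Running a check like this before submitting a long job would flag an input that stops partway through the run, rather than leaving it to show up later as a solver convergence failure.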