Convergence failure on first time step

I'm having the same problem with two suites that I am currently trying to run (u-aw620 and u-ai781), both of which successfully ran recently (April/May?). I've been away for a few weeks, but I'm fairly certain I haven't changed anything since then. In both runs, I get a convergence failure at the first timestep:

???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 11
? Error from routine: EG_BICGSTAB
? Error message: Convergence failure in BiCGstab, omg is too small
? Error from processor: 0
? Error number: 23
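
When comparing job.out files from several runs, it can help to pull the fields out of an error block like the one above. This is a minimal sketch that assumes the block always uses the "? Key: value" layout shown here; the function name parse_um_error is hypothetical and not part of the UM or Rose/cylc tooling.

```python
import re


def parse_um_error(text):
    """Return a dict of the '? Key: value' fields found in an error block."""
    fields = {}
    for line in text.splitlines():
        # Field lines start with a single '?' followed by whitespace;
        # the '???!!!' banner line does not match this pattern.
        match = re.match(r"^\?\s+([^:]+):\s*(.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


# The error block quoted in this ticket.
sample = """\
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 11
? Error from routine: EG_BICGSTAB
? Error message: Convergence failure in BiCGstab, omg is too small
? Error from processor: 0
? Error number: 23
"""

err = parse_um_error(sample)
print(err["Error from routine"], "-", err["Error message"])
```

Running the same function over the job.out of a successful and a failing run gives a quick way to confirm which routine raised the error.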

I haven't changed the time step, and the cycle dates are the same as before when the suites both ran successfully. The only difference between the two versions of u-ai781 is that I am now using a prescribed cloud droplet number concentration in the cloud scheme (the default UM setting), rather than the climatological aerosol-generated droplet number that was previously being used. I haven't changed suite u-aw620.

Do you have any idea what is going on?


Please point me to the log.out files for the successful and failing runs.


Correction: that should say job.out files.

Hi Grenville,

All the forecasts in this folder definitely ran: ~/cylc-run/u-ai781/log.20180406T152019Z/job/20110118T0000Z/

e.g. Peninsula_km1p5_PC2_tnuc_um_fcst_003/NN

But the most recent ones failed, e.g. ~/cylc-run/u-ai781/log.20180723T094345Z/job/20110117T0000Z/Peninsula_km4p0_Smith_tnuc_prog_aer_um_fcst_000/NN

(The most recent run has failed on the recon step with a 'PP HEADERS DO NOT MATCH' error, much to my bemusement, but the previous one's log contains the error this ticket is about.)

Hi Grenville, any chance you've been able to look at this? I'm getting the same error with suite u-aw620 using completely different dates, too.

Hi Ella,

I'll just focus on u-ai781 here. It is failing in Peninsula_km4p0_Smith_tnuc_presc_aer_um_fcst_000 at the very first cycle time, 20110117T0000Z, with the error you report above. It fails at the first time step, which generally, but not always, indicates a problem with the start dump or the ancillary files.

Only a few files have been changed in your working copy. fcm status shows:

?       app/um/STASHexport.ini
M       app/um/rose-app.conf
?       install_glm_startdata[-PT12H]
M       rose-suite.conf
M       site/monsoon-cray-xc40/suite-adds.rc

The entries marked with a question mark are unversioned files which are not used in the suite. The entries marked with an M are versioned files with local modifications.
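
If the status listing is long, the two groups can be separated with a few lines of Python. This is only a sketch: it assumes the status code is the first character of each line, as in the output above, and the function name split_status is made up for illustration.

```python
def split_status(output):
    """Split status output into (modified, unversioned) path lists."""
    modified, unversioned = [], []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: column 1 holds the status code, the path follows.
        code, path = line[0], line[1:].strip()
        if code == "M":
            modified.append(path)
        elif code == "?":
            unversioned.append(path)
    return modified, unversioned


# The status output quoted in this ticket.
status = """\
?       app/um/STASHexport.ini
M       app/um/rose-app.conf
?       install_glm_startdata[-PT12H]
M       rose-suite.conf
M       site/monsoon-cray-xc40/suite-adds.rc
"""

modified, unversioned = split_status(status)
print("Modified:", modified)
print("Unversioned:", unversioned)
```

The modified list is the set of files worth checking against the last working revision.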

If we assume that revision 78735 (23rd May 2018) was working, then the main differences can be found using the command

fcm diff -g

I think this shows more changes than just the cloud droplet number scheme.

I hope that helps.


PS It is a good idea to commit your changes once you have things working.

