Opened 4 years ago

Closed 4 years ago

#1810 closed help (answered)

GCR( 2 ) failed to converge in 100 iterations

Reported by: s.mangeon
Owned by: um_support
Component: UM Model
Keywords: UKCA, converge
Cc: mohit.dalvi@…
Platform: MONSooN
UM Version: 8.4

Description

Good morning,

I have been trying to get a UKCA run going with nudging, although a similar issue appears when I turn nudging off in section 39 of the UMUI.

Below is the section of the .leave file that contains the issue.

I also tried changing the timestep to 15 minutes instead of 20, but the problem remains.

Are you aware of a way to fix this?

Atm_Step: Timestep        1   Model time:   1995-12-01 00:15:00

 Minimum theta level 1 for timestep  1
                This timestep                         This run
   Min theta1     proc          position            Min theta1 timestep
      233.71     163   144.4deg W      66.3deg N       233.71     1
  Largest negative delta theta1 at minimum theta1
 This timestep =    -8.42K. At min for run =    -8.42K

  Maximum vertical velocity at timestep  1       Max w this run
    w_max   level  proc         position             run w_max level timestep
   0.504E+00  38     82   39.4deg W    -20.0deg S    0.504E+00   38     1

 Qcf < 0 fixed by PC2  73
 Qcl < 0 fixed by PC2  73
  i_ukca_first:  13
  i_ukca_last:  116
  ==============================================
  initial Absolute Norm :  2394.4921357241706
  GCR( 2 ) failed to converge in  100  iterations.
  Final Absolute Norm :  0.20116336970054544
  ==============================================

Change History (16)

comment:1 Changed 4 years ago by willie

Hi Stephane,

What job id is this?

Willie

comment:2 Changed 4 years ago by s.mangeon

Hi Willie,

It is job xmegf; my username on MONSooN is smange.

comment:3 Changed 4 years ago by willie

Hi Stephane,

There are a few issues here:

  1. The setup_archiving.ed hand edit uses the old IBM llq/llqsubmit commands. These should be updated to the Cray's qstat/qsub commands.
  2. You are getting
     ? Error in routine: UKCA_MAIN
     ? Error Code:     2
     ? Error Message: Some item addresses not found in D1 array
  3. GCR failed to converge in the first time step.

The failure to converge so soon after starting suggests a problem with the ancillary files - have you modified any of these recently? Changing the time step won't help as it hasn't really started.

The error code 2 may be the root cause of the problem.

Regards

Willie
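
To see at a glance how many solver failures a .leave file contains, and at which timestep they start, the messages quoted in this ticket can be scanned with a short script. The following is only a minimal Python sketch and not part of the UM: the function name find_gcr_failures and the regular expressions are assumptions based purely on the message wording shown above, which may differ at other UM versions.

```python
import re

# Patterns follow the solver messages quoted in this ticket (assumed wording).
TIMESTEP = re.compile(r"Atm_Step: Timestep\s+(\d+)")
INITIAL = re.compile(r"initial Absolute Norm\s*:\s*([0-9.Ee+-]+)")
FAILED = re.compile(r"GCR\(\s*\d+\s*\) failed to converge in\s+(\d+)\s+iterations")
FINAL = re.compile(r"Final Absolute Norm\s*:\s*([0-9.Ee+-]+)")


def find_gcr_failures(lines):
    """Return one dict per GCR convergence failure found in the log lines."""
    failures = []
    timestep = None   # most recent timestep number seen
    initial = None    # most recent initial norm seen
    pending = None    # failure still waiting for its final norm
    for line in lines:
        match = TIMESTEP.search(line)
        if match:
            timestep = int(match.group(1))
            continue
        match = INITIAL.search(line)
        if match:
            initial = float(match.group(1))
            continue
        match = FAILED.search(line)
        if match:
            pending = {
                "timestep": timestep,
                "iterations": int(match.group(1)),
                "initial_norm": initial,
                "final_norm": None,
            }
            failures.append(pending)
            continue
        match = FINAL.search(line)
        if match and pending is not None:
            pending["final_norm"] = float(match.group(1))
            pending = None
    return failures
```

Passing an open .leave file to find_gcr_failures gives a list that is empty for a clean run. A failure reported at timestep 1 or 2, as here, points towards the inputs or configuration rather than the timestep length.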

comment:4 Changed 4 years ago by luke

I forwarded this ticket to Mohit Dalvi, the Code Owner of nudging. He has said:

There seems to be some information missing though. The job Stephane is referring to
has been stopping due to UKCA coupling items not in STASH!

The job also seems to be using my scripts for archiving (developed when vn8.4
was initially released), which should no longer be required/used. I will contact
him to find out the history of the job.

Information on Cray jobs can be found here:

http://www.ukca.ac.uk/wiki/index.php/MONSooN_IBM_to_Cray_Transition

Thanks,
Luke

comment:5 Changed 4 years ago by mdalvi

Hi Stephane,

The job seems to have some conflicting settings so it would be good to know where it is derived from. (It is better to add cp xxxxx in the description every time a job is copied over).

The diagnostics are missing from D1 because of a conflict between hand-edits.
Remove or deactivate the hand-edit ~mdalvi/umui_jobs/hand_edits/vn8.4/config_new_diags.ed, as it is superseded by ~ukca/hand_edits/VN8.4/config_new_diags_extra.ed.

Remove any archiving hand-edits and copy the settings on the 'Postprocessing' panel from the RJ4.0 UKCA job.

comment:6 Changed 4 years ago by s.mangeon

Hi Mohit, and Luke,

The job was a copy of the standard release job, but I set it up with Douglas Hamilton and we included a number of FCM branches and hand edits relating to the recommended setup at Leeds. Douglas kept track of the changes and will send them to me in the morning.

I wrongly assumed this would be kept track of in a log file. Apologies. Out of curiosity, is this tracked in Rose and the later versions of the UM?

I applied the change you have suggested with the hand edit, although now it fails on the second timestep:

Atm_Step: Timestep        2   Model time:   1996-06-01 00:40:00
 Qcf < 0 fixed by PC2  85
 Qcl < 0 fixed by PC2  85
  i_ukca_first:  13
  i_ukca_last:  116
  ==============================================
  initial Absolute Norm :  7405.8418757040372
  GCR( 2 ) failed to converge in  100  iterations. 
  Final Absolute Norm :  32.214261373796369
  ==============================================
Q_POS: unable to conserve in        1 columns
Q_POS: unable to conserve in      239 columns
Q_POS: unable to conserve in      292 columns
NUDGING_MAIN: Entering routine 
 Leaving NUDGING_MAIN
   
 Minimum theta level 1 for timestep  2
                This timestep                         This run
   Min theta1     proc          position            Min theta1 timestep
      220.73       9    61.9deg W     -81.3deg S       220.57     1
  Largest negative delta theta1 at minimum theta1 
 This timestep =    -7.65K. At min for run =    -7.77K
   
  Maximum vertical velocity at timestep  2       Max w this run 
    w_max   level  proc         position             run w_max level timestep
   0.281E+03   2    185  178.1deg E     90.0deg N    0.281E+03    2     2

Given the location, could this come from an issue with sea-ice?

To go back to your comment, Willie: I am using a different ancillary for sea-ice, which Mohit suggested to Matt Kasoar a while back for running with nudging (and a Gregorian calendar):

/projects/ukca-admin/inputs/ancil/surf/sice_mon_1981-2012.n96

comment:7 Changed 4 years ago by willie

Hi Stephane,

There are no nasty values in your sea ice ancil, but it does look very odd. Compare it with

/projects/um1/ancil/atmos/n96/seaice/hadisst_6190/v1/qrclim.seaice

Yours doesn't seem to have any land.

Regards

Willie

comment:8 Changed 4 years ago by s.mangeon

Hi Willie,

Things are looking better.

I looked at the ancillary you provided and noticed that it also includes sea-ice depth. In the UMUI, under:

Model Selection → Atmosphere → Ancillary and input data files → Climatologies & potential climatologies → Sea ice fields

I had the sea ice thickness ancillary field set to "updated", with "using AMIP-II method of updating SST and sea ice" switched on. Could this explain the issue?

I have restarted the run changing the ancillaries for sea-ice and Sea surface temperature to

/projects/um1/ancil/atmos/n96/seaice/hadisst_6190/v1/qrclim.seaice
/projects/um1/ancil/atmos/n96/sst/hadisst_6190/v1/qrclim.sst

I have also changed the Time update to Monthly in:
Model Selection → Atmosphere → Ancillary and input data files → Climatologies & potential climatologies → Sea ice fields AND Sea surface temperatures

Although it works for now, I fear that, because I am using nudging, it might crash at some point because of the Gregorian calendar.

Mohit, am I correct in assuming this might happen and a new ancillary might need to be created, or is the fact that this ancillary is already in a monthly climatology format a solution to the nudging problem?

Thanks for the help with this,
Stéphane

comment:9 Changed 4 years ago by mdalvi

  • Cc mohit.dalvi@… added

The sice_mon_1981-2012.n96 is a monthly version of the daily file used in standard configurations: /projects/um1/ancil/atmos/n96/orca1/seaice/reynolds/1981_2009_360/v0/qrclim.seaice, which does not have land either, so I do not think that is the issue.

The only known issue with Gregorian runs is when using a daily 360-day sst/sice ancil, since the model will drift ahead five days for every year of simulation.
Hence, the files you are using now should be fine since they are monthly (but averaged over 1961-90, so may differ from the period you are running).
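
The size of that drift can be illustrated with a little arithmetic. The Python sketch below is a simplified picture under stated assumptions and does not reproduce the UM's ancillary time handling: the function name drift_days is hypothetical, and it simply sums, for each simulated year, the difference between the Gregorian year length and the 360 days assumed by the ancillary.

```python
import calendar


def drift_days(start_year, n_years, ancil_days_per_year=360):
    """Days by which a Gregorian model clock outruns a fixed-length-year ancillary.

    Simplified illustration only: each Gregorian year has 365 or 366 days,
    while the ancillary is assumed to cover ancil_days_per_year days.
    """
    total = 0
    for year in range(start_year, start_year + n_years):
        days_in_year = 366 if calendar.isleap(year) else 365
        total += days_in_year - ancil_days_per_year
    return total


# One ordinary year gives the five days quoted above; leap years add one more.
print(drift_days(1995, 1))
print(drift_days(1995, 10))
```

Under these assumptions the offset grows to several weeks within a decade, which is why a daily 360-day file is a problem for long Gregorian runs while a monthly file is not.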

I also do not think it is advisable to change the settings on the SST/SICE panels. The "updated" setting signifies that sea ice data is updated regularly, instead of the same sst/sice values being used throughout the simulation irrespective of season.

comment:10 Changed 4 years ago by s.mangeon

Hi Mohit,

My comment was more related to the fact that sice_mon_1981-2012.n96 does not contain an ice thickness field, while the ancillary Willie highlighted does.

To clarify, would you recommend sticking to "updated daily"? Would that work with monthly ancillaries?

comment:11 Changed 4 years ago by mdalvi

Hi Stephane,

I am not sure what the current status of your job is, but after a similar issue with another vn8.4 run, I have now copied over 365-day Sst/Sice ancillaries which were used in the CCMI simulations by Steven Hardiman. I believe these were generated using actual 365-day data (and not averaged from daily to monthly and ENDGame to ND, as those above).

The ancils are: /projects/ukca-admin/inputs/ancil/surf/365d/n96hadisst(ice)19492012.anc.

These contain monthly data, but keep the update frequency at a daily interval, and let me know if this works.

comment:12 follow-up: Changed 4 years ago by s.varma13

Hi Mohit

I have taken over this run from Stephane for the time being. I copied Stephane's run and it is now called xmnca. I then changed the ancils as suggested in your previous message but the run is failing on compilation. The .leave file associated with this is [xcml00]/home/suvar/output/xmnca000.xmnca.d16068.t142106.leave.

Unfortunately I cannot see from the .leave file what the problem is. Could you please help?

Many thanks

Sunil

comment:13 in reply to: ↑ 12 Changed 4 years ago by mdalvi

Sunil,

The model has failed in a routine called interpolate.f90, which is again linked to the dynamics/tracer transport.
There are a lot of messages in pe_output/xmnca.fort6.peXX related to "IMODE,L,no. boxes too large=" and "RADAER_BAND_AVERAGE: unphysical values", so something could have gone wrong in GLOMAP quite some time before the crash.

It might be better to switch off feedback from UKCA (UMUI: UKCA → COUPL → switch off RADAER and Aerosol indirect effects 1 and 2) to see whether the model runs longer, which would show that the error is caused by UKCA.

The GLOMAP warnings are new to me, so I cannot say what they imply. The configuration is also quite different from the ones I am aware of (AeroCom or Release job) at vn8.4, so the issue could be with any of the changed settings/branches. Starting again from a configuration that has worked and adding each change incrementally might help to identify the cause.
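
To judge how widespread the two warnings are across processors, the per-PE output files can be tallied. This is a minimal Python sketch rather than an existing UM tool: the function names and the default glob pattern are assumptions, and only the two message strings are taken from the output quoted above.

```python
from collections import Counter
from pathlib import Path

# Message fragments quoted in this comment; adjust if the wording differs.
PATTERNS = (
    "IMODE,L,no. boxes too large=",
    "RADAER_BAND_AVERAGE: unphysical values",
)


def count_warnings(lines, patterns=PATTERNS):
    """Count how many lines contain each warning fragment."""
    counts = Counter({pattern: 0 for pattern in patterns})
    for line in lines:
        for pattern in patterns:
            if pattern in line:
                counts[pattern] += 1
    return counts


def count_in_directory(directory, glob="*.fort6.pe*", patterns=PATTERNS):
    """Sum the warning counts over every matching file in a directory.

    The glob is an assumption modelled on the pe_output file names above.
    """
    totals = Counter({pattern: 0 for pattern in patterns})
    for path in sorted(Path(directory).glob(glob)):
        # errors="replace" guards against stray non-text bytes in the output.
        with open(path, errors="replace") as handle:
            totals.update(count_warnings(handle, patterns))
    return totals
```

Calling count_in_directory("pe_output") before and after switching off the UKCA feedbacks gives a simple way to check whether the warnings disappear along with the crash.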

comment:14 Changed 4 years ago by s.varma13

Thanks a lot - I will proceed as suggested and get back to you.

comment:15 Changed 4 years ago by ros

  • Status changed from new to pending

comment:16 Changed 4 years ago by ros

  • Resolution set to answered
  • Status changed from pending to closed

Closing this ticket due to inactivity - please reopen should you need more help with this.

Cheers,
Ros
