Opened 8 years ago

Closed 8 years ago

#789 closed help (fixed)

COEX: Unable to WGDOS pack to this accuracy

Reported by: demory
Owned by: willie
Component: UM Model
Keywords: NaN cumf
Cc: m.e.demory@…
Platform:
UM Version: 7.8

Description

Good afternoon,

I would like to run HadGEM3H-A N512 globally (job xgyla). The model is able to run 143 timesteps (almost 1 day), but then crashes with exactly the same error as above, probably while writing an output file: COEX: Unable to WGDOS pack to this accuracy.

I checked my disk space, which is fine. I also added the override file mentioned above, but the same error occurs. Did this fix the problem with the LAM above? Do you have any other ideas about what might cause this failure?

Best regards, Marie-Estelle

Change History (5)

comment:1 Changed 8 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Marie-Estelle,

I have created a new ticket for this.

Your job becomes unstable at time step 17:

GCR( 2 ) failed to converge in 200 iterations.

This generally results in NaNs. Since you have packing switched on, it is trying to compress a gigantic range (NaN) and this causes the WGDOS error message. The thing to do is to find out why it is becoming unstable.
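To illustrate why packing to a fixed accuracy breaks down on such data, here is a toy sketch in Python. It is not the real WGDOS/COEX routine: the function name `toy_pack`, the `max_bits` limit and the quantisation scheme are all assumptions made up for illustration. It only shows the general idea that storing offsets from a minimum value at a given accuracy needs a finite, reasonably small range.

```python
import math

def toy_pack(values, accuracy, max_bits=31):
    """Toy range quantiser (hypothetical, NOT the real WGDOS algorithm).

    Stores each value as an integer number of `accuracy` steps above
    the field minimum, and refuses if that cannot fit in `max_bits`.
    """
    if not all(math.isfinite(v) for v in values):
        raise ValueError("unable to pack: field contains NaN or Inf")
    base = min(values)
    span = max(values) - base
    # Number of steps needed to cover the whole range at this accuracy.
    if not math.isfinite(span) or span / accuracy >= 2 ** max_bits:
        raise ValueError("unable to pack to this accuracy: range too large")
    return base, [round((v - base) / accuracy) for v in values]

# A well-behaved field packs fine.
print(toy_pack([1.0, 1.5, 2.0], accuracy=0.5))

# A field containing a NaN, or an enormous value, is rejected.
for bad_field in ([1.0, math.nan, 2.0], [1.0, 1e300, 2.0]):
    try:
        toy_pack(bad_field, accuracy=0.01)
    except ValueError as err:
        print(err)
```

In this toy version the failure is only a symptom: the packing step is where the bad values are first noticed, not where they come from.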

Regards,

Willie

comment:2 Changed 8 years ago by demory

Hi Willie,

Thanks for replying so quickly. This model is very unstable indeed, but it does seem to recover after failing to converge at time step 17. The NaN values were then only for 1 time step, weren't they?

I was thinking that maybe something else was happening with my simulation, because Pier Luigi ran an N512 simulation (exactly the same model, just run for another year). That model was also very unstable at the beginning of the simulation, for several time steps (see: /home/n02/n02/vidale/umui_out/xgtwf000.xgtwf.d12040.t110301.leave). However, the output files were written properly.

Best regards,
Marie-Estelle

comment:3 Changed 8 years ago by willie

  • Keywords NaN cumf added

Hi Marie-Estelle,

It appears that your start dump is corrupt: if you look at the fields "mean water table depth" and "saturation fraction in deep layer", they have enormous numbers (~10^300) in them. I found this by using 'cumf' and comparing the start dump with itself.

NaNs have a tendency to breed: when a NaN is added to, subtracted from, or multiplied by an ordinary number, a NaN results.
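A few lines of Python show this breeding behaviour. This is just standard IEEE floating-point arithmetic, nothing specific to the UM; the `field` list is made-up data for illustration.

```python
import math

nan = math.nan

# Arithmetic between a NaN and an ordinary number always gives a NaN.
results = [nan + 1.0, nan - 1.0, nan * 2.0]
print(all(math.isnan(r) for r in results))  # True

# One NaN in a field is enough to contaminate a sum over that field.
field = [1.0, 2.0, nan, 4.0]  # hypothetical data
print(math.isnan(sum(field)))  # True

# A NaN does not even compare equal to itself.
print(nan == nan)  # False
```

So a single bad point can spread through every quantity computed from it within a few time steps.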

Regards,

Willie

comment:4 Changed 8 years ago by demory

Hi Willie,

So actually these fields in the start dump are fine. They get these values because of the way they are land-packed, but they are fine in the model. I think this is fixed in UM8.0, so that they do not have strange values when you look at them in the dump.
However, we discovered that the reconfiguration has a problem with cloud amounts when you reconfigure dumps between resolutions: the amounts could go above 1 (again, I think this is fixed at 8.0). So this is why the model was crashing. We reset those values to 1, and the model is now running.
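For anyone hitting the same thing, the kind of check and reset we mean looks roughly like the sketch below. It is plain Python on a list of numbers, not the actual procedure we applied to the dump: the function names and the sample values are made up for illustration, and reading or writing the UM dump itself is not shown.

```python
def count_above_one(cloud_amounts):
    """Count cloud amounts that exceed the physical maximum of 1."""
    return sum(1 for c in cloud_amounts if c > 1.0)

def cap_at_one(cloud_amounts):
    """Reset any cloud amount above 1 to exactly 1; leave the rest alone."""
    return [min(c, 1.0) for c in cloud_amounts]

# Hypothetical values, as might come out of a reconfiguration
# between resolutions.
cloud = [0.2, 1.0, 1.0000001, 1.3]

print(count_above_one(cloud))              # 2
fixed = cap_at_one(cloud)
print(fixed)                               # [0.2, 1.0, 1.0, 1.0]
print(count_above_one(fixed))              # 0
```

Counting the offending points before capping them is worthwhile, since it tells you whether the overshoot is a tiny interpolation artefact or something bigger.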

Thanks for looking!
Best regards,
Marie-Estelle

comment:5 Changed 8 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed