Opened 7 months ago

Last modified 2 months ago

#1990 new help

Bit reproducibility

Reported by: simon.tett
Owned by: um_support
Priority: normal
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 8.5

Description

Hi,

is N96 HadGEM3-GA6 (vn8.5) bit reproducible? So far, for me, it has been, but having lost some data I am trying to rerun, and it doesn't look as if it bit reproduces when it starts from a dump without reconfiguration.

Should I reconfigure???

Simon

Change History (14)

comment:1 Changed 7 months ago by grenville

Simon

Not quite sure what your workflow is or was. It sounds like you didn't reconfigure previously?

I see no reason why the model wouldn't behave as before (so long as you have not rebuilt in the interim, nor changed the decomposition), so if you didn't reconfigure previously there should be no need now.

We are investigating some recent odd behaviour in other jobs (not specifically in connection with bit reproducibility, though).

Grenville

comment:2 Changed 7 months ago by simon.tett

What I did was run through from 01/09/1915 to 01/12/1955 with each run being 2 years. Dumping (I think) once a month and saving dumps once a year. I've lost a bit of data (my fault in how I set up the netcdf conversion using David's code). So I want to regenerate the data. I thought I could do this by starting from one of the dumps and running forward.

So no reconfiguration of the dump, same binary, different start time, different target time. To test this I ran for 10 days but saw differences in the output diagnostics in the leave file between the original run and the new run, comparing the same times.
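For anyone wanting to make that comparison stricter than eyeballing printed diagnostics, here is a minimal sketch of a bit-for-bit check in Python. It is not UM tooling: the function names and the sample values are made up for illustration, and it assumes the diagnostics have already been read into lists of floats.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bit_identical(a, b) -> bool:
    """True only if both sequences have the same length and every pair of
    values has exactly the same bit pattern."""
    return len(a) == len(b) and all(
        float_bits(x) == float_bits(y) for x, y in zip(a, b)
    )


def first_difference(a, b):
    """Index of the first pair whose bits differ, or None if there is none."""
    for i, (x, y) in enumerate(zip(a, b)):
        if float_bits(x) != float_bits(y):
            return i
    return None


# Hypothetical diagnostics from an original run and a rerun (invented values).
original = [287.15, 101325.0, 0.1 + 0.2]
rerun = [287.15, 101325.0, 0.3]

# At six decimal places the third value looks the same in both runs.
print(f"{original[2]:.6f} {rerun[2]:.6f}")  # both print as 0.300000

print("bit identical:", bit_identical(original, rerun))
print("first differing index:", first_difference(original, rerun))
```

The point of comparing bit patterns rather than printed numbers is that two runs can agree to every digit shown in a log and still not bit compare.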

Should I expect this case to bit compare??

Simon

comment:3 Changed 7 months ago by grenville

Simon

I think this won't bit compare - the solver might be configured to use the previous solution as a first guess for the next solution, but in your case, there is no previous solution, so things may have diverged at the first time step.
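To illustrate the idea (this is a toy, not the UM solver), here is a minimal Python sketch. It assumes an iterative solver that stops at a tolerance, and a "dump" that stores the model state but not the last solver solution. The equation, the coefficients and the function names are all invented for the example.

```python
def solve(rhs, first_guess, tol=1e-8, max_iter=200):
    """Fixed-point iteration for x = 0.5*x + rhs (exact answer 2*rhs),
    stopped once successive iterates differ by less than tol."""
    x = first_guess
    for _ in range(max_iter):
        x_new = 0.5 * x + rhs
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def run(state, n_steps, last_soln=None, warm_start=True):
    """Advance the toy state n_steps; return (state, last solver solution)."""
    for _ in range(n_steps):
        if warm_start and last_soln is not None:
            guess = last_soln  # "use last soln as initial guess"
        else:
            guess = 0.0        # no previous solution available
        last_soln = solve(1.0 + 0.1 * state, guess)
        state = 0.9 * state + 0.01 * last_soln
    return state, last_soln


# Original run: 10 steps straight through, warm start on.
full_state, _ = run(1.0, 10)

# "Dump" written after 5 steps: it holds the state, not the last solution.
dump_state, dump_soln = run(1.0, 5)

# Rerun from the dump: the first solve has no previous solution to start from.
rerun_state, _ = run(dump_state, 5)

# Same rerun, but with the previous solution carried across as well.
carried_state, _ = run(dump_state, 5, last_soln=dump_soln)

# Warm start switched off everywhere: no hidden state to lose.
cold_full, _ = run(1.0, 10, warm_start=False)
cold_dump, _ = run(1.0, 5, warm_start=False)
cold_rerun, _ = run(cold_dump, 5, warm_start=False)

print("rerun from dump matches original:      ", rerun_state == full_state)
print("rerun with carried solution matches:   ", carried_state == full_state)
print("warm start off, rerun matches original:", cold_rerun == cold_full)
```

On this toy the first comparison should come out False and the other two True. Both first guesses converge to within the tolerance, but they stop at slightly different values, and that small difference is then carried forward in the state.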

Grenville

comment:4 Changed 7 months ago by simon.tett

That's a pain! In future, how should I set up runs so that they do bit compare (in the sense of restarting from a dump some way through a run)?

Simon

comment:5 Changed 7 months ago by grenville

Please let me know the job id

comment:6 Changed 7 months ago by simon.tett

Hi Grenville, it is xmvp#a, which has been modified to start from a dump some way through the run.
Simon

comment:7 Changed 7 months ago by grenville

Simon

In section 10: dynamical adjustment…

"use last soln as initial guess…" is checked - I believe this is the problem. The advice is to not have this checked. (I have not tested this case)

Grenville

comment:8 Changed 7 months ago by simon.tett

Hi Grenville,

thanks. I will try a run with that. Note that the standard HadGEM3-GA6 configuration (xkcfg) has "use last soln as initial guess…" checked. Should that be so?

Simon

comment:9 Changed 6 months ago by grenville

Simon

If it is to bit compare the way you require, then no. I'd like to run a few tests to check this out.

Grenville

comment:10 Changed 6 months ago by simon.tett

Hi Grenville,

I'm not sure I was very clear… I think GA6 (xkcfg) will not bit compare the way I want. Is that an error, i.e. does the Met Office variant bit compare the way I want?

Simon

comment:11 Changed 6 months ago by grenville

Simon

That's what I'm planning to test for. It appears that xkcfg won't bit compare the way you want.

Grenville

comment:12 Changed 6 months ago by grenville

Simon

"use last soln as initial guess…" — with this checked, the model will fail to bit reproduce (when started from a mid run dump) because the dump doesn't know about the previous solution. However, there are other reasons why it may not bit compare; starting an NRUN from one of your saved dumps means following a different path through the code compared with a CRUN (from the same dump), since there are tests on time-step number in the code. It might help if the history files were available, so you could fully recreate the CRUN — I'm assuming you did a CRUN.
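As a rough picture of the time-step-number point, here is a toy Python sketch. The start-up branch on step 1 is invented for the example and is not taken from the UM source. It assumes only that a CRUN carries the step counter on from the history file while an NRUN from the same dump starts counting again from 1.

```python
def timestep(state, step):
    """One step of a toy model with a test on the time-step number."""
    if step == 1:
        # Hypothetical start-up branch, taken only on time step 1.
        return state + 0.05 * (1.0 - state)
    return state + 0.1 * (1.0 - state)


def run(state, first_step, n_steps):
    """Advance the state over steps first_step .. first_step + n_steps - 1."""
    for step in range(first_step, first_step + n_steps):
        state = timestep(state, step)
    return state


# Original run: 10 steps straight through.
original = run(0.0, first_step=1, n_steps=10)

# Dump written after 5 steps.
dump = run(0.0, first_step=1, n_steps=5)

# CRUN: continues from the dump with the step counter carried on (step 6).
crun = run(dump, first_step=6, n_steps=5)

# NRUN: starts from the same dump, but the step counter begins at 1 again,
# so the start-up branch is taken a second time.
nrun = run(dump, first_step=1, n_steps=5)

print("CRUN matches original:", crun == original)
print("NRUN matches original:", nrun == original)
```

The difference is deliberately exaggerated here. In a real model the effect of taking a different branch at start-up could be far smaller, but any difference at all is enough to break bit comparison.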

Grenville

comment:13 Changed 5 months ago by simon.tett

Hi Grenville,

sorry, I got embroiled in proposals and teaching so was unable to follow this up. I think all the runs I did were CRUNs after the initial 10 day run. I would expect a new run from a dump to behave the same as a CRUN as long as reconfig doesn't change the state…

Simon

comment:14 Changed 2 months ago by grenville

Simon

Sorry this slipped through the net. I don't think that's guaranteed, because the path through the CRUN initialisation differs from that for an NRUN. I did talk to MO experts about this and they were not surprised that the two runs did not bit compare.

Grenville
