#1719 closed help (fixed)
Error with CRUN, NEMO restart dump not consistent with the UM .xhist file
Reported by: | dilshadshawki | Owned by: | annette |
---|---|---|---|
Component: | UM Model | Keywords: | NEMO, restart dump, CRUN |
Cc: | | Platform: | MONSooN |
UM Version: | 8.2 | | |
Description
Hello Helpdesk,
I have been trying to run a job xlzcg and I have managed to run an NRUN with the output being created successfully. But then when I attempt to run a CRUN I get the following message in the .leave file:
/home/dshawk/output/xlzcg000.xlzcg.d15308.t145403.leave
ERROR: The latest NEMO restart dump does not seem to be consistent with the UM .xhist file This suggests an untidy model failure but you may be able to retrieve the run by copying the backup dump and xhist files to original location and restarting with the appropriate NEMO dump
There is also a .comp.leave file. Should this have been produced, given that I am no longer compiling for the CRUN?
/home/dshawk/output/xlzcg000.xlzcg.d15308.t145403.comp.leave
Earlier, I tried to run this job using the restart dumps produced by the run in the projects folder: /projects/ukca-imp/dshawk/xlzcg
However, it produces ice restart files which begin from 02 (February), even though the ice restart dump I used begins in 09 (September). This may be because the dump needs resetting with modify_CICE_header, but that tool no longer exists since the move to the new Cray system; instead, ticket #1667 instructs using the following fix:
UM: fcm:um-br/jwalton/vn8.2_NEMOCICE_restart_fixes_UKMO
CICE: fcm:cice-br/dev/jwalton/vn4.1m1_restart_date_fix_UKMO
I thought this might be a side issue. In any case, I tried to rerun with the restart dumps produced by the run:
/home/dshawk/startdumps/xlzcg/
xlzcga.da21011201_00
xlzcgo_21011201_restart_0000.nc
anqdhi.restart.1999-12-01-00000
I used a different CICE restart dump (not produced by the run) in order to keep the same month (12), but this gave me a different error. So I thought I had better not mess things up too much and just try to rerun everything from scratch with new start dumps:
/home/dshawk/startdumps/xlzcg/
xkhqaa.da23000901_00
xkhqao_23000901_restart.nc
xkhqai.restart.2300-09-01-00000
but once again I have returned to the same problem in the .leave shown above.
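As a quick sanity check on start dumps like those listed above, the dates embedded in the three file names can be compared before submitting. The sketch below is only illustrative: the file name patterns are inferred from the examples in this ticket, and the helper names dump_date and dates_agree are made up here, not part of the UM, NEMO or CICE scripts.

```python
import re

# Patterns inferred from the file names quoted in this ticket only;
# other jobs may name their dumps differently.
PATTERNS = [
    # UM atmosphere dump, e.g. xkhqaa.da23000901_00
    re.compile(r"\.da(\d{4})(\d{2})(\d{2})_\d{2}$"),
    # NEMO restart, e.g. xkhqao_23000901_restart.nc or ..._restart_0000.nc
    re.compile(r"o_(\d{4})(\d{2})(\d{2})_restart(?:_\d+)?\.nc$"),
    # CICE restart, e.g. xkhqai.restart.2300-09-01-00000
    re.compile(r"\.restart\.(\d{4})-(\d{2})-(\d{2})-\d{5}$"),
]

def dump_date(filename):
    """Return (year, month, day) parsed from a dump file name."""
    for pattern in PATTERNS:
        match = pattern.search(filename)
        if match:
            return tuple(int(group) for group in match.groups())
    raise ValueError(f"unrecognised dump name: {filename}")

def dates_agree(filenames):
    """True if every file name carries the same (year, month, day)."""
    return len({dump_date(name) for name in filenames}) == 1

if __name__ == "__main__":
    dumps = [
        "xkhqaa.da23000901_00",
        "xkhqao_23000901_restart.nc",
        "xkhqai.restart.2300-09-01-00000",
    ]
    for name in dumps:
        print(name, dump_date(name))
    print("dates agree:", dates_agree(dumps))
```

This only compares the names; it says nothing about the dates stored inside the files, which is what modify_CICE_header changes.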
Your help would be very much appreciated.
Cheers,
Dill
Change History (14)
comment:1 Changed 5 years ago by annette
- Owner changed from um_support to annette
- Status changed from new to assigned
comment:2 Changed 5 years ago by dilshadshawki
Hi Annette,
Please allow me to clarify the issue:
First of all, it is very strange that the xlzcg folder didn't exist!
What I did was run the job. It does the NRUN fine, and that's how I know the branches that replace modify_CICE_header don't cause any initial problems. But the model outputs CICE start dumps from January (as if the original restart dump was from December), when it should be outputting start dumps from October (since the original restart dump was actually from September).
I went on to do a CRUN and I get the error:
ERROR: The latest NEMO restart dump does not seem to be consistent with the UM .xhist file This suggests an untidy model failure but you may be able to retrieve the run by copying the backup dump and xhist files to original location and restarting with the appropriate NEMO dump
And I think that this error is caused by the CICE startdump since it is in the wrong season.
Now, there is a folder xlzcg here:
/projects/ukca-imp/dshawk/xlzcg
You should now be able to see the output and namelists etc.
Hope this clarifies things!
Cheers,
Dill
comment:3 Changed 5 years ago by annette
Hi Dill,
Thanks for clarifying. I noticed that you hadn't included the UM branch in your job:
fcm:um-br/dev/jwalton/vn8.2_NEMOCICE_restart_fixes_UKMO
There was a typo in the name of this branch in #1667 which I have corrected.
However, I have tested this in a copy of your job and CICE still seems to have the wrong date. I will look into this, but in the meantime I have re-built modify_CICE_header for XCM:
~aospre/bin/modify_CICE_header
This version asks for the value of oceanmixed_ice. Try specifying T first, and if it tries to read past the end of the file, then set to F.
Also, I noticed that the archiving scripts for NEMO/CICE you have included in the job are failing. Do you want some help with that? I have versions for the vn7.3 coupled model which work but they might be different in some way.
Annette
comment:4 Changed 5 years ago by dilshadshawki
Hi Annette,
Thanks for looking into this, please let me know when you find out more.
These are the things I've tried:
1. I ran the job with the CICE restart dump reset using modify_CICE_header, but my branch remained as
fcm:/um/branches/dev/jwalton/vn8.2_NEMOCICE_restart_fixes_UKMO
and not this:
fcm:um-br/dev/jwalton/vn8.2_NEMOCICE_restart_fixes_UKMO
The CICE files output by the job still do not have the correct dates.
2. I then tried changing the branch to fcm:um-br/dev… (as in yours above), but I got an error during compilation along the lines of:
[FAIL] /projects/ukca-imp/dshawk/xlzcg/baserepos/JULES: cannot locate config file, abort at /work/home/fcm/fcm-2015.05.0/bin/../lib/FCM1/ConfigSystem.pm line 539
After I changed the CICE restart dump from the one I reset using modify_CICE_header to the one I did not reset, this error went away. However, as you found, the CICE dates are still incorrect, even though the branch was correct.
I will wait to hear back from you on this issue.
As for the NEMO and CICE archiving scripts, are you referring to the ones specified in
Atmosphere → Control → Postprocessing, Dumping and Meaning - User Script Release?
and the files are:
/home/dshawk/hadgem3_scripts
In any case, yes please, could you help me with that?
Many thanks,
Dill
comment:5 Changed 5 years ago by dilshadshawki
Hi Annette,
Any news regarding the CICE issue and the others above?
Cheers,
Dill
comment:6 Changed 5 years ago by annette
Dill,
I will work on fixing the post-processing scripts this afternoon.
Sorry, I'm still slightly confused about what does and doesn't work for you. If you run without the new branches, but using the CICE restart edited with modify_CICE_header, does that work? This works for me, and it is the solution I'd recommend for now.
Annette
comment:7 Changed 5 years ago by dilshadshawki
Hi Annette,
You are correct: not including the new branches and using the CICE restart edited with modify_CICE_header does work for me! Now the CICE files and dumps that are output have the correct dates.
But then I still get the original error as before, which I thought might have been because of the CICE issue. There must be something else going on.
Do you know what the reason behind this error could be?
ERROR: The latest NEMO restart dump does not seem to be consistent with the UM .xhist file This suggests an untidy model failure but you may be able to retrieve the run by copying the backup dump and xhist files to original location and restarting with the appropriate NEMO dump
ERROR: Expected NEMO output files are not all available. This may be a UM / OASIS / NEMO start-up problem. The ocean.output file may provide more information.
/home/dshawk/output/xlzcg000.xlzcg.d15321.t161520.leave
Best wishes,
Dill
comment:8 Changed 5 years ago by annette
Hi Dill,
This is an issue with the scripts not reading the .xhist file properly. Please include the following branch in the "FCM Options for UM…" window, under "Use Central Script Modifications":
fcm:um_br/dev/annette/vn8.2_NEMOCICE_restart_fixes_XC40_ukmo/src
Annette
comment:9 Changed 5 years ago by annette
Btw, I think you should be able to just build the scripts with the CRUN on without re-doing the NRUN…
Annette
comment:10 Changed 5 years ago by annette
Hi Dill,
As for the post-processing scripts, there are Cray versions available here (note I haven't tested these):
/projects/ocean/hadgem3/scripts/GC2.0_XC40
So either you can use these ones, or just change the executables at the top of your cice_mean.sh and nemo_mean.sh files to use:
nrebuild=/projects/ocean/nemo/utils/bin/rebuild_nemo
ncavg=/projects/ocean/nemo/nco/nco-default/bin/ncra
mean_trnd3d=/projects/ocean/hadgem3/scripts/GC2.0_XC40/mean_nemo.exe
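Before editing the scripts, it may be worth confirming that the three paths above exist and are executable on the machine in question. A minimal sketch, assuming Python is available there; the helper name missing_tools is made up for illustration and is not part of the post-processing scripts.

```python
import os

# Paths quoted in the comment above; whether they exist depends on the machine.
TOOLS = {
    "nrebuild": "/projects/ocean/nemo/utils/bin/rebuild_nemo",
    "ncavg": "/projects/ocean/nemo/nco/nco-default/bin/ncra",
    "mean_trnd3d": "/projects/ocean/hadgem3/scripts/GC2.0_XC40/mean_nemo.exe",
}

def missing_tools(tools):
    """Return the names whose path is not an existing, executable file."""
    return [
        name
        for name, path in tools.items()
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    ]

if __name__ == "__main__":
    for name in missing_tools(TOOLS):
        print(f"{name}: not found or not executable: {TOOLS[name]}")
```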
Hope this helps,
Annette
comment:11 Changed 5 years ago by dilshadshawki
Hi Annette,
You have been a star and my hero. I managed to run the job for a year, finally!
So all is fixed, I hope! Let's close this ticket shall we?
Many thanks again for all of your help!
Best,
Dill
comment:12 Changed 5 years ago by annette
- Resolution set to fixed
- Status changed from assigned to closed
Brilliant! Thanks for letting us know.
Annette
comment:13 Changed 5 years ago by dilshadshawki
Hi Annette,
The modify_CICE_header tool now gives me the error message 'Illegal instruction', which appears after I enter the new value for time. It has been working fine up until now, but for some reason it no longer works. Do you know what could be going on? Below is the procedure I have always followed, now with the error message:
[dshawk@exppostproc01:~/startdumps/xlzcj]$ /home/aospre/bin/modify_CICE_header
Enter input file name: xlzcji.restart.2321-09-01-00000
Enter output file name: xlzcji.restart.2321-09-01-00000_reset
Please enter the name of the grid used in this file
choose from ORCA2, ORCA1, ORCA025, CUSTOM (note case)
ORCA1
Enter the value of oceanmixed_ice 'T' or 'F'
T
oceanmixed_ice = .true.
Time header variables:
istep0: 183360
time: 518400000.
time_forc: 0.
Enter new value for istep0 [N for no change]: 0
Enter new value for time [N for no change]: 23328000
Illegal instruction
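The time values in the session above appear to be in seconds. As a rough check of what they correspond to, the sketch below converts them to whole model years and days. It assumes 86400-second days and a 360-day model calendar; neither is stated in this ticket, and the function name seconds_to_years_days is made up for illustration.

```python
SECONDS_PER_DAY = 86400  # assumption: the header time is in seconds
DAYS_PER_YEAR = 360      # assumption: 360-day model calendar

def seconds_to_years_days(seconds):
    """Split a time in seconds into (whole years, remaining whole days)."""
    days = int(seconds) // SECONDS_PER_DAY
    return divmod(days, DAYS_PER_YEAR)

if __name__ == "__main__":
    for label, value in [("original time", 518400000), ("new time", 23328000)]:
        years, days = seconds_to_years_days(value)
        print(f"{label}: {value} s = {years} years + {days} days")
```

Under these assumptions, the original header time works out to 16 years and 240 days, and the value entered works out to 270 days.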
Many thanks,
Dill
comment:14 Changed 5 years ago by annette
comment:13 moved to new ticket #1823
Annette
Hi Dill,
The directory /projects/ukca-imp/dshawk/xlzcg does not exist on XCM anymore. Have you moved it elsewhere? As you've found, the information in the leave file is pretty minimal, so I can't tell what's going on without looking at the job output and namelists. It also looks like you have edited the UMUI job since submitting.
As for the compile-CRUN issue, if you look in the comp.leave file it is only trying to build the UM scripts, which it will allow you to do with a CRUN. To switch off the script build, go to Compile and Run Options → UM Scripts Build and deselect "Enable build of UM scripts".
As far as I know, the branches I mentioned in #1667 should work. I haven't tested them myself but they are being used in other coupled jobs. If you are having trouble with this, use a separate UMUI job, and point me to the leave files so that I can see exactly what is going on.
Best regards,
Annette