Opened 13 months ago
Closed 13 months ago
#3180 closed help
checksum failure in climate mean
| Reported by: | ggxmy | Owned by: | um_support |
|---|---|---|---|
| Component: | UM Model | Keywords: | |
| Cc: | | Platform: | Monsoon2 |
| UM Version: | 11.1 | | |
Description
Dear CMS,
atmos_main failed in one of my UKESM-A nudged simulations with the following error message:
ERROR: checksum failure in climate mean
Section 0 item 2
This can be due to invalid values in field, or corruption of partial sum file
Remove or fix diagnostic, and rerun
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 4
? Error from routine: U_MODEL_4A
? Error message: ACUMPS: Diagnostic error. See output for item no.
? Error from processor: 0
? Error number: 688
"Section 0 item 2" is the u wind but maybe it is simply the first diagnostic and could be anything?
I'm currently running three almost identical simulations, all of which started from Mohit's release job u-bm242. The only difference among the three suites is the low-level anthropogenic SO2 emissions. u-br297 uses the regular emissions. u-br357 uses the same emissions on land, with emissions from the sea surface masked and set to zero. This one, u-br356, uses the same emissions on land, with emissions from the sea surface set to 20% of the original.
diff -r u-br297 u-br356
doesn't seem to show any substantial difference between the two suites except the emission ancillary files.
All of these suites ran for 16 months without a problem. But soon after I extended the runs, only this suite crashed, which seems strange to me.
Anyway, I turned off the request for 00-002 and restarted the run. Now it failed with this:
? Error code: 1
? Error from routine: io:file_open
? Error message: Failed to open file /home/d03/myosh/cylc-run/u-br356/share/data/History_Data/br356a.da20110101_00
? Error from processor: 0
? Error number: 22
Indeed, br356a.da20110101_00 does not exist. I could try other things, but I thought it might be better to seek help before the situation gets more complicated. Could I have some advice on this, please?
Thanks,
Masaru
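For reference, the "does the dump still exist?" check behind the second failure can be sketched in a few lines of Python. This is only a minimal illustration: the helper names `missing_file_from_error` and `dump_is_present` are hypothetical, and the message wording is assumed to match the io:file_open text quoted in this ticket.

```python
import os
import re

# Pattern for the path in the io:file_open message quoted in this ticket.
# Assumption: the wording "Failed to open file <path>" is stable.
_OPEN_FAILURE = re.compile(r"Failed to open file\s+(\S+)")


def missing_file_from_error(error_text):
    """Return the path named in a 'Failed to open file' message, or None."""
    match = _OPEN_FAILURE.search(error_text)
    return match.group(1) if match else None


def dump_is_present(path):
    """Return True if the dump exists locally as a regular file."""
    return os.path.isfile(path)


if __name__ == "__main__":
    message = (
        "? Error code: 1 ? Error from routine: io:file_open "
        "? Error message: Failed to open file "
        "/home/d03/myosh/cylc-run/u-br356/share/data/History_Data/"
        "br356a.da20110101_00 ? Error from processor: 0 ? Error number: 22"
    )
    path = missing_file_from_error(message)
    print(path, "present" if dump_is_present(path) else "missing")
```

Running this on the machine where the suite lives would show whether the restart dump named in the error is still in History_Data before any further restarts are attempted.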
Change History (2)
comment:1 Changed 13 months ago by grenville
Masaru
Cycle 20101201T0000Z ran OK and archived; see /home/d03/myosh/cylc-run/u-br356/log/job/20101201T0000Z/postproc/01/job-archive.log, where it says
br356a.da20110101_00 ARCHIVE OK
atmos_main in cycle 20110101T0000Z failed, but postproc in 20110101T0000Z succeeded and deleted the dump, thinking it was no longer needed; see /home/d03/myosh/cylc-run/u-br356/log/job/20110101T0000Z/postproc/01/job.out
[INFO] Running do_delete for atmos…
[INFO] Removing dump files:
br356a.da20110101_00
That postproc ran even though atmos_main failed is odd and would indicate a badly formed dependency graph.
Grenville
comment:2 Changed 13 months ago by ggxmy
- Status changed from new to closed
Thank you Grenville.
As you say, br356a.da20110101_00 is no longer present locally but has been archived. Would you suggest starting a new run from 20110101 using this as the start dump? Well, instead of waiting for your reply I tried doing this and it has been running overnight, so I'm closing this ticket.
Masaru
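The log check Grenville describes (was the dump archived, and did a later postproc delete it?) could be sketched roughly as follows. The function name `dump_status` is hypothetical, and the line formats are assumed from the two log excerpts quoted in this ticket; real postproc logs may carry extra prefixes or differ between versions.

```python
def dump_status(archive_log_text, postproc_out_text, dump_name):
    """Report whether dump_name was archived and whether postproc deleted it.

    Assumed formats (taken from the excerpts in this ticket):
      job-archive.log:  "<dump_name> ARCHIVE OK"
      job.out:          "[INFO] Removing dump files:" followed by one
                        file name per line until the next "[...]" record.
    """
    archived = any(
        dump_name in line and "ARCHIVE OK" in line
        for line in archive_log_text.splitlines()
    )

    deleted = False
    in_delete_list = False
    for line in postproc_out_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[INFO] Removing dump files:"):
            in_delete_list = True
            continue
        if in_delete_list:
            if not stripped or stripped.startswith("["):
                in_delete_list = False  # a new log record ends the list
            elif stripped == dump_name:
                deleted = True
    return {"archived": archived, "deleted": deleted}


if __name__ == "__main__":
    archive_log = "br356a.da20110101_00 ARCHIVE OK\n"
    postproc_out = (
        "[INFO] Running do_delete for atmos\n"
        "[INFO] Removing dump files:\n"
        "br356a.da20110101_00\n"
    )
    print(dump_status(archive_log, postproc_out, "br356a.da20110101_00"))
```

A result with both flags set would match the situation in this ticket: the dump is gone from the local History_Data directory but can be retrieved from the archive and used as a start dump.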