Opened 6 months ago

Closed 6 months ago

#3180 closed help

checksum failure in climate mean

Reported by: ggxmy Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: Monsoon2
UM Version: 11.1

Description

Dear CMS,

atmos_main failed in one of my UKESM-A nudged simulations with the following error message:

ERROR: checksum failure in climate mean
Section  0  item  2
This can be due to invalid values in field, or corruption of partial sum file
Remove or fix diagnostic, and rerun

???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 4
?  Error from routine: U_MODEL_4A
?  Error message: ACUMPS: Diagnostic error. See output for item no.
?  Error from processor: 0
?  Error number: 688

"Section 0 item 2" is the u wind, but maybe it is simply the first diagnostic and the problem could be anything?

I'm currently running three almost identical simulations, all of which started from Mohit's release job u-bm242. The only difference between the three suites is the low-level anthropogenic SO2 emissions. u-br297 uses the regular emissions. u-br357 uses the same emissions over land, with emissions from the sea surface masked out (set to zero). This one, u-br356, uses the same emissions over land, with sea-surface emissions set to 20% of the original.

diff -r u-br297 u-br356

doesn't seem to show any substantial difference between the two suites apart from the emission ancillary files.

All three suites ran for 16 months without a problem, but soon after I extended the runs, only this suite crashed, which seems strange to me.

Anyway, I turned off the request for 00-002 and restarted the run. Now it fails with this:

?  Error code: 1
?  Error from routine: io:file_open
?  Error message: Failed to open file /home/d03/myosh/cylc-run/u-br356/share/data/History_Data/br356a.da20110101_00
?  Error from processor: 0
?  Error number: 22

br356a.da20110101_00 indeed does not exist. I could try other things, but I thought it might be better to seek help before the situation gets more complicated. Could you please advise?

Thanks,
Masaru

Change History (2)

comment:1 Changed 6 months ago by grenville

Masaru

Cycle 20101201T0000Z ran OK and archived the dump (see /home/d03/myosh/cylc-run/u-br356/log/job/20101201T0000Z/postproc/01/job-archive.log), where it says:
br356a.da20110101_00 ARCHIVE OK

atmos_main in cycle 20110101T0000Z failed, but postproc in 20110101T0000Z succeeded and deleted the dump, treating it as no longer needed (see /home/d03/myosh/cylc-run/u-br356/log/job/20110101T0000Z/postproc/01/job.out):

[INFO] Running do_delete for atmos…
[INFO] Removing dump files:
br356a.da20110101_00

That postproc ran even though atmos_main failed is odd, and would indicate a badly formed dependency graph.

Grenville
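The point about the dependency graph can be restated as a rule: a dump should only be removed once it has been archived and the atmos_main task that starts from it has succeeded. In the suite itself that ordering belongs in the cylc dependency graph; the minimal Python sketch below only illustrates the rule. The `Cycle` class and the `safe_to_delete` function are hypothetical and are not part of the suite's postproc app.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cycle:
    """One cycle of a hypothetical suite (illustrative names only)."""
    point: str           # cycle point, e.g. "20110101T0000Z"
    start_dump: str      # dump file the model task reads at startup
    atmos_main_ok: bool  # True if atmos_main succeeded in this cycle


def safe_to_delete(dump: str, cycles: list[Cycle], archived: set[str]) -> bool:
    """Return True only if `dump` has been archived, at least one cycle
    reads it, and every cycle that reads it has a successful atmos_main."""
    if dump not in archived:
        return False
    readers = [c for c in cycles if c.start_dump == dump]
    return bool(readers) and all(c.atmos_main_ok for c in readers)


# The situation in this ticket: the dump was archived, but the atmos_main
# that starts from it failed, so deleting it was premature.
cycles = [Cycle("20110101T0000Z", "br356a.da20110101_00", False)]
archived = {"br356a.da20110101_00"}
print(safe_to_delete("br356a.da20110101_00", cycles, archived))  # False
```

Archiving alone is not enough under this rule; the delete step also has to wait for the model task that consumes the dump.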

comment:2 Changed 6 months ago by ggxmy

  • Status changed from new to closed

Thank you Grenville.

As you say, br356a.da20110101_00 is no longer present locally but has been archived. Would you suggest starting a new run from 20110101 using it as the start dump? Actually, instead of waiting for your reply I tried doing this, and it has been running overnight, so I'm closing this ticket.

Masaru
