Opened 4 years ago
Closed 4 years ago
#2163 closed help (fixed)
Partial sum file inconsistent
Reported by: | mattjbr123 | Owned by: | um_support |
---|---|---|---|
Component: | UM Model | Keywords: | climate meaning, partial sum |
Cc: | | Platform: | Monsoon2 |
UM Version: | 10.3 | | |
Description
Hi,
I'm getting the following error when running u-ak617 on MONSooN.
Some background on this suite: to generate initial-condition ensembles, I'm running a practically identical ensemble suite (u-af404) for one day with stochastic forcing enabled, which produces dump files that all differ slightly from one another. I'm then using these dump files to start the ensemble members of u-ak617. Suite u-af404 starts at 2008-09-01 00:00:00, so u-ak617 starts at 2008-09-02 00:00:00 (I have changed the model basis (start) time accordingly). I suspect something about this setup may be upsetting the climate meaning system.
Weirdly, I get the same error even when I disable climate meaning altogether.
Any suggestions appreciated.
Matt
IO: Open: /home/d04/mabro/cylc-run/u-ak617/share/data/History_Data_r050i1p00000/ak617-r050i1p00000a_s1b on unit 39
ACUMPS1: Partial sum file inconsistent
PS file holds 0 items and written at STEP 0
Expected timestep should be < 72
Expected number of items 7
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 2
? Error from routine: U_MODEL_4A
? Error message: ACUMPS1: Partial sum file inconsistent. See Output
? Error from processor: 1
? Error number: 6
????????????????????????????????????????????????????????????????????????????????
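The ACUMPS1 message compares two values stored in the partial-sum (PS) file against what the model expects at this point in the run. The sketch below is a hypothetical Python reconstruction inferred only from the wording of the log above, not the UM's actual Fortran; the class, function and argument names are illustrative assumptions. It shows why a PS file holding 0 items fails when 7 meaned STASH items are expected, even though the stored step (0) is within the limit.

```python
from dataclasses import dataclass


@dataclass
class PartialSumHeader:
    """Hypothetical view of the two values ACUMPS1 reports from a PS file."""
    items: int  # number of items the PS file claims to hold
    step: int   # timestep at which the PS file was last written


def ps_inconsistencies(header, expected_items, step_limit):
    """Return the reasons a PS header looks inconsistent (empty list = OK).

    Inferred from the log wording only: the stored step should be below
    the limit, and the stored item count should equal the expected count.
    """
    reasons = []
    if not header.step < step_limit:
        reasons.append(f"written at STEP {header.step}, expected < {step_limit}")
    if header.items != expected_items:
        reasons.append(f"holds {header.items} items, expected {expected_items}")
    return reasons


# Values from the first failing run: an empty PS file, 7 meaned items expected.
print(ps_inconsistencies(PartialSumHeader(items=0, step=0), 7, 72))
# → ['holds 0 items, expected 7']
```

If the model timestep is 1200 s (20 minutes), which is an assumption not stated in this ticket, then 72 steps is exactly one day (86400 / 1200 = 72), which would be consistent with a daily first meaning period.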
Change History (13)
comment:1 Changed 4 years ago by mattjbr123
comment:2 Changed 4 years ago by mattjbr123
When I disable all 7 STASH UPMEAN outputs and set 'um → namelist → Model Input and Output → Dumping and Meaning → l_meaning_sequence' to false (as far as I can tell, the master switch for climate meaning), I still get essentially the same error, just with 0 items expected instead of 7:
Period_1 data read from: 1a
Period_1 data written to: 1b
FILE_MANAGER: Assigned : 1a
FILE_MANAGER: : Unit : 37 (portio)
FILE_MANAGER: Assigned : 1b
FILE_MANAGER: : Unit : 38 (portio)
IO: Switching file mode to local because there is no IO server
IO: Opening unit 37 with collective(broadcast) semantics
IO: Checking consistency of unit open request...
IO: Valid request
IO: Open: 1a on unit 37
IO: Switching file mode to local because there is no IO server
IO: Opening unit 38 with collective(broadcast) semantics
IO: Checking consistency of unit open request...
IO: Valid request
IO: Open: 1b on unit 38
ACUMPS1: Partial sum file inconsistent
PS file holds -4576640146148950016 items and written at STEP 24780986340343808
Expected timestep should be < 72
Expected number of items 0
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 2
? Error from routine: U_MODEL_4A
? Error message: ACUMPS1: Partial sum file inconsistent. See Output
? Error from processor: 1
? Error number: 6
????????????????????????????????????????????????????????????????????????????????
So basically the same, except that the expected number of items is now 0.
Matt
comment:3 Changed 4 years ago by grenville
Matt
It appears that you are running this now; it's difficult to know how to proceed with a live and possibly changing job. Does the problem persist? If so, please copy the suite for further investigation so that we aren't hindering each other.
Grenville
comment:4 Changed 4 years ago by mattjbr123
Hmm, it's not showing up with cylc scan, and I killed it immediately after getting the above error, although I did still have rose edit open. But I have copied the suite to u-am020 nonetheless.
Matt
comment:5 Changed 4 years ago by mattjbr123
Just a thought that occurred to me: could it be trying to continue the partial sum from the one-day run (u-af404), as it would for a CRUN, but that data isn't there any more because it belongs to a different suite?
comment:6 Changed 4 years ago by grenville
Matt
I took a copy of u-ak617, but it fails to build:
error: process optcg used more than 2560000kB of memory on node shared100
error: job terminated
Have you built this recently?
Grenville
comment:7 Changed 4 years ago by mattjbr123
Hi Grenville,
Yes, but I have also had this error before. I simply reran it without changing anything and it worked. Not sure what the issue is there.
Matt
comment:8 Changed 4 years ago by grenville
Ah, OK, same here. That's odd.
comment:9 Changed 4 years ago by grenville
Matt
When you switch off climate meaning, it's not being done quite right (this is a Gregorian calendar issue). In u-ak617/app/um/rose-app.conf, you'll see
meanfreqim=3,3,4,10
but it should say
!!meanfreqim=3,3,4,10
So the triggers aren't quite right.
Fixing this fixes the problem with l_meaning_sequence=.false. I've not investigated the case with l_meaning_sequence=.true.
Grenville
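Rose marks ignored settings with a prefix: as I understand it, '!!' means the setting has been switched off by a trigger (here, by l_meaning_sequence) and a single '!' means the user has switched it off; in either case the setting should not be passed to the model. The following is a minimal Python sketch of that prefix convention only, assuming simple one-line key=value entries. It is not Rose's own parser, and the section name in the example is a placeholder rather than the real UM namelist.

```python
def parse_rose_conf(text):
    """Map (section, key) -> (value, state) for simple 'key=value' lines.

    state is 'enabled', 'user-ignored' (prefix '!') or
    'trigger-ignored' (prefix '!!'). Multi-line values are not handled.
    """
    settings = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # section name kept verbatim
            continue
        if "=" not in line:
            continue
        if line.startswith("!!"):
            state, line = "trigger-ignored", line[2:]
        elif line.startswith("!"):
            state, line = "user-ignored", line[1:]
        else:
            state = "enabled"
        key, value = line.split("=", 1)
        settings[(section, key.strip())] = (value.strip(), state)
    return settings


# Placeholder section name; only the prefix handling matters here.
example = """\
[namelist:example]
l_meaning_sequence=.false.
!!meanfreqim=3,3,4,10
"""
for (section, key), (value, state) in parse_rose_conf(example).items():
    print(f"{section}/{key} = {value} [{state}]")
# namelist:example/l_meaning_sequence = .false. [enabled]
# namelist:example/meanfreqim = 3,3,4,10 [trigger-ignored]
```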
comment:10 Changed 4 years ago by mattjbr123
Hi Grenville,
This doesn't seem to be the case for me. When I switch l_meaning_sequence to false in rose edit, save, and then look at app/um/rose-app.conf I see !!meanfreqim=3,3,4,10, i.e. how it should be.
I will give the run another shot though, just in case.
If this doesn't fix the error, I may try rerunning u-af404 with l_meaning_sequence set to false (it wasn't before, even though I had disabled all the UPMEAN outputs in STASH) before using its output dump files to start u-ak617. My concern is that having meaning enabled in u-af404 makes u-ak617 expect partial sum files etc. that won't exist, because they belong to a different suite.
I will report back…
Matt
comment:11 Changed 4 years ago by mattjbr123
Actually, did you mean that app/um/opt/rose-app-gregorian.conf has meanfreqim enabled when it shouldn't? Because I DO see that. I will try !!'ing it.
Matt
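As I understand Rose, an optional configuration such as app/um/opt/rose-app-gregorian.conf is applied on top of app/um/rose-app.conf, so an entry in the opt file replaces the matching entry in the main file. That would explain why the main file looked correct while the run still had meanfreqim active. The Python sketch below models that overlay with plain dicts and adds a simple consistency check; the function names, the flattened key layout and the conflict rule are illustrative assumptions, not part of Rose or the UM.

```python
def overlay(main, *opts):
    """Apply optional configs on top of a main config; later ones win.

    A config here is a simplified dict: key -> (value, state), with state
    'enabled', 'user-ignored' or 'trigger-ignored'. Namelist sections are
    dropped for brevity, which assumes key names are unique.
    """
    merged = dict(main)
    for opt in opts:
        merged.update(opt)  # an opt entry replaces both value and state
    return merged


def meaning_conflicts(config):
    """Flag meanfreqim left enabled while l_meaning_sequence is false."""
    seq = config.get("l_meaning_sequence")
    freq = config.get("meanfreqim")
    if seq == (".false.", "enabled") and freq is not None and freq[1] == "enabled":
        return [f"meanfreqim={freq[0]} enabled although l_meaning_sequence=.false."]
    return []


# Main rose-app.conf after switching meaning off in rose edit.
main = {
    "l_meaning_sequence": (".false.", "enabled"),
    "meanfreqim": ("3,3,4,10", "trigger-ignored"),
}
# opt/rose-app-gregorian.conf before and after adding the '!!' prefix.
gregorian_before = {"meanfreqim": ("3,3,4,10", "enabled")}
gregorian_after = {"meanfreqim": ("3,3,4,10", "trigger-ignored")}

print(meaning_conflicts(overlay(main)))                    # []
print(meaning_conflicts(overlay(main, gregorian_before)))  # one conflict
print(meaning_conflicts(overlay(main, gregorian_after)))   # []
```

Checking the merged result, rather than either file on its own, is the point of the sketch: each file looks plausible in isolation, and the conflict only appears once the Gregorian opt file is layered on top.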
comment:12 Changed 4 years ago by mattjbr123
And that seems to have fixed things
comment:13 Changed 4 years ago by grenville
- Resolution set to fixed
- Status changed from new to closed
Also, a quick du shows the size of the partial sum files s1a, s1b to be 0.
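The zero-size check done here with du can be scripted if you want to inspect many ensemble members at once. This is a small Python sketch; the function name is hypothetical, and matching on names ending in s1a/s1b is an assumption based on the file names in this ticket (for example ak617-r050i1p00000a_s1b), not a documented UM naming rule.

```python
import tempfile
from pathlib import Path


def empty_partial_sum_files(history_dir, suffixes=("s1a", "s1b")):
    """Return names of files in history_dir that end with one of
    `suffixes` and are 0 bytes long (the suffixes are an assumption)."""
    found = []
    for path in sorted(Path(history_dir).iterdir()):
        if path.is_file() and path.name.endswith(suffixes) and path.stat().st_size == 0:
            found.append(path.name)
    return found


# Demonstration in a throwaway directory: one empty and one non-empty PS file.
with tempfile.TemporaryDirectory() as d:
    Path(d, "ak617-r050i1p00000a_s1a").touch()
    Path(d, "ak617-r050i1p00000a_s1b").write_bytes(b"\0" * 8)
    print(empty_partial_sum_files(d))  # ['ak617-r050i1p00000a_s1a']
```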