Opened 3 years ago

Closed 3 years ago

#2163 closed help (fixed)

Partial sum file inconsistent

Reported by: mattjbr123 Owned by: um_support
Component: UM Model Keywords: climate meaning, partial sum
Cc: Platform: Monsoon2
UM Version: 10.3

Description

Hi,

I'm getting the following error when running u-ak617 on MONSooN.
Some background on this suite: to generate initial-condition ensembles, I run a practically identical ensemble suite (u-af404) for one day with stochastic forcing enabled, which produces dump files that all differ slightly from one another. I then use these dump files to start my ensembles for u-ak617. u-af404 starts at 2008-09-01 00:00:00, so u-ak617 starts at 2008-09-02 00:00:00 (I have changed the model basis (start) time). I suspect something about this setup may be upsetting the climate meaning system.
Weirdly, I get the same error even when I disable climate meaning altogether.
Any suggestions appreciated.
Matt

IO: Open: /home/d04/mabro/cylc-run/u-ak617/share/data/History_Data_r050i1p00000/ak617-r050i1p00000a_s1b on unit  39
ACUMPS1: Partial sum file inconsistent
PS file holds  0  items and written at STEP  0
Expected timestep should be <  72
Expected number of items  7

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 2
?  Error from routine: U_MODEL_4A
?  Error message: ACUMPS1: Partial sum file inconsistent. See Output
?  Error from processor: 1
?  Error number: 6
????????????????????????????????????????????????????????????????????????????????

Change History (13)

comment:1 Changed 3 years ago by mattjbr123

Also, a quick du shows the size of the partial sum files s1a, s1b to be 0.
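
The same check can be scripted when there are many ensemble members. This is only a minimal sketch: the function name is made up, and the `*_s<digit>a` / `*_s<digit>b` naming pattern is an assumption taken from the single file name in the log above (ak617-r050i1p00000a_s1b), so it may not cover every partial sum file the UM writes.

```python
from pathlib import Path


def find_empty_partial_sums(history_dir):
    """Return sorted names of zero-byte files that look like partial sum files.

    Assumption: partial sum files end in "_s<digit>a" or "_s<digit>b",
    as in "ak617-r050i1p00000a_s1b" from the log output.
    """
    directory = Path(history_dir)
    candidates = directory.glob("*_s[0-9][ab]")
    return sorted(
        p.name for p in candidates if p.is_file() and p.stat().st_size == 0
    )
```

Calling `find_empty_partial_sums(...)` on one History_Data_* directory per member lists the files that du would report as size 0.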

comment:2 Changed 3 years ago by mattjbr123

When I disable all 7 STASH UPMEAN outputs and set 'um → namelist → Model Input and Output → Dumping and Meaning → l_meaning_sequence' to false (the master switch for climate meaning as far as I can tell), I still get essentially the same error, just with '0 items expected' instead of 7…

Period_1 data read from: 1a
Period_1 data written to: 1b
FILE_MANAGER: Assigned : 1a
FILE_MANAGER:          : Unit :  37 (portio)
FILE_MANAGER: Assigned : 1b
FILE_MANAGER:          : Unit :  38 (portio)
IO: Switching file mode to local because there is no IO server
IO: Opening unit  37 with collective(broadcast) semantics
IO: Checking consistency of unit open request...
IO: Valid request
IO: Open: 1a on unit  37
IO: Switching file mode to local because there is no IO server
IO: Opening unit  38 with collective(broadcast) semantics
IO: Checking consistency of unit open request...
IO: Valid request
IO: Open: 1b on unit  38
ACUMPS1: Partial sum file inconsistent
PS file holds  -4576640146148950016  items and written at STEP  24780986340343808
Expected timestep should be <  72
Expected number of items  0

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 2
?  Error from routine: U_MODEL_4A
?  Error message: ACUMPS1: Partial sum file inconsistent. See Output
?  Error from processor: 1
?  Error number: 6
????????????????????????????????????????????????????????????????????????????????

So basically the same, but with expected no. of items 0…

Matt

comment:3 Changed 3 years ago by grenville

Matt

It appears that you are running this now, and it's difficult to know how to proceed with a live and possibly changing job. Does the problem persist? If so, please copy the suite for further investigation so that we aren't hindering each other.

Grenville

comment:4 Changed 3 years ago by mattjbr123

Hmm, it's not showing up with cylc scan, and I killed it immediately after getting the above error, although I did still have rose edit open. But I have copied the suite to u-am020 nonetheless.

Matt

comment:5 Changed 3 years ago by mattjbr123

Just a thought that occurred to me: could it be looking to continue the partial sum from the 1-day run (u-af404), as it would for a CRUN, but that data isn't there any more as it's a different suite?

comment:6 Changed 3 years ago by grenville

Matt

I took a copy of u-ak617 but it fails to build:

error: process optcg used more than 2560000kB of memory on node shared100
error: job terminated

Have you built this recently?

Grenville

comment:7 Changed 3 years ago by mattjbr123

Hi Grenville,

Yes, but I have also had this error before. I simply reran it without changing anything and it worked. Not sure what the issue is there.

Matt

comment:8 Changed 3 years ago by grenville

Ahh - OK same here — that's odd

comment:9 Changed 3 years ago by grenville

Matt

When you switch off climate meaning, it's not being done quite right (this is a Gregorian calendar issue). In u-ak617/app/um/rose-app.conf, you'll see

meanfreqim=3,3,4,10

but it should say

!!meanfreqim=3,3,4,10

So the triggers aren't quite right.

Fixing this resolves the problem with l_meaning_sequence=.false. I've not investigated the case with l_meaning_sequence=.true.

Grenville

comment:10 Changed 3 years ago by mattjbr123

Hi Grenville,

This doesn't seem to be the case for me. When I switch l_meaning_sequence to false in rose edit, save, and then look at app/um/rose-app.conf I see !!meanfreqim=3,3,4,10, i.e. how it should be.

I will give the run another shot though, just in case.

If this doesn't fix the error, I may try running u-af404 again with l_meaning_sequence set to false (it wasn't before, even though I had disabled all the UPMEAN outputs in STASH) before using the resulting dump files to set off u-ak617. It may be that having it enabled in u-af404 means u-ak617 then expects to see partial sum files etc., which won't exist as it is a different suite.

I will report back…

Matt

comment:11 Changed 3 years ago by mattjbr123

Actually, did you mean app/um/opt/rose-app-gregorian.conf has the enabled meanfreqim when it shouldn't be? Because I DO see that. I will try !!'ing it.

Matt

comment:12 Changed 3 years ago by mattjbr123

And that seems to have fixed things :D
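
For anyone hitting the same thing: the setting was correctly trigger-ignored in app/um/rose-app.conf but still active in the optional config app/um/opt/rose-app-gregorian.conf. A rough way to spot that kind of mismatch is to scan the main config and every file under opt/ for an un-ignored copy of the key. This is only a sketch: the function names are made up, and it assumes that a leading `!` or `!!` is the only thing marking a setting as ignored. It does not check which optional configs are actually switched on for a given run.

```python
import re
from pathlib import Path


def active_lines(config_text, key):
    """Return 1-based line numbers where `key` is set and not ignored.

    Assumption: a leading "!" or "!!" marks the setting as ignored,
    as in the "!!meanfreqim=3,3,4,10" line quoted in this ticket.
    """
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=")
    return [
        number
        for number, line in enumerate(config_text.splitlines(), start=1)
        if pattern.match(line)
    ]


def scan_app(app_dir, key):
    """Map each config file (relative path) to the lines where `key` is active.

    Looks at rose-app.conf and opt/*.conf under `app_dir`; files with no
    active occurrence of `key` are left out of the result.
    """
    app = Path(app_dir)
    files = [app / "rose-app.conf", *sorted((app / "opt").glob("*.conf"))]
    hits = {}
    for path in files:
        if path.is_file():
            lines = active_lines(path.read_text(), key)
            if lines:
                hits[path.relative_to(app).as_posix()] = lines
    return hits
```

Running `scan_app("app/um", "meanfreqim")` from the suite directory with climate meaning switched off should come back empty; any file it lists is a candidate for the `!!` treatment.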

comment:13 Changed 3 years ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed