Opened 3 years ago

Closed 3 years ago

#1848 closed help (answered)

Model write error

Reported by: michmcr Owned by: annette
Component: UM Model Keywords:
Cc: Platform:
UM Version: 8.4

Description

My model xlelm is running on ARCHER. However, on Sunday night (27 March) it failed to write out data for roughly two months of model time (October and November 2013), yet it continued to run and then wrote out files after this point (for 2014) until it ran out of time. Do you have any idea why the model would fail to write some files but keep running and then write others afterwards?

Thanks
Michelle

Change History (2)

comment:1 Changed 3 years ago by annette

  • Owner changed from um_support to annette
  • Status changed from new to assigned

Hi Michelle,

There appear to be a lot of errors in the log files:

/home/n02/n02/michmcr/output

It looks like it is sometimes timing out, e.g.:

xlelm000.xlelm.d16082.t132502.leave

which may be why it hasn't written all of the output files you'd expect.
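To line the log files up with what the model was doing, it can help to decode the timestamp in each file name. The Python sketch below assumes (this is not confirmed in the ticket) that the `dYYDDD.tHHMMSS` fields hold a two-digit year in the 2000s, a day of the year and a time of day; `leave_file_time` is a hypothetical helper, not part of the UM.

```python
import re
from datetime import datetime

# Assumed file-name layout: <runid>000.<runid>.d<YY><DDD>.t<HHMMSS>.leave
# (YY = two-digit year, DDD = day of year, HHMMSS = time of day).
LEAVE_NAME = re.compile(r"^\w+000\.\w+\.d(\d{2})(\d{3})\.t(\d{6})\.leave$")


def leave_file_time(name: str) -> datetime:
    """Return the timestamp encoded in a .leave file name (hypothetical helper)."""
    match = LEAVE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised log file name: {name!r}")
    yy, ddd, hhmmss = match.groups()
    # Assumes the year is in the 2000s; %j parses the day of the year.
    return datetime.strptime(f"20{yy}{ddd}{hhmmss}", "%Y%j%H%M%S")
```

Under that assumption, the two files quoted in this ticket would date from 22 March 2016 and 29 March 2016 respectively.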

Then there are some errors to do with inconsistent partial sum files:

xlelm000.xlelm.d16089.t110535.leave

which could be because it has not written the files properly.

I don't know why it has kept resubmitting itself despite these errors, though.

I think you should stop the run, then start an NRUN from a point you are happy with. You will need to change the start date and start dump.

You should also reduce the run length for each resubmission period so that it can finish fully within the 48-hour period.
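As a rough guide to choosing the resubmission length, the sketch below divides the usable wallclock time by the time one model month takes. The hours per model month must be measured from your own logs; the 80% safety margin and the function name are illustrative assumptions, not UM settings.

```python
import math


def max_months_per_resubmission(hours_per_model_month: float,
                                wallclock_limit_hours: float = 48.0,
                                safety_margin: float = 0.8) -> int:
    """Largest whole number of model months that should fit in one job.

    hours_per_model_month: measured wallclock hours per simulated month.
    safety_margin: fraction of the limit actually used, leaving headroom for
    dumping, archiving and variable machine load (0.8 is illustrative).
    """
    if hours_per_model_month <= 0:
        raise ValueError("hours_per_model_month must be positive")
    if not 0 < safety_margin <= 1:
        raise ValueError("safety_margin must be in (0, 1]")
    usable_hours = wallclock_limit_hours * safety_margin
    return math.floor(usable_hours / hours_per_model_month)
```

For example, if a model month were taking 6.5 hours, `max_months_per_resubmission(6.5)` gives 5 months per resubmission. A result of 0 means not even one month fits, so the period would have to be shorter than a month.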

Also check the log files to see that there aren't any further errors before setting off the CRUN, as there may be something else going on.
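A quick first pass over the log files can be scripted. This Python sketch lists the lines in each `.leave` file that match a few keywords. The keyword list is a guess that will need adjusting to the messages that actually appear in your logs, and it will also flag harmless lines that happen to contain the word "error", so treat the result as a shortlist to read rather than a verdict.

```python
import re
from pathlib import Path

# Illustrative keywords only: extend or narrow them to match the messages
# that actually appear in the UM output and job logs.
SUSPICIOUS = re.compile(r"error|timed out|time limit|killed", re.IGNORECASE)


def scan_leave_files(directory: str, pattern: str = "*.leave") -> dict[str, list[str]]:
    """Map each matching log file name to the lines that contain a keyword."""
    hits: dict[str, list[str]] = {}
    for path in sorted(Path(directory).glob(pattern)):
        # errors="replace" keeps going if a log contains stray undecodable bytes.
        with path.open(errors="replace") as fh:
            matched = [line.rstrip("\n") for line in fh if SUSPICIOUS.search(line)]
        if matched:
            hits[path.name] = matched
    return hits
```

Calling `scan_leave_files("/home/n02/n02/michmcr/output")` would return a dictionary mapping each `.leave` file name to its matching lines; files with no matches are left out.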

Let me know if you need any further advice with this.

Annette

comment:2 Changed 3 years ago by grenville

  • Resolution set to answered
  • Status changed from assigned to closed

closed for lack of activity
