Opened 5 years ago

Closed 5 years ago

#1302 closed help (answered)

Job failure

Reported by: sjbush
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 8.5

Description

Hello,

My job xjxja just failed again. I don't recognise the error, but I'm wondering if it is a memory error again. The output is in ~sjbush/output/xjxja000.xjxja.d14143.t152753.leave on archer.

Cheers,
Stephanie

Change History (2)

comment:1 Changed 5 years ago by grenville

Stephanie

The problem causing the crash is

Error Message: COEX: Unable to WGDOS pack to this accuracy

This can happen if the model is trying to pack a NaN, for example, or some other number that the packing profile can't handle. The model seems to be running OK just prior to the crash, but it's difficult to tell. You could turn on some more diagnostic printing to see if the model is blowing up.
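To illustrate the kind of value that defeats fixed-accuracy packing, here is a minimal Python sketch. It is not the UM's COEX routine: it simply assumes a scheme that stores each value as an integer number of steps of size 2**accuracy_power above the field minimum, and the names `find_unpackable`, `accuracy_power` and `max_bits` are hypothetical, chosen for illustration only.

```python
import math


def find_unpackable(field, accuracy_power=-6, max_bits=31):
    """Return (index, value, reason) for values a fixed-accuracy packer would reject.

    Assumed scheme (illustrative only, not the UM code): each value is stored
    as an integer count of steps of 2**accuracy_power above the field minimum,
    and that count must fit in max_bits bits.
    """
    # NaN and +/-inf cannot be expressed as a whole number of steps at all.
    problems = [
        (i, v, "non-finite") for i, v in enumerate(field) if not math.isfinite(v)
    ]

    finite = [v for v in field if math.isfinite(v)]
    if finite:
        base = min(finite)
        step = 2.0 ** accuracy_power
        limit = 2 ** max_bits
        for i, v in enumerate(field):
            # A finite value can still fail if it sits too far above the
            # minimum for the requested accuracy.
            if math.isfinite(v) and (v - base) / step >= limit:
                problems.append((i, v, "range too large"))
    return problems


if __name__ == "__main__":
    sample = [1.0, float("nan"), 2.0, 1e30]
    for index, value, reason in find_unpackable(sample):
        print(index, value, reason)
```

A check like this on a suspect field would show whether the failure comes from a NaN (which usually means the model has already blown up) or from a finite but extreme value.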

I note that you have 90-day dumping with climate meaning 3,3,4,10, so the first mean will be a 3x90-day mean. I don't think that this would cause the problem you are seeing, but please check it's what you want.
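The arithmetic behind that can be sketched in a few lines of Python. This assumes each meaning period is a multiple of the one before it, with the first being a multiple of the dump frequency; only the first period (3x90 days) is stated above, so treat the later ones as following from that assumption. The function name `meaning_period_lengths` is made up for this example.

```python
def meaning_period_lengths(dump_days, multipliers):
    """Return the length in days of each successive climate meaning period.

    Assumes each period is the previous period (initially the dump
    frequency) multiplied by the next entry in `multipliers`.
    """
    lengths = []
    period = dump_days
    for multiplier in multipliers:
        period *= multiplier
        lengths.append(period)
    return lengths


if __name__ == "__main__":
    # 90-day dumping with meaning 3,3,4,10
    print(meaning_period_lengths(90, [3, 3, 4, 10]))  # [270, 810, 3240, 32400]
    # 10-day dumping with the same meaning sequence
    print(meaning_period_lengths(10, [3, 3, 4, 10]))  # [30, 90, 360, 3600]
```

Under that assumption, 10-day dumping gives 30-, 90-, 360- and 3600-day means, whereas 90-day dumping gives a first mean of 270 days.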

It's always a good idea to set ATP_ENABLED=1 in the UMUI - this will provide a stack-trace when the model fails.

It may be worth starting again with 10-day dumping; your previous OOM error may also be related to the 90-day dumping.

Grenville

comment:2 Changed 5 years ago by grenville

  • Resolution set to answered
  • Status changed from new to closed