Opened 4 years ago

Closed 4 years ago

#1512 closed help (answered)

Help with diagnosing error in leave file?

Reported by: swr04ojb
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: MONSooN
UM Version: 6.6.3

Description

Hi,

A simulation of mine (xinh#g) on MONSooN seems to have crashed with an error that I don't recognise. MASS was supposed to be down this morning (09:30-12:00), but only for get commands, not put commands*, and this error appears to have occurred before that window anyway. Any idea what's going on?

Message id 8308 from task 61 (ISR 1048) to task 12 (ISR 9) timed out.
epoch_ready=1 msg_len=8 hdr_len=24 msg_type=19 hndlr_idx=6
Last progress made at time 1941 s. Current time 2842 s.
ERROR 1 from file ../../../../../../src/ppe/lapi/Sam.cpp line 1059
Sam::CheckTimeout TIMEOUT happened
ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in task 61
xinhg: Run failed

*http://collab.metoffice.gov.uk/twiki/bin/view/Support/LatestNews
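
For anyone reading the timeout lines in the excerpt above, the gap between "Last progress" and "Current time" can be pulled out with a short script. This is only a sketch: the function name `summarise_timeout` is hypothetical, and the regular expressions are based solely on the lines quoted in this ticket, so other LAPI message variants may not match.

```python
import re

# Excerpt copied from the leave file quoted in this ticket.
LEAVE_EXCERPT = """\
Message id 8308 from task 61 (ISR 1048) to task 12 (ISR 9) timed out.
epoch_ready=1 msg_len=8 hdr_len=24 msg_type=19 hndlr_idx=6
Last progress made at time 1941 s. Current time 2842 s.
"""

# Patterns are assumptions derived from the quoted lines only.
TIMEOUT_RE = re.compile(
    r"Message id (?P<msg_id>\d+) from task (?P<src>\d+) .*? "
    r"to task (?P<dst>\d+) .*? timed out"
)
PROGRESS_RE = re.compile(
    r"Last progress made at time (?P<last>\d+) s\. "
    r"Current time (?P<now>\d+) s\."
)


def summarise_timeout(text):
    """Return the tasks involved and the stall length in seconds, or None."""
    timeout = TIMEOUT_RE.search(text)
    progress = PROGRESS_RE.search(text)
    if timeout is None or progress is None:
        return None
    return {
        "msg_id": int(timeout["msg_id"]),
        "from_task": int(timeout["src"]),
        "to_task": int(timeout["dst"]),
        # Seconds elapsed since the last recorded progress.
        "stalled_for_s": int(progress["now"]) - int(progress["last"]),
    }


summary = summarise_timeout(LEAVE_EXCERPT)
print(summary)
```

For the excerpt in this ticket, the stall works out to 2842 - 1941 = 901 s between the last recorded progress and the timeout.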

Change History (2)

comment:1 Changed 4 years ago by willie

Hi Oliver,

I couldn't see anything obviously wrong in your leave file or pe_output. The IBM manual says this:

0031-300

Forcing all remote tasks to exit due to exit code 1 in task number

Explanation
POE has been informed that the indicated task has exited with an exit code of 1, which causes POE to force all remote tasks to exit.

User response
If the user program is expected to issue an exit(1) as a means of aborting a job which had encountered an error, then no response is required. Otherwise, gather information about the problem and follow local site procedures for reporting hardware and software problems, as an internal error may have occurred.

So it looks like task 61 in your run exited unexpectedly with exit code 1, and POE then forced the remaining tasks to exit.
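
If it helps when scanning a long leave file, a small script can list which task triggered the 0031-300 message and with what exit code. This is a minimal sketch, not a supported tool: `find_forced_exits` is a hypothetical helper, and the pattern assumes the message is worded exactly as in the line quoted in this ticket.

```python
import re

# Pattern is an assumption based on the single 0031-300 line in this ticket.
POE_EXIT_RE = re.compile(
    r"ERROR:\s+0031-300\s+Forcing all remote tasks to exit "
    r"due to exit code (?P<code>\d+) in task (?P<task>\d+)"
)


def find_forced_exits(lines):
    """Yield (task, exit_code) for each POE 0031-300 message found."""
    for line in lines:
        match = POE_EXIT_RE.search(line)
        if match:
            yield int(match["task"]), int(match["code"])


# Lines copied from the leave file excerpt in the ticket description.
sample = [
    "Sam::CheckTimeout TIMEOUT happened",
    "ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in task 61",
    "xinhg: Run failed",
]

forced = list(find_forced_exits(sample))
print(forced)
```

To run it against a real leave file, pass an open file object instead of `sample`, since the function only needs an iterable of lines.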

Regards

Willie

comment:2 Changed 4 years ago by grenville

  • Resolution set to answered
  • Status changed from new to closed