Opened 11 years ago

Closed 11 years ago

#329 closed help (fixed)

qsexecute error at point of job resubmission

Reported by: mx020105
Owned by: willie
Component: UM Model
Keywords: halo_j, bi_linear_h
Cc:
Platform:
UM Version: 6.1

Description

Hello Helpdesk,

I have compiled and submitted a UM job (mx020105 - xdwsf) which seemingly ran successfully for its first couple of re-submissions but which now shows Segmentation Fault errors at the qsexecute stage of the re-submission process, so it fails to run and aborts. I am running a virtually identical job in parallel (xdwse) which doesn't seem to have the same problem. The only difference between the two jobs is one mod (xdwse uses /home/n02/n02/mx020105/am.mf77 and xdwsf uses /home/n02/n02/mx020105/am_high.mf77), but I can't see why the mod would affect this. I'm struggling to work out why it is failing now when it ran successfully for the first couple of re-submissions.

An example of a .leave file for one of the successful submissions is:
/home/n02/n02/mx020105/um/umui_out/xdwsf003.xdwsf.d09283.t135004.leave

and the subsequent failed resubmission is:
xdwsf004.xdwsf.d09283.t235111.leave

I have tried simply resubmitting the same job without saving or processing it again, but I now get the same error at qsexecute when I do this.

Any help would be much appreciated.

Many thanks,
Amanda Maycock

Change History (2)

comment:1 Changed 11 years ago by willie

  • Keywords halo_j, bi_linear_h added
  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Amanda,

I've had a quick look. These jobs run for a large number of time steps before failing. The last thing in the .leave file is a warning "overwriting due to bi_linear_h". Another user had this problem because the land-sea mask was not configured, but this may be a red herring. Before this warning there is "error halo_j too small 4" (in both runs), so the first thing to do is try a run with the halo sizes set to 5 on the UMUI page Atmos > Domain > Horizontal.
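
To find these messages in other .leave files without reading them by hand, something like the following rough sketch might help. It assumes the messages appear on single lines with roughly the wording quoted above; the function name scan_leave and the labels are made up for illustration and are not part of the UM or UMUI.

```python
import re
import sys

# Message fragments quoted in this ticket. The exact wording in a real
# .leave file may differ, so matching is partial and case-insensitive.
PATTERNS = {
    "halo": re.compile(r"halo_j too small", re.IGNORECASE),
    "bilinear": re.compile(r"bi_linear_h", re.IGNORECASE),
    "segfault": re.compile(r"segmentation fault", re.IGNORECASE),
}


def scan_leave(text):
    """Return (line_number, label, line) for each line matching a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python scan_leave.py <one or more .leave files>
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, label, line in scan_leave(handle.read()):
                print(f"{path}:{number}: [{label}] {line}")
```

Running it on both the successful and the failed .leave files shows whether a message is specific to the failing run or, like the halo_j error, present in both.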

Let me know if that works.

regards,

Willie


comment:2 Changed 11 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed