Opened 8 years ago

Closed 8 years ago

#1073 closed help (fixed)

Run failed, Segmentation fault, Failed in qsexecute

Reported by: df019697
Owned by: willie
Component: UM Model
Keywords:
Cc:
Platform: HECToR
UM Version: 6.1



I have 10 different experiments: 5 with one SST and different atmospheric initial conditions, and 5 with a different SST and the same 5 atmospheric initial conditions.

9 of these experiments ran fine. However, one combination of SST file and atmospheric initial condition fails.

The leave file is /home/n02/n02/df019697/um/umui_out/xioif000.xioif.d13150.t225257.leave

But the error appears in *

Starting script : qsexecute
Starting time : Fri May 31 00:41:53 UTC 2013


/work/n02/n02/df019697/tmp/tmp.hector-xe6-13.19318/modscr_xioif/qsexecute: Executing setup


/work/n02/n02/hum/vn6.1/cce/scripts/qssetup: Job terminated normally
xioif: Starting run
_pmiu_daemon(SIGCHLD): [NID 01242] [c12-1c1s2n0] [Fri May 31 01:31:26 2013] PE RANK 48 exit signal Segmentation fault
[NID 01242] 2013-05-31 01:31:26 Apid 4708934: initiated application termination
_pmiu_daemon(SIGCHLD): [NID 01727] [c13-0c2s0n3] [Fri May 31 01:31:26 2013] PE RANK 160 exit signal Segmentation fault
_pmiu_daemon(SIGCHLD): [NID 01281] [c12-0c2s0n1] [Fri May 31 01:31:26 2013] PE RANK 240 exit signal Segmentation fault
qsexecute: Copying /work/n02/n02/df019697/xioif/xioif.thist to backup thist file /work/n02/n02/df019697/xioif/xioif.thist_keep
xioif: Run failed

Ending script : qsexecute
Completion code : 137
Completion time : Fri May 31 01:31:33 UTC 2013


I'm not sure how to debug this, especially as the other experiments ran fine.
The failing experiment is xioif. The other experiments are xing* and xioi*. Comparing xioif against them shows no unexpected differences.
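
For anyone triaging a similar leave file, here is a minimal Python sketch that pulls the failing PE ranks out of the `_pmiu_daemon` lines and decodes the completion code. The function names `failed_ranks` and `decode_completion_code` are hypothetical helpers, not part of the UM scripts. The sketch also assumes that the completion code printed by qsexecute follows the usual shell convention (values above 128 mean "terminated by signal code minus 128"), which this ticket does not confirm.

```python
import re

# Matches the Cray PMI daemon lines seen in the leave file excerpt, e.g.
# "... [NID 01242] ... PE RANK 48 exit signal Segmentation fault"
RANK_RE = re.compile(r"\[NID (\d+)\].*PE RANK (\d+) exit signal (.+)$")


def failed_ranks(lines):
    """Return (node_id, rank, signal_text) for each PE that exited on a signal."""
    hits = []
    for line in lines:
        match = RANK_RE.search(line)
        if match:
            node, rank, sig = match.groups()
            hits.append((int(node), int(rank), sig.strip()))
    return hits


def decode_completion_code(code):
    """Assumption: shell convention, code > 128 means killed by signal (code - 128)."""
    if code > 128:
        return ("signal", code - 128)
    return ("exit", code)


# Lines copied from the leave file quoted in this ticket.
sample = [
    "_pmiu_daemon(SIGCHLD): [NID 01242] [c12-1c1s2n0] [Fri May 31 01:31:26 2013] PE RANK 48 exit signal Segmentation fault",
    "[NID 01242] 2013-05-31 01:31:26 Apid 4708934: initiated application termination",
    "_pmiu_daemon(SIGCHLD): [NID 01727] [c13-0c2s0n3] [Fri May 31 01:31:26 2013] PE RANK 160 exit signal Segmentation fault",
    "_pmiu_daemon(SIGCHLD): [NID 01281] [c12-0c2s0n1] [Fri May 31 01:31:26 2013] PE RANK 240 exit signal Segmentation fault",
]

for node, rank, sig in failed_ranks(sample):
    print(f"rank {rank} on NID {node}: {sig}")

print(decode_completion_code(137))
```

Under that assumption, completion code 137 corresponds to signal 9 (SIGKILL on Linux), which would be consistent with the remaining processes being killed after the first ranks hit the segmentation fault. Knowing which ranks failed first can help narrow down which part of the domain decomposition to inspect.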


Change History (2)

comment:1 Changed 8 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Ray,

I suspect that this is a stability problem. To find out, you need to repeat the run with more debug options:

  • In Output Choices, select subroutine timer diagnostics.
  • In Scientific Sections, section 13, push DIAG_PRN, then change the printing frequency from 24 to 1 and the 0.4 to 10.0.
  • Add the modset $UMDIR/vn6.1/mods/flush.mf77



comment:2 Changed 8 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed