Opened 10 years ago

Closed 10 years ago

#539 closed help (fixed)

Job crashing before first time step

Reported by: keeley
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform:
UM Version: 6.1

Description

Sorry to bother you again…

My model is crashing and I really can't identify the cause:
/home/n02/n02/keeley/um/umui_out/xfiva000.xfiva.d10316.t153041.leave

Starting script : qsexecute
Starting time : Fri Nov 12 15:40:12 GMT 2010

*

/work/n02/n02/keeley/tmp/tmp.hector-xt6-14.11778/modscr_xfiva/qsexecute: Executing setup

/work/n02/n02/hum/vn6.1/pathscale_quad/scripts/qssetup: Job terminated normally

xfiva: Starting run

_pmii_daemon(SIGCHLD): PE 16 exit signal Segmentation fault

_pmii_daemon(SIGCHLD): PE 96 exit signal Segmentation fault

_pmii_daemon(SIGCHLD): PE 32 exit signal Segmentation fault

[NID 00068] 2010-11-12 15:40:33 Apid 249490: initiated application termination

_pmii_daemon(SIGCHLD): PE 80 exit signal Segmentation fault
diff: /work/n02/n02/keeley/tmp/tmp.hector-xt6-14.11778/xfiva.xhist: No such file or directory

qsexecute: Copying /work/n02/n02/keeley/um/xfiva/dataw/xfiva.thist to backup thist file /work/n02/n02/keeley/um/xfiva/dataw/xfiva.thist_keep
xfiva: Run failed

*

Ending script : qsexecute
Completion code : 137
Completion time : Fri Nov 12 15:40:36 GMT 2010

*

The other odd thing, at the bottom of the .leave file, is that it reports the model data time as 1979 3 1,
but I have checked the UMUI and I have definitely set it to 1979 12 1.

If you have any insight into this I would be very grateful!

Thanks,
Sarah
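
For anyone triaging a similar .leave file, the log excerpt above can be summarised mechanically. The following is only a rough sketch: the function names are made up for illustration, and it assumes the completion code follows the usual shell convention in which a value above 128 means "killed by signal (code - 128)". Whether qsexecute reports its code that way is an assumption, not something confirmed in this ticket.

```python
import re
import signal

# Patterns matching the lines seen in the .leave excerpt above.
PE_SIGNAL = re.compile(r"PE (\d+) exit signal (.+)")
COMPLETION = re.compile(r"Completion code\s*:\s*(\d+)")


def summarise_leave(text):
    """Return (sorted PE numbers that reported a signal, completion code or None)."""
    pes = sorted({int(m.group(1)) for m in PE_SIGNAL.finditer(text)})
    match = COMPLETION.search(text)
    code = int(match.group(1)) if match else None
    return pes, code


def decode_exit(code):
    """Map a shell-style exit status above 128 to a signal name, else None.

    Assumption: the code follows the 128 + signal-number convention.
    """
    if code is None or code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None


# Lines copied from the excerpt in this ticket.
sample = """\
_pmii_daemon(SIGCHLD): PE 16 exit signal Segmentation fault
_pmii_daemon(SIGCHLD): PE 96 exit signal Segmentation fault
_pmii_daemon(SIGCHLD): PE 32 exit signal Segmentation fault
_pmii_daemon(SIGCHLD): PE 80 exit signal Segmentation fault
Completion code : 137
"""

pes, code = summarise_leave(sample)
print("PEs reporting a signal:", pes)
print("Completion code:", code, "->", decode_exit(code))
```

Under that convention, 137 would be 128 + 9, i.e. the job was killed after the first PEs had already segfaulted, so the completion code says little about the root cause.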

Change History (7)

comment:1 follow-up: Changed 10 years ago by willie

Hi Sarah

You have set the start date to 1979/12/1 in Submodel Indep > Start date. I would set this all to zero so that it uses the time in the start dump.

You have a segmentation violation, possibly due to a modset. One thing to try is compiling the code with the array bounds checking option (-C) switched on.

Regards,

Willie

comment:2 follow-ups: Changed 10 years ago by jeff

Hi Sarah

It's hard to say what has gone wrong, as the model has crashed and no output has been written. You could try including this mod: /home/n02/n02/jwc/um/vn6.1/mods/debug.mf77, which adds a flush call to flush the output buffers. If there still isn't any useful output, try again with the last 2 lines uncommented.

Jeff.

comment:3 in reply to: ↑ 2 Changed 10 years ago by keeley

Hi Jeff,
Even when I add the modset with the last 2 lines uncommented, I can't seem to get any more information, and there is nothing in the individual .pe# files; they are all empty.

Is there anything else obvious I can do?

Sarah

comment:4 in reply to: ↑ 1 Changed 10 years ago by keeley

Hi Willie,
I have tried setting it all to zero and that doesn't seem to fix anything.
I am still getting the strange 1979/3/1 as the model time in the .leave file, which I don't understand.

I am also not sure how to change the compile options…
Thanks,
Sarah

comment:5 in reply to: ↑ 2 Changed 10 years ago by keeley

Hi Jeff,
I have recompiled and run with the modset debug.mf77:
~keeley/um/umui_out/xfiva000.xfiva.d10320.t110342.leave

I have also tried running with all the extra dust mods that Margaret runs with, and turning off all but 8 variables in the STASH (xfivc), and the model still crashes on the second timestep:
~keeley/um/umui_out/xfivc000.xfivc.d10320.t174809.leave

I am not sure where to hunt for possible errors now - do you have any suggestions on what to do next?

Thanks,
Sarah

comment:6 Changed 10 years ago by willie

  • UM Version changed from <select version> to 6.1

Hi Sarah,

Job xfiva is producing the message:

RHS zero so GCR( 2 ) not needed 

This indicates instability. You could try reducing the time step.

Regards,

Willie
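
For reference, reducing the time step in practice means raising the number of time steps per day. Below is a minimal sketch of the arithmetic, assuming the step is specified as a whole number of steps per day. The value 48 is purely illustrative and is not taken from job xfiva, and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def timestep_seconds(steps_per_day):
    """Length of one time step in seconds for a given number of steps per day."""
    if steps_per_day <= 0 or SECONDS_PER_DAY % steps_per_day != 0:
        raise ValueError("steps_per_day must be a positive divisor of 86400")
    return SECONDS_PER_DAY // steps_per_day


def halve_timestep(steps_per_day):
    """Halving the time step length is the same as doubling the steps per day."""
    return steps_per_day * 2


current = 48  # illustrative value only, not the setting used in xfiva
reduced = halve_timestep(current)
print(current, "steps/day =", timestep_seconds(current), "s per step")
print(reduced, "steps/day =", timestep_seconds(reduced), "s per step")
```

A shorter step makes the run proportionally more expensive, so it is usually worth trying one halving first and checking whether the message disappears.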

comment:7 Changed 10 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed