Opened 10 years ago
Closed 10 years ago
#539 closed help (fixed)
Job crashing before first time step
| Reported by: | keeley | Owned by: | um_support |
|---|---|---|---|
| Component: | UM Model | Keywords: | |
| Cc: | | Platform: | |
| UM Version: | 6.1 | | |
Description
Sorry to bother you again…
My model is crashing and I really can't identify the cause:
/home/n02/n02/keeley/um/umui_out/xfiva000.xfiva.d10316.t153041.leave
Starting script : qsexecute
Starting time : Fri Nov 12 15:40:12 GMT 2010
*
/work/n02/n02/keeley/tmp/tmp.hector-xt6-14.11778/modscr_xfiva/qsexecute: Executing setup
/work/n02/n02/hum/vn6.1/pathscale_quad/scripts/qssetup: Job terminated normally
xfiva: Starting run
_pmii_daemon(SIGCHLD): PE 16 exit signal Segmentation fault
_pmii_daemon(SIGCHLD): PE 96 exit signal Segmentation fault
_pmii_daemon(SIGCHLD): PE 32 exit signal Segmentation fault
[NID 00068] 2010-11-12 15:40:33 Apid 249490: initiated application termination
_pmii_daemon(SIGCHLD): PE 80 exit signal Segmentation fault
diff: /work/n02/n02/keeley/tmp/tmp.hector-xt6-14.11778/xfiva.xhist: No such file or directory
qsexecute: Copying /work/n02/n02/keeley/um/xfiva/dataw/xfiva.thist to backup thist file /work/n02/n02/keeley/um/xfiva/dataw/xfiva.thist_keep
xfiva: Run failed
*
Ending script : qsexecute
Completion code : 137
Completion time : Fri Nov 12 15:40:36 GMT 2010
*
The other weird thing is that, at the bottom of the .leave file, the model data time is reported as 1979 3 1,
but I have checked the umui and I have definitely set it to 1979 12 1.
If you have any insight into this I would be very grateful!
Thanks,
Sarah
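As a rough aid for reading a .leave file like the one above, the sketch below pulls out which PEs reported a signal and what the completion code was. It is not part of the UM scripts: the function names are made up for illustration, and the interpretation of codes above 128 as "128 + signal number" is the usual shell convention, which is only assumed to apply to the qsexecute completion code.

```python
import re

# Patterns for the per-PE crash lines and the completion code line
# as they appear in the .leave excerpt quoted in this ticket.
PE_SIGNAL = re.compile(r"PE (\d+) exit signal (.+)$")
COMPLETION = re.compile(r"Completion code\s*:\s*(\d+)")

def summarise_leave(text):
    """Return (sorted list of (pe, signal) pairs, completion code or None)."""
    crashes = []
    code = None
    for line in text.splitlines():
        m = PE_SIGNAL.search(line)
        if m:
            crashes.append((int(m.group(1)), m.group(2).strip()))
            continue
        m = COMPLETION.search(line)
        if m:
            code = int(m.group(1))
    return sorted(crashes), code

def describe_exit(code):
    """Assumes the shell convention: codes above 128 mean signal (code - 128)."""
    if code is not None and code > 128:
        return "terminated by signal %d" % (code - 128)
    return "exit status %s" % code

# Sample lines copied from the .leave output in this ticket.
sample = """\
_pmii_daemon(SIGCHLD): PE 16 exit signal Segmentation fault
_pmii_daemon(SIGCHLD): PE 96 exit signal Segmentation fault
Completion code : 137
"""
crashes, code = summarise_leave(sample)
print(crashes)
print(describe_exit(code))
```

Under that assumption, 137 would correspond to signal 9, i.e. the job being killed after the first PEs hit the segmentation fault, so the segmentation fault lines are the ones worth chasing.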
Change History (7)
comment:1 follow-up: ↓ 4 Changed 10 years ago by willie
comment:2 follow-ups: ↓ 3 ↓ 5 Changed 10 years ago by jeff
Hi Sarah
It's hard to say what has gone wrong because the model crashed before any output was written. You could try including this mod: /home/n02/n02/jwc/um/vn6.1/mods/debug.mf77. It adds a flush call to flush the output buffers. If there still isn't any useful output, try again with the last 2 lines uncommented.
Jeff.
comment:3 in reply to: ↑ 2 Changed 10 years ago by keeley
Hi Jeff,
Even when I add the modset with the last 2 lines uncommented, I can't seem to get any more information, and there is nothing in the individual .pe# files; they are all empty.
Is there anything else obvious I can do?
Sarah
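For checking a large number of per-PE output files at once, something like the following sketch could list which ones are empty. The glob pattern `*.pe*` and the demo file names are assumptions made for illustration; the real file names and directory for a given job will differ.

```python
import tempfile
from pathlib import Path

def split_by_size(directory, pattern="*.pe*"):
    """Return (empty, non_empty) lists of file names matching pattern."""
    empty, non_empty = [], []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        if path.stat().st_size == 0:
            empty.append(path.name)
        else:
            non_empty.append(path.name)
    return empty, non_empty

# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo.pe0").write_text("")
    Path(tmp, "demo.pe1").write_text("some output\n")
    Path(tmp, "demo.pe2").write_text("")
    empty, non_empty = split_by_size(tmp)

print("empty:", empty)
print("non-empty:", non_empty)
```

Any file that turns up in the non-empty list is the first place to look for an error message from the failing PE.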
comment:4 in reply to: ↑ 1 Changed 10 years ago by keeley
Hi Willie,
I have tried setting it all to zero and that doesn't seem to fix anything.
I am still getting the strange 1979/3/1 as the model time in the .leave file which I am unsure about.
I am also not sure how to change the compile options…
thanks,
Sarah
Replying to willie:
Hi Sarah
You have set the start date to 1979/12/1 in Submodel Indep > Start date. I would set this all to zero so that it uses the time in the start dump.
You have a segmentation violation, possibly due to a modset. One thing to try is compiling the code with the array bounds checking option (-C) switched on.
Regards,
Willie
comment:5 in reply to: ↑ 2 Changed 10 years ago by keeley
Hi Jeff,
I have recompiled and run with the modset debug.mf77
~keeley/um/umui_out/xfiva000.xfiva.d10320.t110342.leave
I have also tried running with all the extra dust mods that Margaret runs with, and turning off all but 8 variables in the STASH (xfivc), and the model still crashes on the second timestep.
~keeley/um/umui_out/xfivc000.xfivc.d10320.t174809.leave
I am not sure where to hunt for possible errors now - do you have any suggestions on what to do next?
Thanks,
Sarah
comment:6 Changed 10 years ago by willie
- UM Version changed from <select version> to 6.1
Hi Sarah,
Job xfiva is producing the message,
RHS zero so GCR( 2 ) not needed
This indicates instability. You could try reducing the time step.
Regards,
Willie
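To make "reducing the time step" concrete, here is a small arithmetic sketch. It assumes the time step is specified as a whole number of steps per day, and the values 48 and 72 are hypothetical examples, not the settings of job xfiva.

```python
SECONDS_PER_DAY = 86400

def timestep_seconds(steps_per_day):
    """Length of one time step in seconds for a given number of steps per day."""
    if steps_per_day <= 0 or SECONDS_PER_DAY % steps_per_day:
        raise ValueError("steps_per_day must divide 86400 exactly")
    return SECONDS_PER_DAY // steps_per_day

# Hypothetical values: going from 48 to 72 steps per day shortens each step.
current = timestep_seconds(48)
reduced = timestep_seconds(72)
print(current, reduced)  # 1800 1200
```

In other words, a shorter time step means a larger number of steps per day, at the cost of a proportionally longer run.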
comment:7 Changed 10 years ago by willie
- Resolution set to fixed
- Status changed from new to closed
Hi Sarah
You have a segmentation violation, possibly due to a modset. One thing to try is compiling the code with the array bounds checking option (-C) switched on.
Regards,
Willie