Opened 5 years ago
Closed 5 years ago
#1821 closed help (fixed)
Segmentation fault when trying to switch from 360-day to Gregorian calendar for release job vn8.4 RJ4.0 CheST+GLOMAP-mode (xlsjc)
Reported by: | s1374103 | Owned by: | um_support |
---|---|---|---|
Component: | UM Model | Keywords: | |
Cc: | Platform: | MONSooN | |
UM Version: | 8.4 |
Description
Dear CMS,
I have copied a release job, vn8.4 RJ4.0 CheST+GLOMAP-mode (xlsjc), and the copy (xmmic) has run successfully. I've been told that the nudging code may not work for this particular version, so to begin with I am just attempting to switch from a 360-day to a 365-day (Gregorian) calendar run. This has resulted in a segmentation fault (xmmya).
The changes I have made from my original job (xmmic) to my Gregorian job (xmmya) are:
- Replaced all ancillaries with Gregorian ones
- Unselected 'Use 360 day calendar'
- Edited post processing of PP files to be compatible with Gregorian calendar
- Removed any diagnostics used in STASH which were using the TMMNUKCA time profile
- Selected 'Regular frequency dumps for Gregorian-calendar Meaning'
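For context on why the dump and meaning settings change with the calendar, here is a minimal sketch comparing month boundaries in the two calendars. It is only an illustration in plain Python, not anything taken from the UM or UMUI; the year 2016 and the 30-day period are arbitrary assumptions, and the helper names are hypothetical.

```python
import calendar

def month_lengths(year, gregorian=True):
    """Return the 12 month lengths for the given year."""
    if gregorian:
        return [calendar.monthrange(year, m)[1] for m in range(1, 13)]
    return [30] * 12  # 360-day calendar: twelve 30-day months

def month_end_days(lengths):
    """Cumulative day-of-year at the end of each month."""
    total, ends = 0, []
    for n in lengths:
        total += n
        ends.append(total)
    return ends

# 2016 is an arbitrary example year (assumption, not taken from the job).
greg = month_end_days(month_lengths(2016))
fixed = month_end_days(month_lengths(2016, gregorian=False))

# Months whose end coincides with a multiple of a 30-day period.
aligned_greg = [m + 1 for m, d in enumerate(greg) if d % 30 == 0]
aligned_fixed = [m + 1 for m, d in enumerate(fixed) if d % 30 == 0]

print("Gregorian month ends:", greg)
print("360-day month ends:  ", fixed)
print("Month ends on a 30-day boundary (Gregorian):", aligned_greg)
print("Month ends on a 30-day boundary (360-day):  ", aligned_fixed)
```

In the 360-day calendar every month end falls on a 30-day boundary, whereas in the Gregorian calendar almost none do, which is why a fixed dump period no longer lines up with calendar months.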
The error message was:
Application 449043 is crashing. ATP analysis proceeding...
ATP Stack walkback for Rank 48 starting:
_start@start.S:113
__libc_start_main@libc-start.c:242
flumemain_@flumeMain.f90:48
um_shell_@um_shell.f90:1865
u_model_@u_model.f90:2688
atm_step_@atm_step.f90:8447
ni_sl_thermo_@ni_sl_thermo.f90:713
sl_thermo_@sl_thermo.f90:616
departure_point_@departure_point.f90:386
ritchie_@ritchie.f90:2648
bi_linear_h_@bi_linear_h.f90:398
_cray$mt_execute_parallel_with_proc_bind@0x1d7ee64
_cray$mt_start_one_code_parallel@0x1d7eac9
bi_linear_h__cray$mt$p0001@bi_linear_h.f90:409
ATP Stack walkback for Rank 48 done
Process died with signal 11: 'Segmentation fault'
Forcing core dumps of ranks 48, 24, 180, 0, 45, 25, 34, 36, 41
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
_pmiu_daemon(SIGCHLD): [NID 00142] [c0-0c2s3n2] [Tue Mar 1 20:10:09 2016] PE RANK 189 exit signal Segmentation fault
[NID 00142] 2016-03-01 20:10:09 Apid 449043: initiated application termination
View application merged backtrace tree with: stat-view atpMergedBT.dot
You may need to: module load stat
xmmyb: Run failed
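The walkback above is a sequence of `function@file:line` frames, outermost first. As a rough sketch (the regular expression and helper name are my own assumptions, not part of ATP or the UM), the frames can be pulled out of a .leave file like this to see which routine the crash was in:

```python
import re

# Matches "name@file:line"; frames given only as "name@0xaddress" are skipped.
FRAME = re.compile(r"(\S+)@(\S+?):(\d+)")

def parse_walkback(text):
    """Return (function, file, line) tuples in the order they appear."""
    return [(func, src, int(num)) for func, src, num in FRAME.findall(text)]

# A shortened copy of the walkback from this ticket.
sample = (
    "atm_step_@atm_step.f90:8447 ni_sl_thermo_@ni_sl_thermo.f90:713 "
    "ritchie_@ritchie.f90:2648 bi_linear_h_@bi_linear_h.f90:398 "
    "_cray$mt_execute_parallel_with_proc_bind@0x1d7ee64 "
    "bi_linear_h__cray$mt$p0001@bi_linear_h.f90:409"
)

frames = parse_walkback(sample)
innermost = frames[-1]  # the last frame listed is where the signal was raised
print("Innermost frame:", innermost)
```

Here the innermost frame is in bi_linear_h.f90, reached through the semi-Lagrangian routines (sl_thermo, departure_point, ritchie).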
Further down in the .leave file is a 'NaN':
Minimum theta level 1 for timestep 1
This timestep                                  This run
Min theta1  proc  position                     Min theta1  timestep
261.32      468   -112.5deg W 107.5deg N       231.44      1
Largest negative delta theta1 at minimum theta1
This timestep = NaNK. At min for run = -9.34K
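Since the NaN already shows up in the timestep 1 diagnostics, it can help to find its first occurrence in the output. A minimal sketch, assuming the .leave file is plain text; the helper name and the file path in the usage note are hypothetical:

```python
import re

# "NaN" not preceded by a letter, so "NaNK" matches but a word such as "BANaNa" does not.
NAN = re.compile(r"(?<![A-Za-z])NaN")

def first_nan(lines):
    """Return (line_number, stripped_line) for the first line containing NaN, or None."""
    for number, line in enumerate(lines, start=1):
        if NAN.search(line):
            return number, line.strip()
    return None

# Lines copied from the output quoted in this ticket.
sample = [
    "Minimum theta level 1 for timestep 1",
    "Largest negative delta theta1 at minimum theta1",
    "This timestep = NaNK. At min for run = -9.34K",
]

hit = first_nan(sample)
print(hit)
```

For a real run the same function can be given an open file, for example `first_nan(open("xmmya.leave"))`, where the file name is only a placeholder.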
I have tried a few variations, such as forcing a full build (xmmyk), not making any changes to STASH (xmmym), and adding extra diagnostic messages (xmmyb), but all have resulted in a segmentation fault. In one variation I selected 'delete superseded restart dumps' (xmmyh). This did not contain the NaN but still failed with a segmentation fault. In fact I think this didn't even run because it exceeded the walltime limit.
Any suggestions for things I could try would be greatly appreciated.
Regards,
Jamie
Change History (2)
comment:1 Changed 5 years ago by s1374103
Hi
You can close this now as the issue has been resolved.
Regards,
Jamie
comment:2 Changed 5 years ago by annette
- Resolution set to fixed
- Status changed from new to closed