Opened 4 years ago

Closed 4 years ago

#1821 closed help (fixed)

Segmentation fault when trying to switch from 360-day to Gregorian calendar for release job vn8.4 RJ4.0 CheST+GLOMAP-mode (xlsjc)

Reported by: s1374103
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: MONSooN
UM Version: 8.4

Description

Dear CMS,

I have copied a release job, vn8.4 RJ4.0 CheST+GLOMAP-mode (xlsjc), and the copy (xmmic) ran successfully. I've been told that the nudging code may not work for this particular version, so to begin with I am just attempting to switch from a 360-day to a 365-day (Gregorian) calendar run. This has resulted in a segmentation fault (xmmya).

The changes I have made from my original job (xmmic) to my Gregorian job (xmmya) are:

  1. Replaced all ancillaries with Gregorian ones
  2. Unselected 'Use 360 day calendar'
  3. Edited post processing of PP files to be compatible with Gregorian calendar
  4. Removed any diagnostics used in STASH which were using the TMMNUKCA time profile
  5. Selected 'Regular frequency dumps for Gregorian-calendar Meaning'
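
For context, the two calendars differ in both month and year length, which is why the ancillaries, meaning and dump settings listed above all have to change together. The following is a minimal Python sketch of that difference only. It is not UM code, and the function names are hypothetical, chosen for illustration.

```python
import calendar


def days_in_year_360() -> int:
    """Length of a model year in the idealised 360-day calendar (12 months of 30 days)."""
    return 12 * 30


def days_in_year_gregorian(year: int) -> int:
    """Length of a real (Gregorian) year, which depends on whether it is a leap year."""
    return 366 if calendar.isleap(year) else 365


def month_lengths_gregorian(year: int) -> list:
    """Number of days in each Gregorian month of the given year."""
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


if __name__ == "__main__":
    print(days_in_year_360())
    print(days_in_year_gregorian(2016))
    print(month_lengths_gregorian(2016))
```

Because Gregorian months are not all the same length, a monthly mean can no longer be defined as a fixed number of timesteps, which is presumably why the Gregorian-specific meaning option is needed.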

The error message was:

Application 449043 is crashing. ATP analysis proceeding...

ATP Stack walkback for Rank 48 starting:
  _start@start.S:113
  __libc_start_main@libc-start.c:242
  flumemain_@flumeMain.f90:48
  um_shell_@um_shell.f90:1865
  u_model_@u_model.f90:2688
  atm_step_@atm_step.f90:8447
  ni_sl_thermo_@ni_sl_thermo.f90:713
  sl_thermo_@sl_thermo.f90:616
  departure_point_@departure_point.f90:386
  ritchie_@ritchie.f90:2648
  bi_linear_h_@bi_linear_h.f90:398
  _cray$mt_execute_parallel_with_proc_bind@0x1d7ee64
  _cray$mt_start_one_code_parallel@0x1d7eac9
  bi_linear_h__cray$mt$p0001@bi_linear_h.f90:409
ATP Stack walkback for Rank 48 done
Process died with signal 11: 'Segmentation fault'
Forcing core dumps of ranks 48, 24, 180, 0, 45, 25, 34, 36, 41
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
atpAppSigHandler timed out waiting for shutdown. Re-raising signal.
_pmiu_daemon(SIGCHLD): [NID 00142] [c0-0c2s3n2] [Tue Mar  1 20:10:09 2016] PE RANK 189 exit signal Segmentation fault
[NID 00142] 2016-03-01 20:10:09 Apid 449043: initiated application termination
View application merged backtrace tree with: stat-view atpMergedBT.dot
You may need to: module load stat

xmmyb: Run failed

Further down in the .leave file there is a 'NaN':

 Minimum theta level 1 for timestep  1
                This timestep                         This run
   Min theta1     proc          position            Min theta1 timestep
      261.32     468  -112.5deg W     107.5deg N       231.44     1
  Largest negative delta theta1 at minimum theta1
 This timestep =      NaNK. At min for run =    -9.34K
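
A NaN like the one above is easy to miss in a long .leave file. The following is a minimal Python sketch for listing every line that contains one. The function name is hypothetical, and the pattern assumes NaNs are printed as "NaN", possibly fused to a unit as in "NaNK".

```python
import re
from typing import Iterable, List, Tuple

# Match "NaN" when it is not preceded by a letter, so "NaNK" is caught
# but a word that merely contains those letters mid-word is not.
NAN_PATTERN = re.compile(r"(?<![A-Za-z])NaN")


def find_nan_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing a NaN token."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if NAN_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    sample = [
        " Minimum theta level 1 for timestep  1",
        "  Largest negative delta theta1 at minimum theta1",
        " This timestep =      NaNK. At min for run =    -9.34K",
    ]
    for number, text in find_nan_lines(sample):
        print(number, text)
```

To scan a real file, pass an open file handle instead of the sample list, for example `find_nan_lines(open(path))`, where `path` points at the .leave file. The first hit shows the earliest timestep at which the model fields went bad.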


I have tried a few variations, such as forcing a full build (xmmyk), not making any changes to STASH (xmmym), and adding extra diagnostic messages (xmmyb), but all have resulted in a segmentation fault. In one variation I selected 'delete superseded restart dumps' (xmmyh). Its output did not contain the NaN, but it still failed with a segmentation fault. In fact, I think this one didn't even run, because it exceeded the walltime limit.

Any suggestions for things I could try would be greatly appreciated.

Regards,

Jamie

Change History (2)

comment:1 Changed 4 years ago by s1374103

Hi

You can close this now as the issue has been resolved.

Regards,

Jamie

comment:2 Changed 4 years ago by annette

  • Resolution set to fixed
  • Status changed from new to closed