Opened 6 years ago

Closed 6 years ago

#1254 closed error (fixed)

Unexpected Segmentation violation error

Reported by: mx020105
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: MONSooN
UM Version: 7.3

Description

Dear CMS,

I am getting an unexpected segmentation violation in my vn7.3 UM-UKCA experiment (job ID xjrcb) after ~30 days of running.

Signal received: SIGSEGV - Segmentation violation

Traceback:

Offset 0x00002a44 in procedure interpolation_, near line 984 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/atmosphere/dynamics_advection/interpolation.f90
Offset 0x00003820 in procedure ritchie_, near line 1122 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/atmosphere/dynamics_advection/ritchie.f90
Offset 0x0000084c in procedure departure_point_, near line 372 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/atmosphere/dynamics_advection/departure_point.f90
Offset 0x000073d0 in procedure sl_vector_u_, near line 536 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/atmosphere/dynamics_advection/sl_vector_u.f90
Offset 0x000124e8 in procedure sl_full_wind_, near line 2288 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/atmosphere/dynamics_advection/sl_full_wind.f90
Offset 0x000025f8 in procedure ni_sl_full_wind_, near line 768 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/atmosphere/dynamics_advection/ni_sl_full_wind.f90
Offset 0x00018aa4 in procedure atm_step_, near line 10305 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/control/top_level/atm_step.f90
Offset 0x0007eec0 in procedure u_model_, near line 5066 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/control/top_level/u_model.f90
Offset 0x00001eb0 in procedure um_shell_, near line 3816 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/control/top_level/um_shell.f90
Offset 0x00000090 in procedure flumemain, near line 36 in file /projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/control/top_level/flumeMain.f90
--- End of call chain ---
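
For a long call chain like the one above, it can help to reduce each frame to its procedure, line number and source file. The following is only a minimal sketch: it assumes every frame follows the "Offset ... in procedure ..., near line ... in file ..." layout shown in this ticket, and the names `FRAME_RE` and `parse_frames` are made up for illustration rather than taken from any UM tool.

```python
import re

# One traceback frame, in the layout shown in this ticket (assumed to be stable).
FRAME_RE = re.compile(
    r"Offset (0x[0-9a-fA-F]+) in procedure ([^\s,]+), "
    r"near line (\d+) in file (\S+)"
)


def parse_frames(text):
    """Return (procedure, line, file) tuples in the order they appear,
    which in the traceback above means innermost frame first."""
    frames = []
    for match in FRAME_RE.finditer(text):
        _offset, procedure, line, path = match.groups()
        frames.append((procedure, int(line), path))
    return frames


# Two frames copied from the traceback in this ticket.
sample = (
    "Offset 0x00002a44 in procedure interpolation_, near line 984 in file "
    "/projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/atmosphere/"
    "dynamics_advection/interpolation.f90\n"
    "Offset 0x00003820 in procedure ritchie_, near line 1122 in file "
    "/projects/ukca/acmayc/um/xjrcb/ummodel/ppsrc/UM/atmosphere/"
    "dynamics_advection/ritchie.f90\n"
)

frames = parse_frames(sample)
for procedure, line, path in frames:
    # Print only the file name to keep the summary short.
    print(f"{procedure:<20} line {line:>6}  {path.rsplit('/', 1)[-1]}")
```

The first tuple is the frame where the signal was raised (here `interpolation_`), which is usually the place to start reading.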

ERROR: 0031-300 Forcing all remote tasks to exit due to exit code 1 in task 5
qsexecute: Copying /projects/ukca/acmayc/um/xjrcb/xjrcb.thist to backup thist file /projects/ukca/acmayc/um/xjrcb/xjrcb.thist_keep
xjrcb: Run failed

I have made some changes to the code in the glue_rad routine, but these aren't obviously related to the error message above. Any advice you can offer would be greatly appreciated. I attach the .leave file here for reference.

Thanks
Amanda

Change History (5)

comment:1 Changed 6 years ago by mx020105

Sorry, the .leave file is actually too large to upload. You can find it on MONSooN at: /home/acmayc/output/xjrcb000.xjrcb.d14076.t235827.leave

Amanda

comment:2 Changed 6 years ago by willie

Hi Amanda,

Your run has failed to converge at time step 2012. You could try running it again with the time step halved.

Regards

Willie

comment:3 Changed 6 years ago by mx020105

Hi Willie,

Thanks for the comment. It looks like the model is becoming unstable in the upper atmosphere, which is why it's not converging. Do you think halving the timestep is likely to help with such an issue?

Thanks,
Amanda

comment:4 Changed 6 years ago by willie

Hi Amanda,

Whenever the model fails to converge in this manner,

GCR( 2 ) failed to converge in  100  iterations.

for whatever reason, it is always worth halving the time step.
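
Since the .leave file can be too large to read by eye, one way to check for this message is to scan the file for it. This is only a rough sketch: the pattern is built from the single message quoted above, the whitespace inside it is assumed to vary, and `GCR_FAIL_RE` and `find_gcr_failures` are illustrative names, not part of the UM.

```python
import re

# Built from the message quoted in this ticket; spacing is assumed to vary.
GCR_FAIL_RE = re.compile(
    r"GCR\(\s*(\d+)\s*\)\s+failed to converge in\s+(\d+)\s+iterations"
)


def find_gcr_failures(lines):
    """Return (line_number, number_in_parentheses, iterations) for each match."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = GCR_FAIL_RE.search(line)
        if match:
            hits.append((number, int(match.group(1)), int(match.group(2))))
    return hits


# The second line is the message from this ticket; the others are placeholders.
sample_lines = [
    "placeholder line standing in for other model output",
    "GCR( 2 ) failed to converge in  100  iterations.",
    "another placeholder line",
]

hits = find_gcr_failures(sample_lines)
print(hits)
```

To scan a real file, pass an open file object instead of the list, for example `with open(path) as fh: find_gcr_failures(fh)`, where `path` is the location of your own .leave file.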

Regards

Willie

comment:5 Changed 6 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed