Opened 7 years ago

Closed 6 years ago

#1117 closed help (fixed)

Model hangs after one month

Reported by: ggcdcb
Owned by: simon
Component: UM Model
Keywords:
Cc:
Platform: PUMA
UM Version: 4.5

Description

Hi,

I am running HadCM3L with MOSES 2.1/TRIFFID for a paleo study. I started with a model setup which ran perfectly. The only thing that I have changed is that I took all of the continental ice off Antarctica and replaced it with shrubs. The model runs for exactly one month and then stops (atmosphere writes out 48 timesteps, ocean writes out 24).

I have tried altering the coupling period between the atmosphere and ocean model, altering the length of the timesteps, turning TRIFFID off, altering the reconfiguration settings, altering the PFT that I change the ice to, and smoothing the topography. None of these makes any difference to when the model crashes. There are no obvious grid cells behaving badly in any of the output files.

The error in the .leave files is:

=>> PBS: job killed: walltime 1254 exceeded limit 1200
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
Hadley4.5.exec 0000000000586FB8 coex2_ 217 coex1a.f
Hadley4.5.exec 000000000058607E coex_ 105 coex1a.f
Hadley4.5.exec 0000000000519D06 pp_file_ 162 ppfile1a.f
Hadley4.5.exec 00000000004D57CD stwork_ 3637 stwork1a.f
Hadley4.5.exec 00000000004C528F stash_ 3947 stash1.f
Hadley4.5.exec 00000000004CBFF3 st_diag2_ 3529 st_dia21.f
Hadley4.5.exec 0000000000434529 atm_step_ 4760 atmstep1.f
Hadley4.5.exec 00000000004212EC u_model_ 4084 u_model1.f
Hadley4.5.exec 000000000040978A Unknown Unknown Unknown
Hadley4.5.exec 00000000004069C2 Unknown Unknown Unknown
libc.so.6 0000003F7961D994 Unknown Unknown Unknown
Hadley4.5.exec 00000000004068E9 Unknown Unknown Unknown

After searching through the tickets here, I found a suggestion to turn off the STASH packing. This increased the pf output by one timestep, but in that extra timestep the grid cells that were previously ice and are now shrub take the value infinity. In the timestep before this they look absolutely fine.
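One way to pin down where the infinities first appear is to scan each output field for non-finite values, timestep by timestep. The following is a minimal Python sketch, assuming the field has already been read into a NumPy array of shape (time, lat, lon); the function name and the synthetic data are hypothetical and stand in for real pf output.

```python
import numpy as np

def first_nonfinite_step(field):
    """Return (timestep index, boolean mask of bad cells) for the first
    timestep containing inf or NaN, or (None, None) if all values are finite.

    `field` is assumed to have shape (time, lat, lon).
    """
    bad = ~np.isfinite(field)
    bad_steps = np.flatnonzero(bad.any(axis=(1, 2)))
    if bad_steps.size == 0:
        return None, None
    step = int(bad_steps[0])
    return step, bad[step]

# Synthetic stand-in for a model diagnostic: 4 timesteps on a 3x4 grid.
field = np.ones((4, 3, 4))
field[3, 0, :] = np.inf  # one row of cells becomes infinite in the last timestep

step, mask = first_nonfinite_step(field)
bad_cells = np.argwhere(mask).tolist() if mask is not None else []
print("first non-finite timestep:", step)
print("affected (row, col) cells:", bad_cells)
```

Comparing the returned mask with the mask of cells that were changed from ice to shrub would show whether the two sets coincide exactly.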

The error in the .leave file for this configuration with the packing set to 0 is:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
Hadley4.5.exec 000000000088B99E gw_satn_ 261 gwsatn3a.f
Hadley4.5.exec 0000000000752AD7 gw_vert_ 506 gwvert3a.f
Hadley4.5.exec 000000000074DEA7 g_wave_ 434 gwave3a.f
Hadley4.5.exec 0000000000621B1E gwav_intctl_ 286 gwvict3a.f
Hadley4.5.exec 00000000005A3533 gwav_ctl_ 3705 gwav_ct1.f
Hadley4.5.exec 000000000055BF63 atm_phys_ 6511 atmphy1.f
Hadley4.5.exec 0000000000439B17 atm_step_ 3833 atmstep1.f
Hadley4.5.exec 00000000004212EC u_model_ 4084 u_model1.f
Hadley4.5.exec 000000000040978A Unknown Unknown Unknown
Hadley4.5.exec 00000000004069C2 Unknown Unknown Unknown
libc.so.6 000000386641D994 Unknown Unknown Unknown
Hadley4.5.exec 00000000004068E9 Unknown Unknown Unknown

Do you have any suggestions as to what could be wrong? Or what I could try next?

Thanks,
Catherine

Change History (8)

comment:1 Changed 7 years ago by simon

  • Owner changed from um_support to simon
  • Status changed from new to assigned

Hi Catherine,

A number of points/questions:

1) Do you really mean 1 day, rather than a month? HadCM3L normally has 48 atmos t/s per day.

2) What you are seeing is indicative of the model filling up with NaNs (not a number) due to the infinity.

3) How did you change the ice points? Are you changing only the land ice and not the sea ice?

4) In general, it's not a good idea to change the coupling frequency or timestep length as the model is tuned to use these values and could easily become unstable elsewhere.

5) What's the runid of the base job you're using (i.e. the one without the timestep/coupling/whatever changes in it)?
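Points 1 and 2 can be checked with a little arithmetic. The sketch below is illustrative Python, not model code: it uses the figure of 48 atmosphere timesteps per day quoted in point 1, and it assumes the 24 ocean timesteps cover the same single day. The variable and function names are hypothetical.

```python
import math

ATMOS_STEPS_PER_DAY = 48  # stated in point 1
MINUTES_PER_DAY = 24 * 60

def elapsed_days(n_steps, steps_per_day):
    """Model time covered by n_steps timesteps, in days."""
    return n_steps / steps_per_day

# Point 1: 48 atmosphere timesteps is one model day, i.e. 30-minute steps.
atmos_days = elapsed_days(48, ATMOS_STEPS_PER_DAY)
atmos_step_minutes = MINUTES_PER_DAY / ATMOS_STEPS_PER_DAY

# Assumption: the 24 ocean timesteps cover the same single day, which
# would imply a 60-minute ocean timestep.
ocean_step_minutes = MINUTES_PER_DAY * atmos_days / 24

# Point 2: once a field holds an infinity, ordinary arithmetic on it
# produces NaN, and NaN then propagates through every later operation.
inf = float("inf")
difference = inf - inf         # NaN
product = inf * 0.0            # NaN
downstream = difference + 1.0  # still NaN

print(atmos_days, atmos_step_minutes, ocean_step_minutes)
print(math.isnan(difference), math.isnan(product), math.isnan(downstream))
```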

Simon.

comment:2 Changed 6 years ago by ggcdcb

Hi Simon,

> Hi Catherine,
>
> A number of points/questions:
>
> 1) Do you really mean 1 day, rather than a month? HadCM3L normally has 48 atmos t/s per day.

Yes, I mean 1 day, not one month.

> 2) What you are seeing is indicative of the model filling up with NaNs (not a number) due to the infinity.
>
> 3) How did you change the ice points? Are you changing only the land ice and not the sea ice?

I changed the qrfrac.type.PMIP ancillary file and then set the vegetation parameterizations to be configured from that ancillary file. Yes, I am changing only land ice, not sea ice.

> 4) In general, it's not a good idea to change the coupling frequency or timestep length as the model is tuned to use these values and could easily become unstable elsewhere.

OK, these were really just tests to see whether the timestep at which the model crashes was robust.

> 5) What's the runid of the base job you're using (i.e. the one without the timestep/coupling/whatever changes in it)?

tdlbf

Thanks,
Catherine

comment:3 Changed 6 years ago by simon

Hi,

Can you tell me exactly how you changed the ancillary file?

Thanks,

Simon

comment:4 Changed 6 years ago by simon

Hi again,

Also, can you tell me where this is being run?

Simon.

comment:5 Changed 6 years ago by ggcdcb

Hi Simon,

I am running in Bristol. I changed the ancillary file using some IDL code which reads in the PFTs from a restart file and then changes the land ice grid cells to shrub. This is a NetCDF file. Other vegetation parameters are just set to the values that I have been advised to use. I then used the standard Bristol code to make the ancillary files from the NetCDF files. As far as I can tell, the ancillary file looks OK.
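For readers who want to reproduce this kind of edit, the ice-to-shrub swap and a basic sanity check can be sketched as follows. This is a Python illustration, not the IDL code used here: the array layout (n_types, lat, lon), the `SHRUB` and `ICE` indices, and both function names are assumptions that should be checked against the actual ancillary file.

```python
import numpy as np

# Hypothetical indices into the surface-type axis; check them against the
# ordering used by your own ancillary files.
SHRUB, ICE = 4, 8

def replace_ice_with_shrub(frac):
    """Return a copy of `frac` (n_types, lat, lon) with all ice fraction moved to shrub."""
    out = frac.copy()
    out[SHRUB] += out[ICE]
    out[ICE] = 0.0
    return out

def check_fractions(frac, land_mask, tol=1e-6):
    """Check that fractions on land are finite, lie in [0, 1], and sum to 1."""
    land = frac[:, land_mask]  # shape (n_types, number of land cells)
    ok_finite = bool(np.isfinite(land).all())
    ok_range = bool(((land >= -tol) & (land <= 1 + tol)).all())
    ok_sum = bool(np.allclose(land.sum(axis=0), 1.0, atol=tol))
    return ok_finite and ok_range and ok_sum

# Synthetic 2x2 grid with 9 surface types; cell (1, 0) is ocean.
frac = np.zeros((9, 2, 2))
frac[ICE, 0, 0] = 1.0                        # fully ice-covered cell
frac[SHRUB, 0, 1] = 0.5; frac[7, 0, 1] = 0.5  # no ice to begin with
frac[ICE, 1, 1] = 0.3; frac[7, 1, 1] = 0.7    # partly ice-covered cell
land_mask = np.array([[True, True], [False, True]])

new_frac = replace_ice_with_shrub(frac)
print("ice remaining:", float(new_frac[ICE].sum()))
print("fractions valid on land:", check_fractions(new_frac, land_mask))
```

The check covers only the fraction field itself; it says nothing about other fields at the former ice points.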

Thanks,
Catherine

comment:6 Changed 6 years ago by simon

Hi,

Unfortunately I don't have access to the Bristol machine and know nothing of the Bristol ancillary code. Do you know if anyone else has been able to do this? Have you tried asking Gethin Williams as he is the main support person for the Bristol 4.5 installation?

Simon.

comment:7 Changed 6 years ago by ggcdcb

OK, thanks.

Catherine

comment:8 Changed 6 years ago by ros

  • Resolution set to fixed
  • Status changed from assigned to closed