Opened 11 years ago

Closed 11 years ago

#281 closed help (fixed)

unexpected end of the run

Reported by: salvatore
Owned by: jeff
Component: UM Model
Keywords:
Cc:
Platform:
UM Version: 4.5

Description

Hi,
I have a problem with my job xdzuz. This is a kind of job I have run many times in the past for 100 years or more, but now it stops before reaching 50 years. When I try to resubmit it (without processing it again, since in the meantime I have turned it into a CRUN,STEP=4), it crashes immediately. Why is that? Could it be related to the change of shell?

Thank you
Salvatore Pascale

Attachments (2)

ENTR.f (41.6 KB) - added by salvatore 11 years ago.
noadv.f (2.0 KB) - added by salvatore 11 years ago.

Change History (6)

comment:1 Changed 11 years ago by jeff

  • Owner changed from um_support to jeff
  • Status changed from new to accepted

Hi Salvatore

I think the reason it crashes before 50 years is related to ticket #277. The temporary fix is to run the job in 6-hour chunks instead of 12-hour chunks. When you resubmit your job it seems to blow up; this may be caused by restarting in an unclean manner, but it might just be a problem with your run.

Jeff.

comment:2 Changed 11 years ago by salvatore

Hi Jeff,
it seems that the problem is linked to a diagnostic I have included. In job xdzue I excluded this diagnostic and it ran fine, even without changing the resubmission chunks. In job xdzuz, on the other hand, I included only this diagnostic and the same problem returned, even though I changed run_resubmit_end from 50 to 20 years.

The only special feature of this diagnostic is that it adds 19 new tracers, which are initialized to zero and passed through different subroutines. It doesn't touch the UM code; it was developed by B. Plant and P. Clark and is quite well tested. Until the shell and the compilers on HECToR were changed, I never had problems, so I rule out a problem with the diagnostic code.

I attach the modsets.

Changed 11 years ago by salvatore

comment:3 Changed 11 years ago by jeff

Hi Salvatore

I ran your xdzuz job and it crashed at exactly the same point as yours, i.e. at timestep 364338, which is 42 years, 60 days and 18 hours into the run. The job crashed because of negative theta, i.e. it blew up; see the end of this file:

/work/n02/n02/swr07sp/xdzuz/xdzuz.fort6.pe1
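
For reference, the elapsed time quoted for timestep 364338 can be checked with a few lines of arithmetic. This is only a sketch: it assumes a 1-hour timestep (24 steps per day) and a 360-day model calendar, neither of which is stated in this ticket, although both are consistent with the figures above. The function name timestep_to_elapsed is made up for illustration.

```python
def timestep_to_elapsed(step, steps_per_day=24, days_per_year=360):
    """Convert a timestep count into (years, days, hours) of model time.

    Assumptions (not confirmed by the ticket): 24 timesteps per day,
    i.e. a 1-hour timestep, and a 360-day calendar.
    """
    # Split the step count into whole days and leftover steps.
    total_days, leftover_steps = divmod(step, steps_per_day)
    # Convert the leftover steps into whole hours.
    hours = leftover_steps * 24 // steps_per_day
    # Split the whole days into model years and leftover days.
    years, days = divmod(total_days, days_per_year)
    return years, days, hours


print(timestep_to_elapsed(364338))  # (42, 60, 18)
```

With a different timestep length or calendar the same step count would map to a different date, so the defaults should be changed to match the job's actual settings.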

Your problem has nothing to do with the change of shell, and you don't seem to have the same problem as ticket #277. All your runs crash at the same timestep (364338) for the same reason (negative theta).

I've compared the dumps of jobs xdzuz and xdzue after 1 year and the prognostic fields are different, so it's perfectly possible for one run to crash and the other not. Some UM runs are unstable and crash; this is fairly common. You need to get some advice about how to get around the problem.

Jeff.

comment:4 Changed 11 years ago by jeff

  • Resolution set to fixed
  • Status changed from accepted to closed