Opened 7 years ago

Closed 7 years ago

#1089 closed help (fixed)

Jobs stopping on Polaris

Reported by: grenville
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: Other
UM Version: 6.6.3


Thank you for setting up HadGEM2 to run on Polaris - I have been trying to run the model based on your example job (and also based on one of my Hector jobs) and it was very easy to get the model running.
Unfortunately, I seem to run into some problems after several hundred timesteps - where the model crashes and I have no idea why!
The job copied from your example runs for 526 timesteps before crashing, while the atmosphere-only run based on my Hector job runs for 1634 timesteps before crashing.

Do you have any idea why this may be happening? There are some cryptic messages in the output files that I didn't understand, but I wondered if the crashes could be due to a memory overload or some temporary disk space filling up.
Also, have you managed to do any long HadGEM runs on Polaris?

My jobs are ximua (based on xidex) and ximub (based on Hector job xhjmj, which has successfully run on Hector for 8 years).
If you have some time to have a look at these it would be really helpful.
Thanks and best wishes,

Change History (1)

comment:1 Changed 7 years ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed

Issue solved - the jobs just required an increase in run time. However, another issue did arise while looking into this: climate meaning fails if STASH item 262 (section 0) is included as one of the meaned fields, and works OK if this item is excluded. Note that lowering the optimization on acumps had no effect.
