Opened 9 years ago

Closed 9 years ago

#677 closed help (fixed)

GA3.0 N96 job crashes on HECToR

Reported by: ros
Owned by: um_support
Component: UM Model
Keywords: STASH
Cc: r.k.schiemann@…
Platform:
UM Version: 7.8

Description

Hi Folks,

Reinhard has a job (agjbe) which he's trying to get running on HECToR, but it is crashing at timestep 71 (see output /home/n02/n02/reinhard/umui_out/xgjbe000.xgjbe.d11234.t104644.leave).

With STASH turned off the 10-day NRUN completes OK, so we know the problem is with STASH, but we are not sure where, as there are no helpful error messages in the output saying which diagnostic is the problem. Does anyone have any ideas on how to track down the problem diagnostic?

For reference this job is a copy of a UM7.7 job that runs successfully on MONSooN (we don't have 7.7 on HECToR hence the upgrade to 7.8). And I believe Reinhard has successfully run the vn7.8 copy on MONSooN too.

Hopefully that's all the vital info.

Change History (5)

comment:1 Changed 9 years ago by grenville

Reinhard

Try switching on "Report STASH messages" in the Input/Output Control → Output Choices panel and run again.

Grenville

comment:2 Changed 9 years ago by willie

Hi Reinhard,

If you do a check setup, it will complain about domain profile DICECAT. There is something wrong with the pseudo levels. You can edit it on the STASH page.

Regards,

Willie

comment:3 Changed 9 years ago by ros

I am still trying the things Malcolm suggested (see below), so it may be best if I do that first and get back to you if it does not work.

Thank you for your help,

Reinhard

Begin forwarded message:

Reinhard,
I think this is irrelevant - that profile is not used for any STASH
diagnostics that are used.

If you'd like, try copying job xfohd and compiling/running it. It is
simply a copy of your xgjbg job, but I've loaded into it the STASH from
Pier Luigi's xfqzl job, which we know worked OK.
If you want to try something in parallel, then from your job xgjbg, go
into the STASH panel, click on

Diagnostics/Set package switches

at the top, and switch the Z option (CMIP5 diagnostics) to N.

So I'm making the assumption that the problem is with one of the
diagnostics currently selected, but with not much to go on as to which
one.

Malcolm

comment:4 Changed 9 years ago by willie

Hi Reinhard,

I notice that you are using packing profile 5 - is this normally done? We've had problems with packing profiles in the past. The STASH system uses these when writing out.

Regards,

Willie

comment:5 Changed 9 years ago by ros

  • Resolution set to fixed
  • Status changed from new to closed

Ticket now being closed.

Switching off the following two diagnostics fixed the problem:

> Sec 1, item 235, TDAY30yr - switch off (double click in the Incl column)
> Sec 2, item 207, TDAYM  - switch off 
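For anyone facing a similar crash with no message naming the offending diagnostic, one systematic way to narrow it down is to switch diagnostics off in halves rather than one at a time. The following is only a minimal sketch of that idea, not what was actually done on this ticket. It assumes each bad diagnostic causes the crash on its own, and `run_ok` is a hypothetical stand-in for "switch on only these diagnostics in the STASH panel, resubmit the job, and see whether it completes"; nothing here drives the UMUI. The candidate list is illustrative: only (1, 235) and (2, 207) come from this ticket, the other entries are arbitrary placeholders.

```python
from typing import Callable, List, Sequence, Tuple

Diagnostic = Tuple[int, int]  # (STASH section, item)


def find_bad_diagnostics(
    diags: Sequence[Diagnostic],
    run_ok: Callable[[Sequence[Diagnostic]], bool],
) -> List[Diagnostic]:
    """Return the diagnostics that make the run fail.

    Assumes a run fails if and only if at least one bad diagnostic is
    switched on, i.e. bad diagnostics fail independently of each other.
    """
    # An empty set, or a set that runs cleanly, contains no culprit.
    if not diags or run_ok(diags):
        return []
    # A failing set of size one is itself a culprit.
    if len(diags) == 1:
        return list(diags)
    # Otherwise split in half and search both halves.
    mid = len(diags) // 2
    return (find_bad_diagnostics(diags[:mid], run_ok)
            + find_bad_diagnostics(diags[mid:], run_ok))


# Simulated demonstration. In practice run_ok would mean a real job
# submission; here it is faked with a known set of bad diagnostics.
known_bad = {(1, 235), (2, 207)}  # the two items reported in this ticket


def simulated_run_ok(subset: Sequence[Diagnostic]) -> bool:
    # Hypothetical stand-in for a model run with only `subset` enabled.
    return not any(d in known_bad for d in subset)


# Placeholder candidates apart from the two items named in the ticket.
candidates = [(0, 2), (0, 3), (1, 235), (2, 207), (3, 236), (5, 216)]
found = find_bad_diagnostics(candidates, simulated_run_ok)
print(found)
```

Each call to `run_ok` corresponds to one full job submission, so halving keeps the number of runs well below testing every diagnostic individually when only a few are at fault.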