Opened 20 months ago

Closed 20 months ago

Last modified 19 months ago

#2374 closed help (fixed)

Nested model failing in the second day

Reported by: pliojop Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: Monsoon2
UM Version: 10.6

Description

Afternoon,

I have a UM10.6 nested job on Monsoon, u-au087 (committed with the set-up described in this query). For my job I have built custom SST and sea ice ancil files for both the global and the nested model using xancil.

I set up the job normally, building all required ancils with the exception of SST and Sea ice. I then edit the file

rose-app.conf

in

~/roses/u-au087/app/um

adding the lines:

[namelist:items(5997b35c)]
ancilfilename='/projects/polar/jpope/OSTIA/1988_by_Year/Nested/ostia_sst_11_1990_04_1991_regrid_extrapol_88.anc'
domain=1
!!interval=1
!!period=1
source=2
stash_req=24
update_anc=.false.
!!user_prog_ancil_stash_req=
!!user_prog_rconst=0.0
[namelist:items(9500b563)]
ancilfilename='/projects/polar/jpope/OSTIA/1988_by_Year/Nested/ostia_seaice_11_1990_04_1991_regrid_extrapol_88.anc'
domain=1
!!interval=1
!!period=1
source=2
stash_req=31
update_anc=.false.
!!user_prog_ancil_stash_req=
!!user_prog_rconst=0.0
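A quick way to confirm that the items added above point at the intended files and STASH codes is to parse them before running. The sketch below is a minimal illustration. It assumes Python's standard `configparser` can read these sections, which holds for this snippet because Rose's `!!`-prefixed (ignored) settings are simply read as keys whose names begin with `!!`. The function name `ancil_items` and the dummy `[top]` header, prepended so that top-level settings in a full rose-app.conf do not stop the parser, are hypothetical and not part of Rose.

```python
import configparser


def ancil_items(conf_text):
    """Map stash_req -> (ancilfilename, update_anc) for each namelist:items section.

    A dummy "[top]" header (hypothetical, not a Rose convention) is prepended so
    that top-level settings in a full rose-app.conf, which sit outside any
    section, do not make configparser raise MissingSectionHeaderError.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string("[top]\n" + conf_text)
    items = {}
    for name in parser.sections():
        if not name.startswith("namelist:items("):
            continue
        section = parser[name]
        stash = int(section["stash_req"])
        path = section["ancilfilename"].strip().strip("'")
        update = section["update_anc"].strip().lower() == ".true."
        items[stash] = (path, update)
    return items


# One of the items quoted above, trimmed for brevity.
example = """
[namelist:items(5997b35c)]
ancilfilename='/projects/polar/jpope/OSTIA/1988_by_Year/Nested/ostia_sst_11_1990_04_1991_regrid_extrapol_88.anc'
domain=1
!!interval=1
source=2
stash_req=24
update_anc=.false.
"""

for stash, (path, update) in sorted(ancil_items(example).items()):
    print(stash, update, path)
```

Reading the items this way also makes it easy to see at a glance whether `update_anc` is switched on for each field.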

I then edit the file

rose-suite.conf

in

~/roses/u-au087/

So that

rg01_rs01_ancil_versions="/projects/polar/jpope/OSTIA/Year_by_Year/Nested/"

I also create symbolic links to my sst and sea ice ancils in

~/cylc-run/u-au087/share/data/ancils/IcelandGreenland/8p0

ie:

ln -s /projects/polar/jpope/OSTIA/Year_by_Year/Nested/ostia_seaice_11_1990_04_1991_regrid_extrapol.anc qrclim.seaice
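The `ln -s` step has to be repeated for each field. The sketch below creates both links in one go. It assumes the SST link is named `qrclim.sst` by analogy with `qrclim.seaice`, so check which names your suite actually expects. Unlike plain `ln -s`, it replaces an existing link, much as `ln -sf` would. The function name `link_ancils` is hypothetical.

```python
from pathlib import Path


def link_ancils(ancil_dir, targets):
    """Point ancil_dir/<link name> at each target file, replacing old links.

    targets maps link names (e.g. "qrclim.seaice") to the ancillary files they
    should refer to. The ancil directory must already exist. Returns the list
    of links created.
    """
    ancil_dir = Path(ancil_dir).expanduser()
    created = []
    for link_name, target in targets.items():
        link = ancil_dir / link_name
        # is_symlink() also catches dangling links, which exists() misses.
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(Path(target).expanduser())
        created.append(link)
    return created
```

For this suite, the call would pass `~/cylc-run/u-au087/share/data/ancils/IcelandGreenland/8p0` as `ancil_dir` and map `qrclim.seaice` to the sea-ice file from the `ln -s` line above, plus a `qrclim.sst` entry (assumed name) for the matching SST file.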

I then run the simulation for the first cycle (1990-11-01) to build the executable, which works fine. Once it has completed, I extend the cycling to the whole job length, 1990-11-01 to 1991-04-30, and submit the job. The first day again runs fine, but on the second day (1990-11-02) the model fails at the first forecast stage of the nested UM section (the glm section, including its custom global ancils, runs fine). The error message reads:

? Error code: 2
? Error from routine: COEX (cmps_all)
? Error message: Unable to WGDOS pack to this accuracy
? Error from processor: 16
? Error number: 24

If I start the job from 1990-11-02, the day that crashed in the first attempt, that day runs fine and the job then crashes on its second day, 1990-11-03.

I know that the model is reading in the correct ancil files, having checked all the astart files. Since it runs 1990-11-02 fine when that is the first day but not when it is the second, I assume the files have been built properly and the error is in the model set-up?

Any thoughts would be much appreciated.

Thanks

James

Change History (11)

comment:1 Changed 20 months ago by grenville

James

Please try switching off packing of the output (um→namelist→model input and ..→model output streams→pp0 etc).

It appears to be trying to pack data that cannot be packed; you could check which fields are causing the problem.

Grenville
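Grenville's point about unpackable data can be illustrated with a simplified model of WGDOS-style packing, in which each value in a row is stored as an integer number of `accuracy` steps above the row minimum. A row whose range is too large for the available bits, or which contains NaN or infinite values, then cannot be packed. This is a hedged sketch, not the UM's `cmps_all` code; the 31-bit limit (`max_bits`) and the 0.01 accuracy in the example are assumptions chosen for illustration.

```python
import math


def bits_needed(row, accuracy):
    """Bits needed to store one row as integer steps of `accuracy` above its minimum."""
    if any(not math.isfinite(v) for v in row):
        raise ValueError("row contains NaN or infinite values")
    lo, hi = min(row), max(row)
    max_step = round((hi - lo) / accuracy)  # largest integer to be stored
    return max_step.bit_length()


def can_pack(row, accuracy, max_bits=31):
    """True if this simplified packer could store the row to the given accuracy."""
    try:
        return bits_needed(row, accuracy) <= max_bits
    except ValueError:
        return False


# A plausible SST row (K) packs easily; a row containing a huge fill value
# (e.g. an unmasked missing-data value left by regridding) or a NaN does not.
print(can_pack([271.35, 275.2, 280.0], 0.01))
print(can_pack([271.35, -1.0e30, 280.0], 0.01))
print(can_pack([271.35, float("nan"), 280.0], 0.01))
```

Under this simplified model, extreme or fill values in a field are as likely to trigger a packing failure as NaNs.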

comment:2 Changed 20 months ago by pliojop

Morning Grenville,

I have now finished testing with each of the packing options for pp0 etc. turned off. However, this has not resolved the error; the model fails in the same spot with the same error.

One thing that occurred to me is that I copied the STASH for my nested simulation from a suite at version 10.4, whereas I am running 10.6. Could this be the cause of my issue?

James

comment:3 Changed 20 months ago by grenville

James

Is this still suite u-au087?

Grenville

comment:4 Changed 20 months ago by grenville

James

It's failing immediately: the only fields written at time zero are the land mask and the orography (each twice?). Try switching off these fields (they must be available as input data in any case).

Grenville

comment:5 Changed 20 months ago by pliojop

Afternoon Grenville,

As Monsoon was queuing quite a bit last week, I tested the pp packing across four similar suites. I have also been working through some other ideas about what I now think must be the issue, namely the ancil files for sea ice and SST in the nested region.

I have been working with u-au089 today and found that, when I removed the sea ice and SST files from

um → namelist → Configure ancils and initialise dump fields

the job ran two full days to completion without any issues.

However, when I included the sea ice and SST fields in this job again, it fell over on the second day, at the first forecast step of the UM.

I built these ancillary files from global OSTIA data at 7200 x 3600, and the regridding decisions were based on the advice of Jeff Cole (ticket: http://cms.ncas.ac.uk/ticket/2346), ensuring that no NaN values were introduced by the building process. Checking the ancil files from day 1 of the u-au089 job, I can find no NaN values in them. The same process was used to build the global ancil files, which run fine.
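Checking only for NaNs can miss other values that may upset the model, such as infinities, fill values left by the regridding, or sea-ice fractions outside 0 to 1. The sketch below counts such points in a 2-D field. It assumes the field has already been read into a NumPy array by whatever tool you use (no particular ancillary-file reader is implied); `field_report` and the 260 to 310 K SST limits are illustrative assumptions.

```python
import numpy as np


def field_report(field, valid_min, valid_max):
    """Count NaN, infinite and out-of-range points in a 2-D field."""
    data = np.asarray(field, dtype=float)
    finite = np.isfinite(data)
    out_of_range = finite & ((data < valid_min) | (data > valid_max))
    return {
        "nan": int(np.isnan(data).sum()),
        "inf": int(np.isinf(data).sum()),
        "out_of_range": int(out_of_range.sum()),
        "finite_min": float(data[finite].min()) if finite.any() else None,
        "finite_max": float(data[finite].max()) if finite.any() else None,
    }


# Hypothetical example fields: a sea-ice fraction should lie in [0, 1].
ice = np.array([[0.0, 0.4], [1.0, 1.2]])
print(field_report(ice, 0.0, 1.0))

sst = np.array([[271.4, 275.0], [np.nan, -1.0e20]])
print(field_report(sst, 260.0, 310.0))  # assumed plausible SST range in K
```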

One thing I have been considering is that I defined my ancil files by the size of the domain itself. I copied the land sea mask from my job, and its settings in xconv were used to regrid my SST and sea ice fields for the model runs: a 200 x 210 region at a resolution of 0.072021 x 0.07200. Should I have included a wider region in my input ancil files?
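Whether the ancillary fields sit on exactly the model grid can be checked by comparing the two grid definitions directly. The sketch below assumes that 200 x 210 means 200 columns at 0.072021 degrees and 210 rows at 0.07200 degrees; the `Grid` class, the origin values in the example, and the tolerance are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Grid:
    x0: float  # longitude of first column (degrees)
    y0: float  # latitude of first row (degrees)
    dx: float  # column spacing (degrees)
    dy: float  # row spacing (degrees)
    nx: int    # number of columns
    ny: int    # number of rows

    def extent(self):
        """Distance in degrees between the first and last points in x and y."""
        return ((self.nx - 1) * self.dx, (self.ny - 1) * self.dy)


def same_grid(a, b, tol=1e-6):
    """True if two grids have the same shape, origin and spacing within tol."""
    if (a.nx, a.ny) != (b.nx, b.ny):
        return False
    pairs = [(a.x0, b.x0), (a.y0, b.y0), (a.dx, b.dx), (a.dy, b.dy)]
    return all(math.isclose(p, q, rel_tol=0.0, abs_tol=tol) for p, q in pairs)


# Hypothetical origins; the shape and spacing are the ones quoted in this ticket.
model = Grid(x0=-10.0, y0=-7.5, dx=0.072021, dy=0.07200, nx=200, ny=210)
ancil = Grid(x0=-10.0, y0=-7.5, dx=0.072021, dy=0.07200, nx=200, ny=210)
print(same_grid(model, ancil), model.extent())
```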

The sea ice and SST ancil files for the current settings of u-au089 (committed, as it has just run and failed, as revision 67752) can be found at:

/projects/polar/jpope/OSTIA/2004_by_Year/Nested

James

P.S. My run of u-au087 has just run through the first GLM and UM timesteps for day 1 without crashing.

comment:6 Changed 20 months ago by jeff

Hi James

Looking at your ancillary files in /projects/polar/jpope/OSTIA/2004_by_Year/Nested, I think you may be using the wrong land/sea mask to create them. Try using the fractional land mask and use any points with values < 1 as sea points.

Jeff.
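Jeff's suggestion amounts to deriving the sea points from the fractional land mask rather than from a binary one, so that coastal points that are partly sea still receive SST and sea-ice values. A minimal NumPy sketch follows, assuming both masks are available as arrays on the model grid; the function names and the example masks are hypothetical.

```python
import numpy as np


def sea_points(land_fraction):
    """Sea points as suggested above: any point with land fraction < 1."""
    return np.asarray(land_fraction, dtype=float) < 1.0


def missed_by_binary_mask(land_fraction, binary_land):
    """Points that are partly sea but flagged as land by a binary mask."""
    return sea_points(land_fraction) & np.asarray(binary_land, dtype=bool)


frac = np.array([[0.0, 0.3], [1.0, 0.9]])
binary = np.array([[False, False], [True, True]])  # hypothetical binary land mask
print(sea_points(frac))
print(int(missed_by_binary_mask(frac, binary).sum()))
```

Points reported by `missed_by_binary_mask` are the coastal points that would lack SST and sea-ice values if the ancils were built with the binary mask.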

comment:7 Changed 20 months ago by pliojop

Thanks Jeff, I'll have a look at that. However, the current set-up of u-au087 does not use a land sea mask at any point; the model is given inputs from ancil files where I have extrapolated over missing data and regridded the data:

/projects/polar/jpope/OSTIA/Year_by_Year/Nested/ostia_seaice_11_1990_04_1991_regrid_extrapol_nomask.anc

and

/projects/polar/jpope/OSTIA/Year_by_Year/Nested/ostia_sst_11_1990_04_1991_regrid_extrapol_nomask.anc

and that set up also failed in the same place with the same error as u-au089.

James

comment:8 Changed 20 months ago by jeff

If you are using the nomask files then that will rule out masking as being the problem.

Jeff.

comment:9 Changed 20 months ago by pliojop

  • Resolution set to fixed
  • Status changed from new to closed

Morning,

Working with Stu Webster, we determined that the two issues in this situation were:

1) The domain contained a couple of grid squares representing Svalbard in the north-east corner; Stu believed this had caused issues in the past.

2) I modified the SST and sea ice files in the nested domain to update daily in

um → namelist → Configure ancils and initialise dump fields

These changes appear to have resolved the issues. I'll therefore close the ticket. Thanks as ever for your assistance.

James

comment:10 Changed 19 months ago by pliojop

Just a quick update on this.

I did have a few further errors when trying to run this job beyond 7-10 days. However, when I went to:

um → namelist → Ancil Options

and set

l_amipii_ice_processing = True

The model simulations then ran without issue. I presume that someone with similar issues may need one or other of these solutions, so I wanted to add this to the ticket.

Thanks again,

James

comment:11 Changed 19 months ago by grenville

Thanks for the update
