Opened 13 days ago

Last modified 11 days ago

#2989 new help

Atmospheric ancillaries crashing reconfiguration

Reported by: Leighton_Regayre Owned by: um_support
Component: UM Model Keywords: UKESM, AMIP
Cc: Platform: ARCHER
UM Version: 11.1

Description

Hello,

I'm attempting to run suite u-bk417 which is nudged to ECMWF wind fields and uses offline oxidants. The suite fails on the recon task because an atmospheric ancillary file read in with the rcf_ancil_atmos_mod.F90 subroutine cannot be found.

My suite log
/work/n02/n02/lre/cylc-run/u-bk417/log/job/20080101T0000Z/recon/01/job.out
indicates it's the file
/work/n02/n02/lre/ancils/ACURE/ukca_oxid_clim_1979_2014_az513_O3_rad.anc
that "does not exist". However, the file is clearly on ARCHER at this location.

This ancillary was entered into the "configure ancils and dump fields" table using stash_req 60 for ozone. Out of caution, I deleted an unused item from the table that also requested stash 60. I ran the STASH requests macros to make sure my new item had an item number (36355).

I can't see what I've done to make the ancillary appear missing.

Attachments (1)

configure_ancils_and_initialise_dump_fields_screenshot (68.0 KB) - added by Leighton_Regayre 13 days ago.
Screenshot of suite panel


Change History (28)

comment:1 Changed 13 days ago by willie

  • Keywords UKESM, AMIP added; ancillary does not exist stash request macro removed

Hi Leighton,

Your ancillary file /work/n02/n02/lre/ancils/ACURE/ukca_oxid_clim_1979_2014_az513_O3_rad.anc only has data for one year, 1979, but your start dump is dated 2008.

Willie

comment:2 Changed 13 days ago by Leighton_Regayre

Hi Willie,

These ancillaries are supposed to be periodic as they're for use in a set of AMIP simulations.

I would normally use ncdump to view such files, but it's not available on ARCHER. What software do you recommend for viewing NetCDF files on ARCHER?

Does it look like the file in question is periodic?

Thanks,

Leighton.

comment:3 Changed 13 days ago by Leighton_Regayre

Hi again,

Just to clarify, I now understand that .anc files can't be opened with ncdump, although they can be viewed with xconv. I'm really not sure how they differ otherwise.

Thanks,

Leighton.

comment:4 Changed 13 days ago by willie

Hi Leighton,

Yes, you're right it is periodic. But there is no entry for its STASH (60) in the Configure ancils and initialise dump fields table. You'll need to add it back in again and re-index using the macros.

Ancillary files are just UM fields files (see UMDO F03).

Willie

Changed 13 days ago by Leighton_Regayre

Screenshot of suite panel

comment:5 Changed 13 days ago by Leighton_Regayre

Hi Willie,

Thanks for the info on .anc files.

I'm not sure what you are suggesting. I've added an attachment to this ticket showing a screenshot of my suite's "Configure ancils and initialise dump fields" panel. It looks to me like the item with stash_req 60 is set up correctly.

Could you add some more detail to your suggestion for me please?

Thanks,

Leighton.

comment:6 Changed 13 days ago by willie

Hi Leighton,

Sorry. Found it - I ordered by STASH request and the first entry is 132 so I didn't look any further. The period you have requested is 5 hours - I think you should set this to one year.

Willie

comment:7 Changed 13 days ago by Leighton_Regayre

Hi Willie,

Great! I'm testing your suggestion now. However, I'd like to make sure this choice gives me the science set up I'm aiming for. This particular ancillary is needed because I've switched off the chemistry and therefore the radiative feedback of O3. Ozone concentrations vary seasonally, so I think the information from this ancillary needs to be correct to the monthly mean at least.

The period defined in the item page in the "Configure ancils and initialise dump fields" sets the frequency of information update. You've suggested setting that period to 1 year. Does that imply that a full year of data will be read in on the first call, to be updated a year later (with the same information since this file is periodic)? Alternatively, does it imply that the model will run for a year with the O3 data from the first month/day (depending on the ancillary structure) before updating?

Thanks,

Leighton.

comment:8 Changed 13 days ago by Leighton_Regayre

Hi Willie,

The suggested change hasn't resolved the cause of the error.

Here's an extract from my job.out file:

Ancillary Files to be opened :
Ancil file num: 1
Ancil filename : '/work/n02/n02/lre/ancils/ACURE/ukca_oxid_clim_1979_2014_az513_O3_rad.anc'
Stash req = 60


Ancillary File does not exist.
File : '/work/n02/n02/lre/ancils/ACURE/ukca_oxid_clim_1979_2014_az513_O3_rad.anc'
Stashcodes : 60
Stashcodes : 60

so I'm still stumped.

Thanks,

Leighton

comment:9 Changed 13 days ago by willie

Hi Leighton,

Did you re-index using the macro after changing the frequency?

Willie

comment:10 Changed 13 days ago by Leighton_Regayre

Hi Willie,

Sorry, no. I hadn't thought that was necessary since the item was already defined. I've now done so and am re-running.

You might have overlooked my earlier question about how the change in frequency works in practice, since I replied again soon after. Could you give me an explanation when you have time please?

Thanks,

Leighton.

comment:11 Changed 13 days ago by simon

Hi,

You could try setting the source option for the ozone ancil in the rose gui to "missing data" rather than "initialise from ancillary file". This is because the ozone ancil is updated periodically as the model runs, and it isn't required by the recon. The recon will produce a dump with space for the ozone ancillary, but the model will process the actual data from the ancillary file at run-time. The coding for ozone updating is a bit of a mess and I think that's the reason for the less than helpful error message. Also set the updating frequency to 5 days.

comment:12 Changed 13 days ago by Leighton_Regayre

Hi Simon, Willie,

Thanks! I'm testing the "missing data" suggestion now.

What is the effect of the updating frequency? Does that apply to usage within the making of the reconfiguration, or does it apply to the actual run?

Cheers,

Leighton.

comment:13 Changed 12 days ago by Leighton_Regayre

Hello,

I tested this suggested change and the recon task failed at a different point with an unintelligible (to me) error. I mistakenly entered the details as a new Trac ticket (#2992), and Willie requested that we focus on one thing at a time, so I'm re-entering the details here:

When submitting suite u-bk417 (set up to be nudged towards ECMWF wind fields and using offline oxidant ancillaries) I get a confusing error when the recon task fails.

The branch I am using is a copy of the UM vn11.1 trunk with some minor code changes. I had problems with the offline oxidants in this suite (ticket #2989) but this problem looks to be distinct. It is possibly related to ticket #2236 but there is no explanation on that ticket page about how the problem was resolved.

The errors in the job.out are:

[INFO] command: um-atmos
[WARN] UM version (VN=x.y) defined in the environment.
[INFO] Overriding $VN to 11.1
[WARN] Using default STASHmaster as none provided "/work/y07/y07/umshared/vn11.1/ctldata/STASHmaster".
[WARN] Using default STASH2CF as none provided "/work/y07/y07/umshared/vn11.1/ctldata/STASH2CF/STASH_to_CF.txt".
[INFO] Using executable: /work/n02/n02/lre/cylc-run/u-bk417/share/fcm_make_um/build-atmos/bin/um-atmos.exe
[INFO] Using script: /work/n02/n02/lre/cylc-run/u-bk417/share/fcm_make_um/build-atmos/bin/um-atmos
[INFO] exec /opt/cray/alps/5.2.5-2.0502.9955.44.1.ari/bin/aprun -ss -n 192 -N 24 -S 12 -d 1 -j 1 /work/n02/n02/lre/cylc-run/u-bk417/share/fcm_make_um/build-atmos/bin/um-atmos.exe

=====================================================
GCOM Version 6.6
XC30_MPI
Using precision : 64bit INTEGERs and 64bit REALs
Built at Mon Jun 4 17:10:37 BST 2018
=====================================================

WARNING - REQUESTED AND ACTUAL THREADING LEVEL DIFFERENT
THREAD LEVEL REQUESTED is MPL_THREAD_MULTIPLE
THREAD LEVEL SET is MPL_THREAD_SERIALIZED
gc_abort (Processor 0): um_abort called

Followed by an instance of "has failed to pass any defensive checks" for different routines, with error code 80. There are many more of these in the log.err file.

Thanks,

Leighton.

comment:14 Changed 12 days ago by simon

The model aborted because of the error rather than the warning. It looks as if you are running with a Gregorian calendar but the override app file assumes that you have climate meaning turned on, which you don't. You need to either 1) turn on climate meaning or 2) edit /home/Leighton_Regayre/roses/u-bk417/app/um/opt/rose-app-gregorian.conf and comment out (!!) the "ppselectim=1,1,1,0" line.

Also change the update frequency of the ozone ancil to 1 day.
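Option 2 above amounts to a one-line text edit. The following is a minimal sketch of that edit, assuming only what is stated above: the setting is disabled by prefixing its line with "!!". The helper name ignore_setting and the section name in the sample text are hypothetical and are not taken from the real rose-app-gregorian.conf.

```python
def ignore_setting(conf_text, key, marker="!!"):
    """Return conf_text with every active 'key=...' line prefixed by marker."""
    out = []
    for line in conf_text.splitlines():
        # Only touch active settings; lines already prefixed are left alone.
        if line.startswith(key + "="):
            line = marker + line
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file content for illustration; the real section name will differ.
sample = "[namelist:example]\nppselectim=1,1,1,0\n"
print(ignore_setting(sample, "ppselectim"))
```

Running the helper a second time on its own output changes nothing, because the prefixed line no longer starts with the bare key.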

comment:15 Changed 12 days ago by Leighton_Regayre

Hi Simon,

Thank you. Grenville suggested I switch off climate meaning to solve a problem with postprocessing (#2933), so I'll try option 2.

Could you please explain the effect of updating the frequency of ozone ancil? I'm asking (3rd time) to improve my knowledge and also to make sure the scientific set up of this suite is correct.

Thanks,

Leighton.

comment:16 Changed 12 days ago by Leighton_Regayre

Hi again,

I'm presuming you're suggesting I continue with the source of the O3 ancillary set to "missing data" as suggested on comment 11 here.

Thanks,

Leighton

comment:17 Changed 12 days ago by willie

For reference: other tickets relating to this suite u-bk417 are: #2981, #2982

comment:18 Changed 12 days ago by simon

The frequency sets how often the ancil in the model is time interpolated and updated from the fields in the ancillary file. For Gregorian runs the frequency is usually set to one day for monthly mean ancils. The information on the frequency is in the help in the GUI:

Overwrite the field periodically as the model progresses. The start of each
period is defined relative to a common ancillary updating reference time
specified by namelist:nlstcall=ancil_reftime. The field is updated at the
start of each period and at the beginning of the run. The fields are time-
interpolated to the centre point of the period in question. The start time
of the run may be part way through a period. In this case the fields are
time-interpolated to the centre point of the full period without reference
to the start time. It is possible to use the start time as the reference
time.

This functionality is not available if the field has been initialised from
NetCDF data in the reconfiguration step (namelist:items=source=10).

Yes, keep the missing data setting.
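The timing rule in the help text can be illustrated with a short calculation. This is only a sketch of the arithmetic described there (period boundaries counted from a reference time, interpolation target at the centre of the period), not the UM's own code. It assumes a Gregorian calendar so that Python's datetime can be used, and the function name update_period and the example dates are hypothetical.

```python
from datetime import datetime, timedelta


def update_period(now, reftime, period):
    """Return (start, centre) of the updating period that contains `now`.

    Periods are counted from `reftime`; the centre is the time to which
    the ancillary fields would be interpolated for that period.
    """
    # Number of whole periods elapsed since the reference time.
    n = (now - reftime) // period
    start = reftime + n * period
    centre = start + period / 2
    return start, centre


# Illustrative values: reference time at the run start, one-day period,
# model time six hours into the run.
reftime = datetime(2008, 1, 1)
start, centre = update_period(datetime(2008, 1, 1, 6), reftime, timedelta(days=1))
print(start, centre)
```

With a one-day period the field is re-interpolated to the middle of each day, so a monthly-mean ancillary is tracked smoothly through the seasonal cycle. With a one-year period the interpolation target would be the middle of the year for the whole year.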

comment:19 Changed 12 days ago by Leighton_Regayre

Hi Simon, Willie,

Thanks for the info. I'd read the help description but didn't find it useful for the question your suggestions prompted. Setting to yearly didn't sound like a useful choice scientifically. I guess daily should be fine.

I'm afraid the suite has crashed again because of the O3 ancillary, this time in the atmos_main task.

The associated error is:
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 1
? Error from routine: io:file_open
? Error message: Failed to open file '/work/n02/n02/lre/ancils/ACURE/ukca_oxid_clim_1979_2014_az513_O3_rad.anc'
? Error from processor: 0
? Error number: 209
????????????????????????????????????????????????????????????????????????????????

Thanks,

Leighton.

comment:20 Changed 12 days ago by simon

This is a subtle one, I think. In the GUI, try removing the single quotes at the start and end of the full ancil filename. The model is assuming the quotes are part of the filename. I think as it's a full filename it is treated differently from the other ancils which have an environment variable in their path.

The advice about setting missing data still stands.
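The effect of stray quotes can be reproduced outside the model. The sketch below is an illustration only: it uses a temporary file rather than the real ancillary, and the helper name resolve_ancil_path is hypothetical. It shows that a path with literal quote characters at each end does not exist even though the unquoted path does.

```python
import os
import tempfile


def resolve_ancil_path(raw):
    """Strip one pair of matching surrounding quotes from a filename, if present."""
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]
    return raw


with tempfile.NamedTemporaryFile(suffix=".anc") as tmp:
    quoted = "'" + tmp.name + "'"
    # The quotes are treated as part of the name, so the lookup fails.
    print(os.path.exists(quoted))
    # With the quotes stripped, the same file is found.
    print(os.path.exists(resolve_ancil_path(quoted)))
```

This matches the symptom earlier in the ticket, where the log printed the filename wrapped in single quotes and reported that the file did not exist.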

comment:21 Changed 12 days ago by Leighton_Regayre

Hi Simon, Willie,

That seems to have worked. If removing the quotes was the problem, can I now remove the "missing data" setting for the ancillary source do you think?

The suite has triggered another error related to ancillaries, but I can't tell which ancillary it relates to:
UKCA AGE-OF-AIR: Reset method= 1. Tracer will be reset upto level 10
5 files found in offline namelist

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 65536
? Error from routine: EM_GET_TIME_INFO
? Error message: NetCDF calendar 360_day does not match model calendar, and attribute calendar_flexible /= 1 or not set
? Error from processor: 0
? Error number: 209
????????????????????????????????????????????????????????????????????????????????

Thanks,

Leighton

comment:22 Changed 12 days ago by luke

Hi Leighton,

This is a UKCA emissions NetCDF file error and not anything related to UM ancillary files. As you're nudging you need to use either emissions explicitly defined using the Gregorian calendar, or use a 360_day file which has the attribute calendar_flexible = 1.

If you look in your job.out does it list the file explicitly? I believe that I copied all the required files over for you. You may need to update the suite or let me know what the problem file is.

Thanks,
Luke
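The condition behind this error can be written down as a small check. This is a sketch of the rule as worded in the error message, not the logic of EM_GET_TIME_INFO itself. The function name emissions_calendar_ok and the string "gregorian" for the model calendar are assumptions made for illustration.

```python
def emissions_calendar_ok(file_calendar, model_calendar, calendar_flexible=None):
    """Return True if an emissions file would pass the calendar check.

    The file is accepted when its calendar matches the model calendar,
    or when its calendar_flexible attribute is set to 1.
    """
    if file_calendar == model_calendar:
        return True
    return calendar_flexible == 1


# A 360_day file in a Gregorian (nudged) run, without and with the attribute.
print(emissions_calendar_ok("360_day", "gregorian"))
print(emissions_calendar_ok("360_day", "gregorian", calendar_flexible=1))
```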

comment:23 Changed 12 days ago by simon

And for the missing data setting, see my answer in comment:20.

comment:24 Changed 11 days ago by simon

I suspect the files in /work/n02/n02/lre/ancils/ACURE are the issue. They all have time:calendar = "360_day" and none have calendar_flexible set.

comment:25 Changed 11 days ago by Leighton_Regayre

Hi all,

Thanks for the advice.

Simon, I've adjusted the NetCDF files using advice on NCO from Jane Mulcahy, who originally made the ancillaries for me:

ncatted -a calendar_flexible,time,c,l,1 file.nc

I'm now testing this change, although it may be the .anc file that's causing the problems, in which case it will need to be remade.

Regarding the "missing data" source, it seems to me it would be preferable to have the actual data used in making the reconfiguration file if possible. Since it looks like the problem with the ancillary was in the subtlety of how it was named (comment 20) I'm wondering what the benefit is of retaining a change that didn't solve the problem (comment 11) but will probably be corrected by referring to the ancillary correctly.

Thanks again,

Leighton.

comment:26 Changed 11 days ago by simon

It is not the .anc file which is causing the problem. The error was due to the calendar in the NetCDF-format files; the .anc file is in ancillary format, which is read in a completely different way.

As for the ozone ancillary, keep the missing data setting because it is best practice to set any time-varying ancillary to missing data in the reconfiguration. The model ignores any ancillary data in the start dump when processing time-varying ancils of the same type. Time-varying ancils are set on the first time-step and then updated as the model runs. Setting the ancillary data to missing data ensures that the model will fail if there is an issue reading in the ancillary file at run-time, and not instead use any time-invariant data which pre-exists in the start dump.

Have a look at sea-ice and the vegetation (such as "LEAF AREA INDEX OF PLANT FUNC TYPES") fields in your start dump. These are all time-varying ancils, and have been set to missing data.

comment:27 Changed 11 days ago by Leighton_Regayre

Hi Simon,

Thanks very much for the clear explanation.

Leighton
