Opened 7 months ago

Closed 7 months ago

#3013 closed help (fixed)

Nested suite failing in reconfiguration

Reported by: anmcr
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version:

Description

Hello,

The job id is u-bk861. It's a nested suite run over Antarctica. It includes some modifications by Stu Webster so that the LAM ancillaries are produced from the global model, as the ancil creation software failed otherwise. I have just run the model for an entire year (2013), but now when I try to run it for 2005 it fails in the reconfiguration. I think the error is:

???????????????????????????????????????????????????????????????????????????????
?????????????????????????????? WARNING ??????????????????????????????
? Warning code: -10
? Warning from routine: ANCIL_CHECK_GRID_STAGGER
? Warning message: Ancil file mismatch in fixed header(9) grid stagger value
? Model grid stagger = 6
? Ancil file grid stagger = 2
? Ancil file path = /projects/polar/amworr/startdumps/AntarcticCORDEX/sic_sst/glm/20041231_12-20051231_12_sic_um_grid_glm_n768
? PLEASE READ - this warning will be converted to an error
? in future. Please update ancil file to specify the correct
? grid stagger value.
? Warning from processor: 0
? Warning number: 2
????????????????????????????????????????????????????????????????????????????????

[3] exceptions: An exception was raised:11 (Segmentation fault)
[2] exceptions: An exception was raised:11 (Segmentation fault)

However, I produced this ancillary file using xancil in the exact same way as I produced the ones for the 2013 run, which ran fine.

Are you able to please advise?

Thanks,

Andrew

Attachments (1)

for_grenville.PNG (37.7 KB) - added by anmcr 7 months ago.
Screenshot for Grenville

Change History (20)

comment:1 Changed 7 months ago by grenville

Andrew

Could you switch on extra diagnostic messages and run the failed task again?

Grenville

comment:2 Changed 7 months ago by anmcr

Hi Grenville,

Thanks for getting back to me.

I switched on extra diagnostic messages, and re-ran the model. The output is at: /home/d01/amworr/cylc-run/u-bk861/log/job/20050101T0000Z/glm_um_recon2. However, it doesn't seem that different from the original output.

As a test, I reran this job for a different time, and hence with different ancillaries (also made by me using xancil). This run worked, which led me to think that the error must be in the SST and sea ice ancillary files (at N768) that I made. So I remade the ancillaries for the failed (2005) run, but the same reconfiguration error occurred.

Best wishes,

Andrew

comment:3 Changed 7 months ago by grenville

Hi Andrew

Sorry, I meant RCF_PRINTSTATUS (you set PRINT_STATUS, which is for the model).

Grenville

comment:4 Changed 7 months ago by anmcr

Hi Grenville,

I switched it on and re-ran the model. The output is at: /home/d01/amworr/cylc-run/u-bk861/log/job/20050101T0000Z/glm_um_recon2/01/job.out.

It seems to point to problems with the ancillary file that I made, located at: /projects/polar/amworr/startdumps/AntarcticCORDEX/sic_sst/glm/20041231_12-20051231_12_sic_um_grid_glm_n768. Is that correct?

As I said, I tried remaking this file, but the error persisted. I will look into this further if you agree.

Thanks,

Andrew

comment:5 Changed 7 months ago by grenville

Hi Andrew

I can't tell quite where the problem is - you say it ran OK for 2013 - where is the ancillary file for that year? I'm no expert on ice files, but the ICE EDGE field in 20041231_12-20051231_12_sic_um_grid_glm_n768 looks strange.

Grenville

comment:6 Changed 7 months ago by grenville

Hi Andrew

Scrap that - you aren't using the ice edge. Could you switch off configuration of this ancil file - if that works we'll be confident of the source of the problem?

Grenville

comment:7 Changed 7 months ago by anmcr

Hi Grenville,

I'm not sure that I can switch off this ancil file. It was hard-wired into the code by Stu Webster. The routine that calls these ancillaries is located at u-bk861/app/glm_um/opt/rose_spp_n768.conf. I tried commenting out the blocks of code for SST and sea ice in this routine, which caused the routine to fail. However, I'm pretty certain that this is the issue (e.g. when I re-ran for a different year, the reconfiguration completed successfully).

Best wishes,

Andrew

comment:8 Changed 7 months ago by grenville

Andrew

Go to glm_um→namelist→Recon…→Configure..-

right click on 319c889a and select ignore section

Grenville

comment:9 Changed 7 months ago by anmcr

Dear Grenville,

These sections are working. They read in files labelled 20041201_00-20051231_12_sic_um_grid_glm, which are at N320 resolution. This part of the reconfiguration is labelled 'glm_um_recon1' in the attached screenshot. It is 'glm_um_recon2' that is failing. It reads in fields at N768 resolution, which are then interpolated to the 0.11 LAM grid in order to force the LAM (this was Stu's change). These fields come from files labelled 20041231_12-20051231_12_sst_um_grid_glm_n768.

Best wishes,

Andrew

Changed 7 months ago by anmcr

Screenshot for Grenville

comment:10 Changed 7 months ago by anmcr

Hi Grenville,

I am running the same model for the year 2010 successfully. I have made a copy of this model, and am running it for 2005 (the year that is failing here) to see whether the issue is with the model somehow or the actual ancillary files I made.

Andrew

comment:11 Changed 7 months ago by grenville

Andrew

It seems to be failing in chk_look.F90 while checking the header information - I'll wait to see if your test works before doing anything more.

Grenville

comment:12 Changed 7 months ago by anmcr

Hi Grenville,

The test failed, so I strongly suspect that there is something wrong with the ancillary file that I have produced. When I compare the output from glm_um_recon2 from the failing run with one that worked, then the failed run produces the output below. There is a warning of an invalid entry in the header file of the ancillary.

Best wishes,

Andrew

FIXED LENGTH HEADER


Dump format version -32768
UM Version No 405
Atmospheric data
On hybrid levels
Over global domain
Ancillary dataset
Exp No = 1 Run Id = 1
Gregorian calendar
Arakawa B grid
Record:           Year Month   Day  Hour   Min   Sec DayNo
Data time =       2004    12    31    12     0     0   366
Validity time =   2005    12    31    12     0     0   365
Creation time =      0     0     0    12     0     0     0
                         Start     1st dim     2nd dim    1st parm    2nd parm
Integer Consts             257          15                      15
Real Consts                272           6                       6
Level Dep Consts        -32768           1           1           1           1
Row Dep Consts          -32768           1           1           1           1
Column Dep Consts       -32768           1           1           1           1
Fields of Consts        -32768           1           1           1           1
Extra Consts            -32768           1                       1
History Block           -32768           1                       1
CFI No 1                -32768           1                       1
CFI No 2                -32768           1                       1
CFI No 3                -32768           1                       1
Lookup Tables              278          64        2193          64        2193
Model Data              140630  -414515200              -414515200
WARNING: Invalid entry for fixed length header(11) in input dump
Setting co-ordinate system of input grid to regular longitude/latitude

LOOKUP TABLE
140352 64-bit words long
============================= PBS epilogue =============================

comment:13 Changed 7 months ago by grenville

Andrew

I'd failed to notice previously, but your ancil file is created for UM 4.x - the latitudes are in the wrong order for a later UM version (see /common/um1/ancil/atmos/n512e/sst/hadisst_6190/v4/qrclim.ssh for example) - please check the one that worked (which file worked?).

Grenville
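
Editorial sketch: both the grid stagger warning and the "UM Version No 405" line come from values stored in the ancillary file's fixed-length header, so it can help to read those values directly before rerunning the reconfiguration. The Python below is only a minimal sketch, not a UM tool. It assumes the file begins with a fixed-length header of 256 big-endian 64-bit integers and that the model version sits in header(12); neither assumption is confirmed in this ticket (only header(9) as the grid stagger is named in the warning). The function names are hypothetical, and reading 1101 as vn11.1 is an extrapolation from the 405 / vn4.x pairing seen here.

```python
import struct

# ASSUMPTION: the fixed-length header is 256 words of 64 bits each.
FIXED_HEADER_WORDS = 256

# 1-based header positions. (9) is named in the warning quoted in this ticket.
# (12) as the model version is an ASSUMPTION, not confirmed in the ticket.
GRID_STAGGER = 9
MODEL_VERSION = 12


def read_fixed_header(path, byte_order=">"):
    """Return the fixed-length header as a list of ints (index 0 is header(1)).

    byte_order=">" assumes a big-endian file; pass "<" for little-endian.
    """
    nbytes = FIXED_HEADER_WORDS * 8
    with open(path, "rb") as handle:
        raw = handle.read(nbytes)
    if len(raw) < nbytes:
        raise ValueError(f"{path}: file too short for a fixed-length header")
    return list(struct.unpack(f"{byte_order}{FIXED_HEADER_WORDS}q", raw))


def decode_version(value):
    """Turn 405 into '4.5' (and, by the same rule, 1101 into '11.1')."""
    return f"{value // 100}.{value % 100}"


def summarise(path):
    """Report the two header values discussed in this ticket."""
    header = read_fixed_header(path)
    return {
        "grid_stagger": header[GRID_STAGGER - 1],
        "um_version": decode_version(header[MODEL_VERSION - 1]),
    }
```

Under those assumptions, calling `summarise()` on the 2005 file and on a file from a year that worked would show at a glance whether the two were written with different version or grid stagger settings.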

comment:14 Changed 7 months ago by anmcr

Hi Grenville,

Thanks for noticing this. Yes, you're correct. The runs that worked had vn11.1 and not vn4.5. I re-made the ancils with vn11.1, which moved it along a little, but it is still failing. Now the error is 'Non-standard period for periodic data' (see below). The data are daily fields. I saw that tickets #2810 and #2577 are related to this, and will look into it further.

Andrew

???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
Rank 1 [Mon Sep 23 13:11:08 2019] [c0-0c1s13n0] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 1
? Error code: 631
Rank 3 [Mon Sep 23 13:11:08 2019] [c0-0c1s13n0] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 3
? Error from routine: RCF_ANCIL_ATMOSRank 2 [Mon Sep 23 13:11:08 2019] [c0-0c1s13n0] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 2

? Error message: replanca_rcf_replanca: Non-standard period for periodic data
? Error from processor: 0
? Error number: 6
????????????????????????????????????????????????????????????????????????????????

comment:15 Changed 7 months ago by anmcr

Hi Grenville,

Below is a clearer error message from the output file. The complaint is with sea ice fraction. I really don't know what I am doing wrong with xancil to generate this.

Best wishes,

Andrew

c_io ( 13):Open: File=/projects/polar/amworr/startdumps/AntarcticCORDEX/sic_sst/glm/20031231_12-20041231_12_sic_um_grid_glm_n768
c_io ( 13):Open: File exists (31129679528 bytes)
IO: Open: /projects/polar/amworr/startdumps/AntarcticCORDEX/sic_sst/glm/20031231_12-20041231_12_sic_um_grid_glm_n768 on unit 13
replanca_rcf_replanca: UPDATE REQUIRED FOR FIELD 31 : FRAC OF SEA ICE IN SEA AFTER TSTEP
Error in replanca_rcf_replanca
CMESSAGE replanca_rcf_replanca: Non-standard period for periodic data
ErrorStatus 631

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 631
? Error from routine: RCF_ANCIL_ATMOS
? Error message: replanca_rcf_replanca: Non-standard period for periodic data
? Error from processor: 0
? Error number: 6
????????????????????????????????????????????????????????????????????????????????
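
Editorial sketch: when several ranks write at once, the banner lines get interleaved with MPI abort messages, as in the output pasted in comment 14. A small filter can pull out just the banner fields. The Python below is a rough sketch that assumes banner lines have the `? Error code:` / `? Warning message:` shape shown in this ticket; the function name is hypothetical and the patterns are not taken from any UM utility.

```python
import re

# Matches the "? Key: value" lines printed inside the error/warning banners
# quoted in this ticket.
_FIELD = re.compile(
    r"^\?\s*(Error|Warning) "
    r"(code|from routine|message|from processor|number):\s*(.*)$"
)
# MPI abort chatter can be spliced onto the end of a banner line.
_MPI_NOISE = re.compile(r"Rank \d+ \[.*$")


def extract_banner_fields(text):
    """Collect the key/value pairs of each banner found in job.out text."""
    records = []
    current = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if not match:
            continue  # banner borders, MPI lines, and continuation lines
        kind, key, value = match.groups()
        value = _MPI_NOISE.sub("", value).strip()
        # A new "code" line marks the start of the next banner.
        if key == "code" and current:
            records.append(current)
            current = {}
        current["kind"] = kind
        current[key] = value
    if current:
        records.append(current)
    return records
```

Each returned record is a plain dictionary, so the routine name and message can be printed or compared between a failing and a working run without reading through the interleaved output by eye.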

comment:16 Changed 7 months ago by anmcr

Hi Grenville,

The xancil file I am using is on ARCHER, at /home/n02/n02/anmcr/work/start_files/Antarctic_Cordex/sic_sst/glm_n768. Are you able to please take a look? I'm not managing to make much headway myself.

Best wishes,

Andrew

comment:17 Changed 7 months ago by jeff

Hi Andrew

I tried looking at your ancillary file but it changed as I was looking at it, so I've no idea what you have done.

How many times do you want in your file? Do you actually need a periodic ancillary file? There are restrictions about what is supported for periodic ancil files with a Gregorian calendar and what you are trying to do won't work. See ticket #2533 for a similar problem.

Jeff.

comment:18 Changed 7 months ago by anmcr

Dear Jeff,

Thanks for looking at this. I have the reconfiguration running now. The problem was that I had the 'periodic ancillary file' option switched on, which was wrong.

Please close this ticket.

Best wishes,

Andrew

comment:19 Changed 7 months ago by jeff

  • Resolution set to fixed
  • Status changed from new to closed