Opened 9 years ago

Closed 9 years ago

#661 closed help (fixed)

reconfiguration from ecmwf grib dump fails

Reported by: eartmt Owned by: willie
Component: UM Model Keywords:
Cc: Platform:
UM Version: 6.6.3

Description

I am having trouble starting from an ECMWF dump. My job xgbh.e fails at reconfiguration. The leave file from my latest attempt is:

~eartmt/um/umui_out/xgbhe000.xgbhe.d11209.t120033.leave

Here is the relevant error message; I don't know what it means:

C I/O Error: failed in BUFFIN8
Return code = 1

Thanks in advance for any help,

Tomek

Change History (13)

comment:1 follow-up: Changed 9 years ago by jeff

Hi Tomek

Your file permissions on HECToR don't allow anyone else to view your files. Running these commands on HECToR will allow people in the n02 group to read them.

chmod -R g+rX /home/n02/n02/eartmt
chmod -R g+rX /work/n02/n02/eartmt

Jeff.
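
To confirm that the permission change has taken effect, something like the following could be used. It is a minimal sketch, not part of any UM tooling: the function name group_unreadable is made up for illustration, and it assumes that "readable by the group" means the group read bit is set on every file, plus the group execute bit on every directory.

```python
import os
import stat


def group_unreadable(root):
    """Return paths under root that the owning group cannot read.

    Directories also need the group execute bit so they can be entered.
    Symbolic links are skipped.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue
            needed = stat.S_IRGRP
            if stat.S_ISDIR(mode):
                needed |= stat.S_IXGRP
            if (mode & needed) != needed:
                problems.append(path)
    return sorted(problems)
```

An empty list from group_unreadable("/home/n02/n02/eartmt") would suggest the chmod commands above did their job. Note that os.walk silently skips directories the calling user cannot list, so this is best run by the owner of the files.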

comment:2 in reply to: ↑ 1 Changed 9 years ago by eartmt

Sorry about that, should be readable now.

comment:3 Changed 9 years ago by willie

Hi Tomek,

I normally run a standard reconfiguration job on GRIB files before doing anything else with them. If you take a copy of the job xdkeb (user 'umui') and run this on your GRIB file you will get a new reconfigured start dump in UM format. You can then use this with xgbhe.

Regards,

Willie

comment:4 Changed 9 years ago by eartmt

Hi Willie,

I reconfigured the start dump with xdkeb (copied to xgbhb) as you suggested, but xgbhe is still failing at reconfiguration, this time with many messages like:

REPLANCA: UPDATE REQUIRED FOR FIELD 1 : Land-Sea Mask

I also reconfigured the start dump with another standard job (xdkea), but got exactly the same problem using it. The leave file from my last attempt today is:

~eartmt/um/umui_out/xgbhe000.xgbhe.d11217.t163641.leave

Regards,

Tomek


comment:5 Changed 9 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to assigned

Hi Tomek,

If you look for errors earlier in the file, you will see:

[0] ERROR - MPID_nem_gni_check_localCQ(): GNI_CQ_EVENT_TYPE_POST had error (SOURCE_SSID_DREQ:MDD_INV)
Rank 0 [Fri Aug  5 15:40:13 2011] [c1-1c1s0n1] Fatal error in MPI_Testall: Other MPI error, error stack:
MPI_Testall(251)...............: MPI_Testall(count=95, req_array=0x7ffffff15700, flag=0x7ffffff14f74, status_array=0x7ffffff14f90) failed
MPIDI_CH3I_Progress(150).......: 
MPID_nem_mpich2_test_recv(790).: 
MPID_nem_gni_poll(1276)........: 
MPID_nem_gni_check_localCQ(560): unrecoverable network error

This is the real problem. Your run on 28 July at 12:30 did not have this error, but all subsequent runs do. Could you look in the edit history and let me know what changed?

Regards,

Willie

comment:6 Changed 9 years ago by eartmt

Hi Willie,

I'm away this week and can't access the UMUI, but AFAIR the successful job used the original start dump (as in xgbh.d) with no other changes. You can also diff xgbh.e against xgbh.d (which works, as far as I can tell, and was the base for xgbh.e) to see exactly what differs.

Tomek

comment:7 Changed 9 years ago by willie

Hi Tomek,

If you look at the bottom of the output file, the reconfiguration has failed:

 ERROR!!! in reconfiguration in routine Rcf_Set_Data_Source
 Error Code:-  30
 Error Message:-  Section  0 Item  101  : Required field is not in input dump!

Naturally, this is because the start dump is based on a GRIB file, which contains only the minimum needed to start the model.

To get round this you need to create a user STASH entry for this item (= SO2 mass mixing ratio). Then you need to make an entry in the STASH > Initialisation of User Prognostics table. Option 3 (set to zero) or 7 (initialise from ancillary) are the likely choices.

Regards,

Willie

comment:8 Changed 9 years ago by eartmt

Hi Willie,

Where do you see that error? I grepped through all the *.leave files in my home directory, and the only one that mentions Error Code 30 is for a completely unrelated vn7.8 job, xgbht. Even there it is Item 9 rather than 101 that is missing, so it is still not the same error message.

The start dump /work/n02/n02/eartmt/xgbhb/xgbhb.astart for the xgbhe job was produced with the standard vn6.1 reconfiguration job, as you suggested, and apart from the BUFFIN8 message I don't see any error in the output of that reconfiguration job:

~eartmt/um/umui_out/xgbhb000.xgbhb.d11216.t165743.leave

Apologies if I'm blind, but I just can't find that error.

Regards,

Tomek

comment:9 Changed 9 years ago by eartmt

I get the error about 'Section 0 Item 101…' when I use this xgbhb.astart start dump with the vn7.8 job xgbht, so I will try your suggestion for that one. However, the job xgbhe (vn6.6.3), which this ticket is about, does not seem to have the same issue.

Tomek

comment:10 Changed 9 years ago by willie

Hi Tomek,

Sorry I was quite clear. It is in the output directory /work/n02/n02/eartmt/xgbhe/xgbhe.fort6.rcfa.pe1, at the bottom of the file.

Regards,

Willie

comment:11 Changed 9 years ago by willie

Oops, that should read "wasn't quite clear".
Willie
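
Since the message printed last in a leave file was not the root cause here, it can help to list every error line in each output file, in order, and read from the top. The following is a minimal sketch under a simple assumption: any line containing the word "error" in any case is worth a look. The function names are made up for illustration and are not part of the UM scripts.

```python
import re

# Assumption: any line containing the word "error" (any case) is worth a look.
ERROR_WORD = re.compile(r"\berror\b", re.IGNORECASE)


def find_errors(lines):
    """Return (line_number, text) for every line that mentions an error, in order."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if ERROR_WORD.search(line)
    ]


def first_error_in_file(path):
    """Return the first (line_number, text) hit in a file, or None if there is none."""
    with open(path, errors="replace") as handle:
        hits = find_errors(handle)
    return hits[0] if hits else None
```

Running first_error_in_file over both the *.leave file and the per-processor files such as xgbhe.fort6.rcfa.pe1 would show the earliest reported failure in each, rather than only the final BUFFIN8 message.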

comment:12 Changed 9 years ago by eartmt

Hi Willie,

So I created a user STASH file with the missing item 101 and the job went a bit further, but only a little, failing with missing item 102. So I added that and it failed with item 103, and so on. After a while I fired up xconv and started to compare the contents of the original start dump with the ECMWF-generated one. After adding 61 items in section 0 and 49 in section 33, all initialised to zero, I finally got past the missing data issue. I wonder, however, whether there is a better way to figure out which fields are missing?

Tomek
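
One way to avoid finding the missing items one failed run at a time is to compare the two field lists in a single pass. The sketch below assumes the STASH codes of both dumps have already been extracted into plain lists of integers (for example by hand from the xconv field listings); how they are extracted is not shown. It relies on the usual encoding of a STASH code as section * 1000 + item. The function name and the sample values in the usage note are made up for illustration.

```python
def missing_fields(required_codes, available_codes):
    """Return (section, item) pairs present in required_codes but not in available_codes.

    Both arguments are iterables of integer STASH codes (section * 1000 + item).
    The result is sorted by section, then item.
    """
    missing = set(required_codes) - set(available_codes)
    return [divmod(code, 1000) for code in sorted(missing)]
```

For example, missing_fields([2, 101, 102, 33001], [2, 30]) gives [(0, 101), (0, 102), (33, 1)], which is the full list of user STASH entries to add in one go.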

comment:13 Changed 9 years ago by willie

  • Resolution set to fixed
  • Status changed from assigned to closed