Opened 9 years ago

Closed 9 years ago

#620 closed help (fixed)

model crash - DERV_LAND_FIELD error

Reported by: abozzo Owned by: ros
Component: UM Model Keywords:
Cc: Platform:
UM Version: 6.6.3

Description (last modified by ros)

Hi,

I'm running a millennium run with HadGEM2-ES (job xfkmj). Over the past few weeks the model ran fine for more than 40 years.
Yesterday it crashed with an error I haven't seen before. In the xfkmj005.xfkmj.d11131.t122219.archive file the message is a generic I/O error:

Routine generating error: U_MODEL
 Error code:  1
 Error message: 
TRANSIN: I/O read error

and in the .pe* files the error is:

 global_land_field set to  10519
 Error in FILE_OPEN called from DERV_LAND_FIELD.
 Trying to open atmos dump.
 Error returned from DERV_LAND_FIELD.
 Error code  1

It seems the error is related either to the land-sea mask or to a corrupted dump file, but the model had run without problems until now.
If I try to restart the run as a CRUN from the last dump, it crashes immediately with the same error.
I'm not sure what went wrong. Any clues?

Many thanks,

Alessio

Change History (7)

comment:1 Changed 9 years ago by ros

Hi Alessio,

I've not yet figured out why the run fell over initially; however...

The reason your run fell over when you tried to restart it from the last dump is because the dump you've specified (/projects/lastmil/abozzo/um/xfkmj/xfkmja.daj3210) doesn't exist. Can you correct that problem, try again and see what happens?

Regards,
Ros.

comment:2 Changed 9 years ago by abozzo

Hi Ros,

Thanks for your help. I thought the last dump was xfkmja.daj3310, but now I see in the .leave file that the model looks for xfkmja.daj3210. I think that dump was deleted as a superseded dump (I'm only archiving the December dumps).
Silly question: when I restart the model as a CRUN (I just resubmit the job), is it possible to change which dump the model picks up?

Alessio

comment:3 Changed 9 years ago by ros

  • Description modified (diff)
  • Owner changed from um_support to ros
  • Status changed from new to assigned

Hi Alessio,

The last atmos dump was indeed xfkmja.daj3310, but the run must have crashed before it could produce the corresponding ocean dump. Your CRUN therefore tried to start from the beginning of the previous month (.daj3210), but only the ocean dump exists for that date; I assume, as you say, the atmos dump was deleted as a superseded dump.

Can you check if you have run out of quota on MONSooN? I wonder if this is the reason your job originally crashed.

To get going again you will need to do an NRUN, specifying the ocean and atmos dumps you want to start from. You can start from the .daj32l0 dumps, although that point is partway through the month, which might mess up the climate means.
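Because an NRUN needs a matching atmos and ocean dump, it can help to check which dates actually have both before editing the job. The sketch below is a hypothetical helper, not part of the UM or its scripts: it assumes dump names follow the pattern seen in this ticket (run ID, then `a` for atmos or `o` for ocean, then `.da` and a 5-character date stamp), and that stamps are fixed-width strings of 0-9 and a-z so that plain string order follows date order. It does not reproduce how a CRUN chooses its restart dump.

```python
import re


def paired_dump_stamps(filenames, runid):
    """Return sorted date stamps for which BOTH an atmos ('a') and an
    ocean ('o') dump exist for the given run ID (hypothetical helper)."""
    # Assumed naming pattern: <runid><a|o>.da<5-character stamp>,
    # e.g. xfkmja.daj3310 (atmos) and xfkmjo.daj3310 (ocean).
    pattern = re.compile(re.escape(runid) + r"([ao])\.da([0-9a-z]{5})")
    stamps = {"a": set(), "o": set()}
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            stream, stamp = match.groups()
            stamps[stream].add(stamp)
    # Assumes fixed-width stamps built from 0-9 then a-z, so sorting the
    # strings also sorts them chronologically.
    return sorted(stamps["a"] & stamps["o"])


if __name__ == "__main__":
    # Illustrative listing modelled on this ticket; in practice it could
    # come from os.listdir() on the run's data directory.
    listing = [
        "xfkmja.daj2c10", "xfkmjo.daj2c10",
        "xfkmja.daj32l0", "xfkmjo.daj32l0",
        "xfkmjo.daj3210",  # ocean only: atmos dump already deleted
        "xfkmja.daj3310",  # atmos only: run crashed before the ocean dump
    ]
    print(paired_dump_stamps(listing, "xfkmj"))  # ['j2c10', 'j32l0']
```

Any stamp returned is a candidate NRUN start point; a date missing from the list, such as j3210 here, is one where one of the two dumps does not exist.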

Regards,
Ros.

comment:4 Changed 9 years ago by abozzo

On MONSooN I'm using 44% of my quota, so it shouldn't be a problem.
At this point the best thing is probably to restart as an NRUN from the last December dumps (xfkmja/o.daj2c10), carry on from there, and see whether it gets past the crash point.

Regards,

Alessio

comment:5 Changed 9 years ago by ros

Yes, that would be the better place to start from. If it does misbehave at the same point again, let us know and we'll investigate further.

Regards,
Ros.

comment:6 Changed 9 years ago by abozzo

Hi Ros,

I restarted the job and it's now running fine. It got past the previous crash point, apparently without any issues.

I think you can close this ticket, although I still have no idea why it crashed.

Thank you,

Alessio

comment:7 Changed 9 years ago by ros

  • Resolution set to fixed
  • Status changed from assigned to closed