Opened 4 years ago

Closed 4 years ago

#1330 closed help (answered)

UNRECOVERABLE library error

Reported by: oma Owned by: willie
Priority: high Component: UM Model
Keywords: Cc:
Platform: ARCHER UM Version: 7.3

Description

Hello,

I am trying to run a MetUM global job, without success.
I have tried several fixes, but nothing seems to work.
However, I have just tried an NAE run that I'm sure ran not long ago, and it failed with the same error. So I am assuming there is a problem with ARCHER. Could you confirm this, so that I can simply wait for it to be fixed instead of continuing to look for errors in my setup?
The error that I keep receiving is the following:

lib-4001 : UNRECOVERABLE library error 
  A READ operation tried to read past the end-of-file.

Encountered during a sequential formatted READ from unit 10
Fortran unit 10 is connected to a sequential formatted text file: "fort.10"
 Current format: (A1)
                ^
_pmiu_daemon(SIGCHLD): [NID 02493] [c4-1c2s15n1] [Thu Jul 24 14:19:45 2014] PE RANK 0 exit signal Aborted
[NID 02493] 2014-07-24 14:19:45 Apid 9423903: initiated application termination
/work/n02/n02/oma/xjzma/bin/qsexecute: Error in dump reconfiguration - see OUTPUT

Thanks,

Oscar

Change History (5)

comment:1 Changed 4 years ago by willie

Hi Oscar,

Could you give us read permission on your directories:

  chmod -R g+rX /home/n02/n02/oma
  chmod -R g+rX /work/n02/n02/oma

Regards,

Willie

comment:2 Changed 4 years ago by oma

Hi Willie,

Done! The global job I've tried to run is xkeza. The NAE run I know worked before on ARCHER is xjzma.

Thanks for looking into this,

Oscar

comment:3 Changed 4 years ago by oma

Hello,

I've tried to run yet another test case (job xkbwb). I am completely sure this case ran perfectly fine three weeks ago. This time it failed with exactly the same error as the other two:

lib-4001 : UNRECOVERABLE library error 
  A READ operation tried to read past the end-of-file.

Encountered during a sequential formatted READ from unit 10
Fortran unit 10 is connected to a sequential formatted text file: "fort.10"
 Current format: (A1)
                ^
_pmiu_daemon(SIGCHLD): [NID 01141] [c5-0c2s13n1] [Fri Jul 25 11:45:10 2014] PE RANK 0 exit signal Aborted
[NID 01141] 2014-07-25 11:45:10 Apid 9445912: initiated application termination
/work/n02/n02/hum/vn7.3/cce/scripts/qsexecute: Error in dump reconfiguration - see OUTPUT

Further down, the *.leave files also show the following message:

 HDPPXRF: Fortran Error Response =  26  Opening STASHmaster file STASHmaster_A

Could it be that I need to modify something in my .profile or a similar file? Has there been any change to the STASHmaster_A file for vn7.3?

Any help would be greatly appreciated.

Regards,

Oscar

comment:4 Changed 4 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Oscar,

I've had a look at xjzma. It fails because its UM7.8 start dump contains fields that did not exist when the UM7.3 code was written. The reconfiguration tries to find these fields in the STASHmaster, runs off the end of the file and crashes. Similarly, xkeza uses a UM7.6 start dump.

The solution is one of the following:

  • recreate the start dump at UM7.3,
  • use a model version no older than the start dump, or
  • attempt to remove all the newer fields from the start dump.

Regards,

Willie

comment:5 Changed 4 years ago by grenville

  • Resolution set to answered
  • Status changed from accepted to closed

Since there has been no activity on this ticket, we have closed it.
