Opened 9 years ago

Closed 8 years ago

#674 closed help (fixed)

Runs crashing after 160 Years [Error: DRLANDF1 : Error in FILE_OPEN]

Reported by: eead Owned by: um_support
Component: UM Model Keywords:
Cc: eeamd@… Platform:
UM Version: 4.5

Description (last modified by ros)

Hi there,

I am wondering if you can help. I have 3 jobs (tdanf, tdang, tdanh) that all crashed within an hour of each other a few weeks ago, each for a different reason. Two of them had done over 160 years and one had only done around 30. I have now tried restarting them, but they all fail with the same error message:

 Model completed with the following :
     Error Code :  1
     Message    :  DRLANDF1 : Error in FILE_OPEN.
_pmii_daemon(SIGCHLD): [NID 00029] PE 10 exit signal Aborted
[NID 00029] 2011-08-16 08:30:57 Apid 1105178: initiated application termination
tdanf: Run failed

tdanf: debug information (if activated) follows :-

GNU gdb (GDB; SUSE Linux Enterprise 11)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
Core was generated by `/work/n02/n02/eead/um/tdanf/dataw/tdanf.exec'.
Program terminated with signal 6, Aborted.
#0  0x0000000000aef19b in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
42      ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
(gdb) #0  0x0000000000aef19b in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x0000000000b05c43 in abort () at abort.c:88
#2  0x000000000040ba01 in abort_ ()
#3  0x0000000000400847 in MAIN__ ()
    at /work/n02/n02/eead/um/tdanf/code/umshell1.f:3131
#4  0x00007fffffff73d8 in ?? ()
#5  0x0000000003f38880 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) quit
Quitting: You can't do that without a process to debug.


I have not seen this error message before and I am wondering if something has changed in the last few weeks that might mean these jobs aren't working. Any help would be appreciated. I have tried to attach one of the .leave files in case I have missed any glaring errors.

Thanks in advance,

Attachments (1)

tdanf000.tdanf.d11228.t082941.leave.txt (240.8 KB) - added by eead 9 years ago.
.leave file for tdanf


Change History (3)

Changed 9 years ago by eead

.leave file for tdanf

comment:1 Changed 9 years ago by willie

Hi Aisling,

Earlier in the leave file for job tdang,

OPEN:  File /work/n02/n02/eead/um/tdang/datam/tdanga@dap7031 to be Opened on Unit 21 does not Exist

and later,

 Error in FILE_OPEN called from DERV_LAND_FIELD.
 Trying to open atmos dump.
 Error returned from DERV_LAND_FIELD.
 Error code  1

The start dump is present, however, so the problem seems to be that the first file, tdanga@dap7031 in the datam directory, does not exist.
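If it is useful for the other two jobs, the same check can be done by scanning each .leave file for the OPEN message quoted above. The following is only a rough Python sketch: the pattern is modelled on that single message, and the function name `find_missing_files` is made up for illustration, so it may need adjusting if the wording differs elsewhere in the output.

```python
import re

# Pattern modelled on the one OPEN message quoted in this ticket.
# Assumption: other "does not Exist" messages follow the same wording.
MISSING_FILE_RE = re.compile(
    r"OPEN:\s+File\s+(?P<path>\S+)\s+to be Opened on Unit\s+(?P<unit>\d+)"
    r"\s+does not Exist"
)


def find_missing_files(leave_text):
    """Return (path, unit) pairs for each file reported as not existing."""
    return [
        (match.group("path"), int(match.group("unit")))
        for match in MISSING_FILE_RE.finditer(leave_text)
    ]
```

To use it, read the whole .leave file into a string (opening it with `errors="replace"` avoids trouble with stray bytes), pass the string to `find_missing_files`, and compare each returned path with what is actually in the job's data directories.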

I hope that helps.



comment:2 Changed 8 years ago by ros

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from new to closed