#2315 closed help (answered)

pumf: Failed to extract header info from ar481a.pa1999sep.pp

Reported by: gn907779
Owned by: ros
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version: 10.6

Description (last modified by ros)

I am running a 10.6 suite, u-ar481, on Monsoon. The ATMOS task succeeds, but the postproc task fails.

This is job.err:-

[whdav@exvmsrose:~/cylc-run/u-ar481/log/job/19990901T0000Z/postproc/01]$ more job.err
[WARN]  [SUBPROCESS]: Command: /projects/um1/vn10.6/xc40/utilities/um-pumf -h /home/d03/whdav/cylc-run/u-ar481/log/job/19990901T0000Z/postproc/01/job-pumfhead.out /home/d03/whda
[SUBPROCESS]: Error = 1:
        [INFO] File(1): /home/d03/whdav/cylc-run/u-ar481/share/data/History_Data/ar481a.pa1999sep.pp
[WARN] Using default STASHmaster as none provided "/projects/um1/vn10.6/ctldata/STASHmaster".
[INFO] Using script: /projects/um1/vn10.6/xc40/utilities/um-pumf
[INFO] Using executable: /projects/um1/vn10.6/xc40/utilities/um-pumf.exe
/projects/um1/vn10.6/xc40/utilities/um-pumf: line 198: 17252 Aborted                 (core dumped) $pumf_exec > $PUMF_OUT 2>&1
[INFO] Header output in:   /home/d03/whdav/cylc-run/u-ar481/log/job/19990901T0000Z/postproc/01/job-pumfhead.out
[INFO] Field output in:    /home/d03/whdav/cylc-run/u-ar481/work/19990901T0000Z/postproc/pumf_out_Dtez/pumf_field
[FAIL] Problem with PUMF program

[ERROR]  pumf: Failed to extract header information from file /home/d03/whdav/cylc-run/u-ar481/share/data/History_Data/ar481a.pa1999sep.pp
[FAIL]  Command Terminated
[FAIL] Terminating PostProc...
[FAIL] main_pp.py atmos # return-code=1
2017-11-07T15:11:40Z CRITICAL - Task job script received signal EXIT

This is pumf_out_Dtez/pumf_field

[whdav@exvmsrose:~/cylc-run/u-ar481/log/job/19990901T0000Z/postproc/01]$ more /home/d03/whdav/cylc-run/u-ar481/work/19990901T0000Z/postproc/pumf_out_Dtez/pumf_field
Warning in umPrintMgr: umPrintLoadOptions : Failed to get filename for IO control file from environment

 GCOM Version 6.1
 Using precision : 64bit INTEGERs and 64bit REALs
 Built at Thu Oct  6 08:04:35 BST 2016

UMPRINTSETLEVEL: PrintStatus initialised=   4
[0] exceptions: Setting option 2
[0] exceptions: Registering callback at 0x0042ef40
App ID: 5, Name:Print UM File
- Data size is 64 bit. Program is 64 bit.
- Program is serial.
FILE_MANAGER: Assigned : pseudo-file for UNIX operations
FILE_MANAGER:          : id   : io_reserved_unit
FILE_MANAGER:          : Unit :  10 (portio)
Buffered I/O active. Buffer size set to 524288, 8 byte words
IO: Initialised IO
Host is shared100

FILE_MANAGER: Assigned : /home/d03/whdav/cylc-run/u-ar481/share/data/History_Data/ar481a.pa1999sep.pp
FILE_MANAGER:          : Unit :  11 (portio)
IO: Switching file mode to local because there is no IO server
IO: Opening unit  11 with collective(broadcast) semantics
IO: Read Only mode
OPEN:  File /home/d03/whdav/cylc-run/u-ar481/share/data/History_Data/ar481a.pa1999sep.pp to be Opened on Unit 11 Exists
IO: Open: /home/d03/whdav/cylc-run/u-ar481/share/data/History_Data/ar481a.pa1999sep.pp on unit  11
loadHeader: Model Version: 128849019.87
tcmalloc: large alloc 6734511509115011072 bytes == (nil)

???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 4205
?  Error code: 4205
?  Error from routine: main_compare
?  Error from routine: main_compare
?  Error message: Failed to allocate 3147656947853068556 words for the integer header of the file
?  Error message: Failed to allocate 3147656947853068556 words for the integer header of the file
?  Error from processor: 0
?  Error from processor: 0
?  Error number: 0
?  Error number: 0

[0] exceptions: An non-exception application exit occured.
[0] exceptions: whilst in a serial region
[0] exceptions: Task had pid=0 on host
[0] exceptions: Program is "/projects/um1/vn10.6/xc40/utilities/um-pumf.exe"
[0] exceptions: calling registered handler @ 0x0042ef40
Warning in umPrintMgr: umPrintExceptionHandler : Handler Invoked
[0] exceptions: Done callbacks
gc_abort (Processor     0): Job aborted from ereport.

Any ideas on how I can fix this?


Change History (3)

comment:1 Changed 20 months ago by ros

  • Description modified (diff)
  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi William,

pumf should not be run on the already-converted .pp file. If you have tried to run the postprocessing more than once, it may have got itself confused. I see that you did have problems at one point due to permissions on MASS. I've tried running your suite: the atmos task runs and the postproc task then completes successfully as well.

I can only suggest clearing out the ~/cylc-run/u-ar481/share/data/History_Data directory and starting the suite again.
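
As a rough illustration of that clean-up, here is a minimal sketch that lists the files in the History_Data directory and removes them only when asked to. The function name `clear_history_data` and its `dry_run` flag are hypothetical, and the directory is assumed to be the one shown in job.err; check the dry-run listing before deleting anything.

```python
from pathlib import Path


def clear_history_data(history_dir, dry_run=True):
    """Return the regular files in history_dir, deleting them unless dry_run.

    Hypothetical helper, not part of postproc or Rose/cylc.
    Subdirectories are left untouched.
    """
    targets = sorted(p for p in Path(history_dir).iterdir() if p.is_file())
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


if __name__ == "__main__":
    # Assumed location, based on the paths reported in job.err.
    history = Path.home() / "cylc-run" / "u-ar481" / "share" / "data" / "History_Data"
    if history.is_dir():
        for path in clear_history_data(history, dry_run=True):
            print("would remove", path)
```

Running it with `dry_run=True` first shows what would go; calling it again with `dry_run=False` actually deletes the files, after which the suite can be restarted.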


comment:2 Changed 20 months ago by ros

I have spoken to the owner of the postproc app. This is probably a bug in postproc_2.0: .pp files created by an earlier failed task are not explicitly excluded, so subsequent runs of the task attempt to pumf them. This is fixed in the next version of postproc, so the advice is to upgrade to postproc_2.1.
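
To make the idea of the exclusion concrete, here is a minimal sketch of filtering out already-converted .pp files before any header check. It is not the actual postproc code: the function name `fieldsfiles_only` is hypothetical, and the unconverted file name `ar481a.pa1999sep` in the usage line is assumed by dropping the `.pp` suffix from the name in the log.

```python
def fieldsfiles_only(candidates):
    """Drop paths ending in .pp so only unconverted files are header-checked.

    Hypothetical illustration; postproc's real selection logic may differ.
    """
    return [c for c in candidates if not str(c).endswith(".pp")]


# Assumed file names for illustration only.
to_check = fieldsfiles_only(["ar481a.pa1999sep", "ar481a.pa1999sep.pp"])
```

With a filter like this, a leftover `ar481a.pa1999sep.pp` from a failed run would never be passed to pumf, which is the behaviour described for the fixed version.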


comment:3 Changed 19 months ago by ros

  • Resolution set to answered
  • Status changed from accepted to closed