#2557 closed help (answered)

pumf: Failed to extract header information

Reported by: marcus Owned by: um_support
Component: UM Model Keywords: postproc, pumf
Cc: Platform: NEXCS
UM Version: 10.6

Description

Hi, since yesterday's MASS downtime I have been having essentially the same problem as in ticket #2315 with two of my suites. The solution recommended there was to upgrade to postproc_2.1. As far as I understand, I am already using postproc_2.2, so that upgrade is unlikely to resolve the problem.

The error occurs in both u-az166 and u-az691. The atmos tasks ran fine, but the postproc tasks crashed. I have restarted the suites and tried removing the offending file, without success: postproc fails persistently.

The error in u-az166 is shown below (u-az691 gives the equivalent error):

WARNING:
This computer is provided for the processing of official information.
Unauthorised access described in Met Office SyOps may constitute a criminal offence.
All activity on the system is liable to monitoring.
[WARN] file:atmospp.nl: skip missing optional source: namelist:archer_arch
[WARN] file:nemocicepp.nl: skip missing optional source: namelist:archer_arch
[WARN] file:pptransfer.nl: skip missing optional source: namelist:archer_arch
[WARN] file:pptransfer.nl: skip missing optional source: namelist:pptransfer
[WARN]  Mule Module is not available. um-pumf will be used.
[WARN]  Iris Module is not available
[WARN]  [SUBPROCESS]: Command: /projects/um1/vn10.6/xc40/utilities/um-pumf -h /home/d03/makoe/cylc-run/u-az166/log/job/19970801T0000Z/postproc/02/job-pumfhead.out /home/d03/makoe/cylc-run/u-az166/share/data/History_Data/az166a.pa1997jul
[SUBPROCESS]: Error = 1:
	[INFO] File(1): /home/d03/makoe/cylc-run/u-az166/share/data/History_Data/az166a.pa1997jul
[INFO] NPRINT: 8
[INFO] XPRINT: 5
[WARN] Using default STASHmaster as none provided "/projects/um1/vn10.6/ctldata/STASHmaster".
[INFO] Using script: /projects/um1/vn10.6/xc40/utilities/um-pumf
[INFO] Using executable: /projects/um1/vn10.6/xc40/utilities/um-pumf.exe
/projects/um1/vn10.6/xc40/utilities/um-pumf: line 198: 30032 Aborted                 (core dumped) $pumf_exec > $PUMF_OUT 2>&1
[INFO] Header output in:   /home/d03/makoe/cylc-run/u-az166/log/job/19970801T0000Z/postproc/02/job-pumfhead.out
[INFO] Field output in:    /home/d03/makoe/cylc-run/u-az166/work/19970801T0000Z/postproc/pumf_out_Nj0t/pumf_field
[FAIL] Problem with PUMF program

[ERROR]  pumf: Failed to extract header information from file /home/d03/makoe/cylc-run/u-az166/share/data/History_Data/az166a.pa1997jul
[FAIL]  Command Terminated
[FAIL] Terminating PostProc...
[FAIL] main_pp.py atmos # return-code=1
2018-07-25T13:16:31Z CRITICAL - failed/EXIT

What should I do to resolve this?

Many thanks,
Marcus

Change History (5)

comment:1 Changed 17 months ago by grenville

Marcus

Have you looked at /home/d03/makoe/cylc-run/u-az166/share/data/History_Data/az166a.pa1997jul ?

xconv says "WGDOS data header record mismatch" - I have not looked through the .out file to see if there are any clues.

Grenville

comment:2 Changed 17 months ago by marcus

Hi Grenville,

I have moved file az166a.pa1997jul away from History_Data and restarted the postproc process. This suite seems to be running now. Looking through the logs of atmos_main and postproc for 19970701 I didn't find anything that pointed to what might have gone wrong. Perhaps there is something but I cannot see it.

With u-az691 I tried the same trick, but postproc still keeps failing, claiming that it cannot extract header information from file az691.pa1985oct. Even though I have removed the file and it is no longer in History_Data, the postproc process still reports a problem with the headers of this file.

comment:3 Changed 17 months ago by grenville

You probably need to remove the relevant .arh file — I'm not suggesting you do that. I think the source of the problem needs to be discovered.

comment:4 Changed 17 months ago by marcus

Hi Grenville,

I believe that these files were corrupted when the filesystem of the xcs failed earlier this week. I have now identified those files with corrupted WGDOS data headers and moved them to a backup space. The postproc process is currently running for u-az691, so both suites seem to be continuing OK for now.
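
For anyone hitting the same problem, the "identify and move" step above could be scripted roughly as follows. This is only a sketch under assumptions, not what was actually run: the function names, the glob pattern and the backup directory are hypothetical, and the check simply reuses the `um-pumf -h <header output> <file>` invocation from the postproc log in the description, treating a non-zero exit status as a header that could not be read.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Path taken from the postproc log in the ticket description; adjust as needed.
PUMF = "/projects/um1/vn10.6/xc40/utilities/um-pumf"


def pumf_header_ok(path, pumf_cmd=PUMF):
    """Return True if um-pumf exits with status 0 for this file.

    Assumption: a non-zero exit status means the header could not be
    extracted, as with "Error = 1" in the log above.
    """
    with tempfile.TemporaryDirectory() as tmp:
        header_out = Path(tmp) / "pumfhead.out"
        result = subprocess.run(
            [pumf_cmd, "-h", str(header_out), str(path)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return result.returncode == 0


def quarantine_bad_files(data_dir, backup_dir, pattern, check=pumf_header_ok):
    """Move every file matching `pattern` that fails `check` into `backup_dir`.

    Returns the names of the files that were moved. Files are moved,
    not deleted, so they can be inspected or restored later.
    """
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(data_dir).glob(pattern)):
        if path.is_file() and not check(path):
            shutil.move(str(path), str(backup / path.name))
            moved.append(path.name)
    return moved
```

A call might look like `quarantine_bad_files(".../share/data/History_Data", "/some/backup/dir", "az691a.p*")`, where the pattern and backup location are placeholders. Passing a different `check` function makes it possible to try the logic without um-pumf being available.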

In the atmos_main logs I found no evidence of the failure, but I'm sure it ought to be logged somewhere.

I suspect the other option would have been to restart the model from an earlier dump.

Regards,
Marcus

comment:5 Changed 17 months ago by grenville

  • Resolution set to answered
  • Status changed from new to closed