Opened 6 years ago

Closed 6 years ago

#1209 closed help (fixed)

ACUMPS error on MONSooN

Reported by: swr04ojb
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: MONSooN
UM Version: 6.6.3

Description

Hello,

I'm running job xinhf as part of the LastMil simulation. It was running along happily enough, but then fell over, producing the following message:

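*********************************************************************************
UM ERROR (Model aborting) :
Routine generating error: U_MODEL
Error code: 4
Error message:
ACUMPS: Diagnostic error. See output for item no.
*********************************************************************************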
gc_abort (Processor 0 ): ACUMPS: Diagnostic error. See output for item no.

Traceback:

Offset 0x00000010 in procedure xltrbk_
Offset 0x000000f8 in procedure gc_abort_, near line 180 in file /projects/um1/gcom/gcom3.5/meto_ibm_pwr6_mpp/ppsrc/gcom/gc/gc_abort.f
Offset 0x0000033c in procedure ereport_, near line 384 in file /projects/lastmil/abozzo/xfkmj/ummodel/ppsrc/UM/control/misc/ereport.f90
Offset 0x00036868 in procedure u_model_, near line 6887 in file /projects/lastmil/abozzo/xfkmj/ummodel/ppsrc/UM/control/top_level/u_model.f90
Offset 0x0000230c in procedure um_shell, near line 4312 in file /projects/lastmil/abozzo/xfkmj/ummodel/ppsrc/UM/control/top_level/um_shell.f90
--- End of call chain ---

xinhf: Run failed
*********************************************************************************

From what I can tell, ACUMPS is a partial sum aggregator for the dumps, but I'm not sure how to chase this error further. (I've attached the relevant .archive file, in case that helps.)

kind regards,

oliver

Change History (5)

comment:1 Changed 6 years ago by swr04ojb

Apparently I'm not allowed to attach a file larger than 0.5 MB, and the .archive file is 12 MB, so I can't upload it. I think everyone should be able to see it on MONSooN:

/home/olbrow/output/xinhf014.xinhf.d14031.t002500.archive

comment:2 Changed 6 years ago by willie

Hi Oliver,

Is it possible that you've run out of disk space? From the output:

 ERROR: checksum failure in climate mean
 Section  3  item  334
 This can be due to invalid values in field, or corruption of partial sum file
 Remove or fix diagnostic, and rerun
 MEANCTL: RESTART AT PERIOD_ 0
 U_MODEL: interim history file deleted due to failure writing partial sum files
 *********************************************************************************
 UM ERROR (Model aborting) :
 Routine generating error: U_MODEL
 Error code:  4
 Error message: 
ACUMPS: Diagnostic error. See output for item no.
 *********************************************************************************

Regards,

Willie

comment:3 Changed 6 years ago by swr04ojb

Hi Willie,

thanks for your response.

> Is it possible that you've run out of disk space

Erm, I suppose that might be possible, I'm not sure. I've just
checked the quota:

quota.pl -r projects

                   Block Limits        |  File Limits
 Name     Type     TB    Quota  %      |  Files   Quota  %
 lastmil  FILESET  0.81  1.50   53.90  |  126538  0      0.00
 olbrow   USR      0.16  0.00   0.00   |  14965   0      0.00

and don't see a problem. I'm not entirely sure how to check our
allowance on MASS; it isn't obvious from the MONSooN wiki
(or maybe I just haven't stumbled onto the right page yet).

Do you have any other suggestions of things to check?

regards,

oliver

comment:4 Changed 6 years ago by willie

Hi Oliver,

If it is not disk space, then we need to check for NaNs or unphysical values in this field. Look at the latest output in xconv. You can detect NaNs by running cumf on a file against itself and looking at the summary output: any differences will be due to NaNs.
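
The self-comparison trick works because, under IEEE 754 floating-point rules, a NaN compares unequal to everything, itself included. The sketch below only illustrates that principle in plain Python; it is not how cumf is implemented, and the function name and sample values are hypothetical.

```python
def count_self_differences(values):
    """Count elements that compare unequal to themselves.

    Under IEEE 754 rules only NaN satisfies v != v, so this count is
    the number of NaNs in the field.
    """
    return sum(1 for v in values if v != v)


# Hypothetical sample field containing two NaNs (values are made up).
field = [271.3, float("nan"), 268.9, float("nan"), 0.0]

n_diff = count_self_differences(field)
print(f"{n_diff} point(s) differ when the field is compared with itself")
```

Note that infinities and very large but finite values compare equal to themselves, so this kind of check would not flag them; unphysical values still need a separate look at the field's range.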

Regards,

Willie

comment:5 Changed 6 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed