#1724 closed error (worksforme)

jobs do not run on ARCHER - diagnostics issue?

Reported by: fpithan
Owned by: um_support
Priority: normal
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 8.5

Description

Hi,

I am trying to run a series of UM jobs (here xlgta) on ARCHER, but they crash at the start of model execution with the message below. A basically identical job started fine before; since then I have added a number of output diagnostics (mostly related to drag) and compiled a new executable and reconfiguration. Do you have any better idea than adding the diagnostics back one at a time to find out which one causes the issue?

Thanks,
Felix.

qsserver: The following variables were set up
qsserver: RUNID=xlgta
qsserver: ARCHIVEDIR=/nerc/n02/n02/fpithan/umdrag1

qsserver: Waiting for command 1
basename: missing operand
Try `basename --help' for more information.
qsserver: Waiting for command 2
Filtering initial dump data. n_filt= 8
craylibsgoogle-perftools/src/tcmalloc.cc:641] Attempt to free invalid pointer: 0x4046207d7fd723ae
_pmiu_daemon(SIGCHLD): [NID 04470] [c7-2c0s13n2] [Fri Nov 6 15:48:59 2015] PE RANK 4 exit signal Aborted
[NID 04470] 2015-11-06 15:49:04 Apid 18580019: initiated application termination
qsatmos: waiting for qsserver to complete on pid 10091

Change History (7)

comment:1 Changed 22 months ago by fpithan

I found this in the setup check and resolved these issues, hopefully that was the problem…

Errors will be output in this window
Level list invalid in PSEUDO panel of Domain Profile 'DICECAT' (Edit Profile in window atmos_STASH)
Variable: PSLIST_A(*,14)

→ Model Selection

→ Atmosphere

→ STASH

→ STASH. Specification of Diagnostic requirements

→ Domain profile window, 2

List Check Error in window subindep_UMCET
Variable: ENSEMBLE

→ Model Selection

→ Input/Output Control and Resources

→ Ensemble Set-up

Verification is complete.
If an error was detected then find the window, enter and close it.
This will either generate a more informative error message or it will
result in the setting of a previously unset hidden variable.

comment:2 Changed 22 months ago by fpithan

  • Resolution set to fixed
  • Status changed from new to closed

comment:3 Changed 22 months ago by grenville

Felix

I'll be surprised if this is the cause of the problem you have - please let us know what happens.

Grenville

comment:4 Changed 22 months ago by fpithan

Thanks Grenville - for now, the run crashed because /nerc can't be accessed, but this seems to be an issue with rdf on ARCHER noted on their status website. I'll keep you posted.

comment:5 Changed 22 months ago by fpithan

  • Resolution fixed deleted
  • Status changed from closed to reopened

Unfortunately, I cannot surprise you - the run keeps crashing with the same error message. Any ideas?

comment:6 Changed 22 months ago by grenville

Felix

The problem is with something in stash.

The model runs OK (well, for several hundred timesteps before I stopped it) if you switch off all stash (see my copy of your job, xmbda). I also switched off archiving, but that wasn't the problem.

I suggest that you play with including stash bit by bit until you find the culprit - did you add stash to the job?
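If it helps, the bit-by-bit search can be done with fewer test runs by halving the list of diagnostics each time rather than adding them one by one. The following is only a rough Python sketch under two assumptions: exactly one diagnostic causes the crash, and it does so on its own rather than in combination with another. `runs_ok` is a hypothetical callback, not part of the UM or the UMUI; you would implement it by enabling just the given diagnostics, resubmitting the job, and reporting whether the model got past start-up.

```python
from typing import Callable, List, Sequence


def find_culprit(diagnostics: Sequence[str],
                 runs_ok: Callable[[List[str]], bool]) -> str:
    """Bisect for the one diagnostic whose inclusion makes the run fail.

    Assumption: exactly one culprit exists and it fails by itself.
    `runs_ok` is a hypothetical callback that returns True when a job
    with only the given diagnostics switched on runs successfully.
    """
    candidates = list(diagnostics)
    if runs_ok(candidates):
        raise ValueError("the full set runs; there is nothing to bisect")

    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if runs_ok(first_half):
            # First half is clean, so the culprit is in the remainder.
            candidates = candidates[middle:]
        else:
            # First half already fails, so the culprit is inside it.
            candidates = first_half

    return candidates[0]
```

For eight added diagnostics this needs four test runs (one to confirm the full set fails, then three halvings) instead of up to eight. If the crash only appears for a combination of diagnostics, the single-culprit assumption does not hold and this search can point at the wrong item.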

I think you don't need the hand-edit ~karthee/umui_jobs/hand-edits/add_modules.ed - it runs with or without, but one of your errors was perftools related, which may be related to this hand edit.

Grenville

comment:7 Changed 21 months ago by fpithan

  • Resolution set to worksforme
  • Status changed from reopened to closed

I found a set of STASH that seems to work, leaving out the SSO drag diagnostic in the end and saving blocking and gravity wave drag daily rather than monthly. I do not understand why the more complete set of diagnostics did not work, though.
