#2570 closed help (answered)

Unrecoverable library error in glm recon

Reported by: shakka Owned by: um_support
Component: UM Model Keywords: glm_recon, library error
Cc: Platform: Monsoon2
UM Version: 11.1

Description

Hi NCAS

I'm getting an 'unrecoverable library error' message when running a suite at vn 11.1 (a copy of Stu Webster's vn 11.1 nesting suite u-av356, linked here: https://code.metoffice.gov.uk/trac/rmed/wiki/suites/nesting/worked_eg_2018). The suite ran successfully using 16,32 processors when the cycle dates were in 2015 (the default), but I have now changed the dates to the case study I am interested in, and I get the following error in the glm_recon stage after around 6 minutes:

"lib-4205 : UNRECOVERABLE library error

The program was unable to request more memory space."

I notice there are several tickets regarding this error, but I have been unable to solve the problem using any of the solutions offered in tickets #2151, #2202, #687 or #1937.

I have tried changing the number of processors used to drive the model (Driving model setup > dm_nproc) from 16,32 to 24,36 and to 30,36, but neither change fixed the error.

Can you please advise?

Thanks,
Ella

Change History (4)

comment:1 Changed 11 months ago by shakka

Update: In suite-adds.rc I tried changing the line

{% set MPI_TASKS_PER_NODE = (NCPU_PER_NODE * HYPERTHREADS / ( 3 * OMP_NUM_THREADS ) ) |int %}

to {% set MPI_TASKS_PER_NODE = (NCPU_PER_NODE * HYPERTHREADS / ( 8 * OMP_NUM_THREADS ) ) |int %}

and the recon has run, but it seems like the jobs are taking longer to queue than usual.
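
For anyone reading this later, the effect of that change can be sketched in Python. This is only an illustration of the arithmetic in the Jinja2 expression: the values used for NCPU_PER_NODE, HYPERTHREADS and OMP_NUM_THREADS, and the total task count of 72, are assumptions chosen for the example and are not taken from the suite.

```python
import math


def mpi_tasks_per_node(ncpu_per_node, hyperthreads, omp_num_threads, divisor):
    """Mirror the Jinja2 expression
    (NCPU_PER_NODE * HYPERTHREADS / (divisor * OMP_NUM_THREADS)) | int
    where the int filter truncates towards zero."""
    return int(ncpu_per_node * hyperthreads / (divisor * omp_num_threads))


def nodes_needed(total_tasks, tasks_per_node):
    """Number of whole nodes required to place total_tasks MPI tasks."""
    return math.ceil(total_tasks / tasks_per_node)


# Assumed values for illustration only (not read from suite-adds.rc).
NCPU_PER_NODE = 36
HYPERTHREADS = 1
OMP_NUM_THREADS = 1
TOTAL_TASKS = 72  # hypothetical task count

before = mpi_tasks_per_node(NCPU_PER_NODE, HYPERTHREADS, OMP_NUM_THREADS, 3)
after = mpi_tasks_per_node(NCPU_PER_NODE, HYPERTHREADS, OMP_NUM_THREADS, 8)

for label, tasks in (("divisor 3", before), ("divisor 8", after)):
    print(
        f"{label}: {tasks} tasks per node, "
        f"{nodes_needed(TOTAL_TASKS, tasks)} nodes for {TOTAL_TASKS} tasks, "
        f"about 1/{tasks} of a node's memory per task"
    )
```

Under these assumed values, raising the divisor from 3 to 8 places fewer MPI tasks on each node, so each task has a larger share of the node's memory and the same job is spread over more nodes. That would be consistent with the memory error going away, although the real figures depend on the values set in the suite.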

Ella

comment:2 Changed 10 months ago by willie

Hi Ella,

What's the suite id?

Willie

comment:3 Changed 10 months ago by shakka

Hi Willie,

I think my solution has fixed the problem - I reckon it was just taking longer to queue jobs last week because there were some issues with Monsoon that I didn't know about when I raised the ticket.

Thanks,
Ella

comment:4 Changed 10 months ago by willie

  • Resolution set to answered
  • Status changed from new to closed