#2588 closed help (fixed)

lib-4205 : UNRECOVERABLE library error

Reported by: nx902220 Owned by: um_support
Component: UM Model Keywords:
Cc: Platform:
UM Version:

Description

Hi Willie,

I have copied your suite u-az946@85909 as u-ba463. It runs all the way down to the 55 m nest, where it fails in the reconfiguration with the following error message:

lib-4205 : UNRECOVERABLE library error

The program was unable to request more memory space.

Application 37965864 is crashing. ATP analysis proceeding…

ATP Stack walkback for Rank 21 starting:

_start@…:113
libc_start_main@…:242
main@…:82
rcf_control$rcf_control_mod_@…:156
rcf_create_dump$rcf_create_dump_mod_@…:758
rcf_field_dependent_calcs$rcf_field_dependent_calcs_mod_@…:264
rcf_recompute_wet_rho$rcf_recompute_wet_rho_mod_@…:181
rcf_alloc_field$rcf_alloc_field_mod_@…:67
ALLOCATE@0x2055af73
_lerror@0x2085c1a7
abort@…:92
raise@…:42

ATP Stack walkback for Rank 21 done
Process died with signal 6: 'Aborted'
Forcing core dumps of ranks 21, 0
View application merged backtrace tree with: stat-view atpMergedBT.dot
You may need to: module load stat

Please can you help me with this? I have not seen this error in earlier versions of the nesting suite.

Cheers,

Lewis

Change History (3)

comment:1 Changed 16 months ago by willie

Hi Lewis,

It seems to have created a 55m.start dump, which at 38 GB is only 2 GB bigger than the non-tracer version in u-at199.

You might be able to get away with just re-triggering the task. If it fails again, a longer-term solution is to give it more memory by increasing the number of processors in the 55m_um_recon task:

:
        [[[environment]]]
           RCF_NPROCX = 8
           RCF_NPROCY = 9
           FLUME_IOS_NPROC = 0
           TOTAL_MPI_TASKS = 72
           MPI_TASKS_PER_NODE = 36
           OMP_NUM_THREADS = 1
           HYPERTHREADS = 1
           TASKS_PER_NUMA = 18
           ROSE_LAUNCHER_PREOPTS="""-n $TOTAL_MPI_TASKS
                                     -N $MPI_TASKS_PER_NODE
                                     -S $TASKS_PER_NUMA
                                     -d $OMP_NUM_THREADS
                                     -j $HYPERTHREADS
                                 """
        [[[directives]]]
          -l select=2
:

Here I have increased the number of processors from 36 to 72 (RCF_NPROCX × RCF_NPROCY = 8 × 9) and changed the number of nodes from one to two accordingly (-l select=2).

NB: the number of MPI tasks per node must be 36.
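As a rough sketch of the arithmetic behind these settings, the node count for the -l select directive follows from the task count and the fixed 36 tasks per node. The helper name recon_layout is hypothetical. It also rests on two assumptions that do not come from the suite itself: that TOTAL_MPI_TASKS equals RCF_NPROCX × RCF_NPROCY plus FLUME_IOS_NPROC, and that every node must be fully populated.

```python
def recon_layout(nprocx, nprocy, ios_nproc=0, tasks_per_node=36):
    """Hypothetical helper: return (TOTAL_MPI_TASKS, nodes for -l select).

    Assumes TOTAL_MPI_TASKS = RCF_NPROCX * RCF_NPROCY + FLUME_IOS_NPROC and
    that every node must hold exactly tasks_per_node MPI tasks.
    """
    total = nprocx * nprocy + ios_nproc
    # With a fixed tasks-per-node count, the total must fill whole nodes.
    if total % tasks_per_node != 0:
        raise ValueError(
            f"{total} tasks do not fill whole nodes of {tasks_per_node} tasks"
        )
    nodes = total // tasks_per_node
    return total, nodes


# Settings from the 55m_um_recon task above: RCF_NPROCX = 8, RCF_NPROCY = 9.
total, nodes = recon_layout(8, 9)
print(f"TOTAL_MPI_TASKS = {total}, -l select={nodes}")
```

Because the number of tasks per node stays at 36, doubling the task count also doubles the number of nodes. That doubles the total memory available to the reconfiguration, which is what the fix above relies on.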

Regards,
Willie

Last edited 16 months ago by willie

comment:2 Changed 16 months ago by nx902220

Makes sense, thank you. The reconfiguration now succeeds.

Cheers,

Lewis

comment:3 Changed 16 months ago by willie

  • Resolution set to fixed
  • Status changed from new to closed