Opened 11 years ago

Closed 9 years ago

#277 closed help (fixed)

memory leak in UM4.5 on HECToR ?

Reported by: robin
Owned by: um_support
Component: UM Model
Keywords: HECToR, UM4.5, memory
Cc:
Platform:
UM Version: 4.5

Description (last modified by robin)

initial symptom: FAMOUS runs frequently being killed with

_pmii_daemon(SIGCHLD): PE 6 exit signal Killed
[NID 12315]Apid 120510: initiated application termination
[NID 12315] Apid 120510: OOM killer terminated this process.

message after ~7-8 wallclock hours of running. Compiled with PathScale -O3, as suggested by Annette Osprey.

Information from /proc/self/status collected during the run implies that up to 32 kB of extra memory is claimed every atmosphere timestep (the amount depends on the processor), which adds up over long runs. This seems to be made up of 4 kB contributions, mostly from mpp_filter, but also from other places in atm_dyn and (even less often) atm_phys.
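For anyone wanting to reproduce this kind of measurement, here is a minimal Python sketch of sampling /proc/self/status and differencing two samples. It is not the instrumentation used in the run above (that would have been inside the UM itself). The function names and the choice of fields (VmSize, VmRSS, VmHWM) are assumptions made for illustration.

```python
# Sketch only: sample memory fields from /proc/self/status (Linux).
# Function names and the default field list are illustrative assumptions.

def parse_status_kb(text, fields=("VmSize", "VmRSS", "VmHWM")):
    """Return {field: value in kB} for the requested fields found in text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            parts = rest.split()
            # Memory lines look like "VmRSS:     1234 kB"
            if parts and parts[0].isdigit():
                values[name] = int(parts[0])
    return values


def read_status_kb(path="/proc/self/status"):
    """Read and parse the status file of the current process."""
    with open(path) as handle:
        return parse_status_kb(handle.read())


def growth_kb(before, after):
    """Per-field difference between two samples, in kB."""
    return {name: after[name] - before[name] for name in after if name in before}
```

Calling read_status_kb() before and after a timestep (or a single routine) and passing both results to growth_kb() gives the per-step increase; a value that is positive on most steps and never falls back suggests memory is being claimed and not released.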

It should be possible to work around this with more frequent resubmissions, of course.
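As a rough guide to how often to resubmit, this Python sketch estimates how many timesteps fit into a given amount of spare memory. The 32 kB per timestep figure is the upper bound reported above; the headroom value in the usage note is purely hypothetical and would need to be measured on the actual node.

```python
# Sketch only: estimate how many timesteps fit before spare memory runs out.
# headroom_kb is a hypothetical input; it is not a figure from this ticket.

def steps_until_exhausted(headroom_kb, leak_kb_per_step):
    """Whole timesteps that fit before the leak consumes the headroom."""
    if leak_kb_per_step <= 0:
        raise ValueError("leak_kb_per_step must be positive")
    return headroom_kb // leak_kb_per_step
```

For example, with an assumed 1 GiB (1048576 kB) of spare memory per process and 32 kB claimed per step, steps_until_exhausted(1048576, 32) gives 32768 timesteps, so the resubmission interval would need to be shorter than that.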

cheers,

robin

Change History (2)

comment:1 Changed 11 years ago by robin

  • Description modified (diff)

comment:2 Changed 9 years ago by lois

  • Resolution set to fixed
  • Status changed from new to closed

There is an issue with the memory usage of MPI on the XT systems, which Cray are aware of. Hopefully this will be resolved in new releases of the MPT libraries.
