Opened 11 years ago

Closed 10 years ago

#280 closed help (wontfix)

changing MPI on Eddie for HadCM3 - guidance about an MPI problem please

Reported by: mjm Owned by: um_support
Component: UM Model Keywords:
Cc: m.mineter@… Platform:
UM Version: 4.5

Description

Hi Simon/Jeff,

I'd like to see if you have any thoughts to help with a problem in getting the model to run with more recent MPI libraries.

You'll recall that the MPI version we have been using on Eddie for the past 12 months requires 4 extra cores in order to use the InfiniPath fast networking.

Two newer versions of InfiniPath and of MPI are now available on Eddie, and should get over this problem. One of the two new options might (hope springs eternal) also get around the other issue: a memory leak associated with the buffered I/O.

For each new MPI I have loaded the correct modules, set the correct SGE options, tested GCOM (gcom/test/gcom/test.sh with a newly built GCOM), and built the UM against the new GCOM. With either of the new MPIs I then get similar errors from the worker nodes:

With QLogic's MPI:
"Cannot start receive thread: Cannot allocate memory"

With openib (Open MPI for InfiniBand/InfiniPath):

"eddie071:1.0.Cannot start receive thread: Cannot allocate memory (err=23)
[eddie071:20870] Open MPI failed to open a PSM endpoint: Cannot start receive thread: Cannot allocate memory"

I note that we set:
ulimit -s 2000000

(I tried increasing that to 3000000, but was told "operation not permitted". I am now trying 32 rather than 16 processors in case that helps.)
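
For reference, here is a minimal sketch of how the stack limits could be inspected on a node. An unprivileged user cannot raise the soft limit above the hard limit, which would be consistent with the "operation not permitted" message. The helper name can_raise_to is hypothetical, the value 3000000 is just the one tried above, and nothing here is specific to Eddie or SGE. Whether the stack limit is related to the receive-thread error is not established.

```shell
# Report the soft and hard stack limits (in kB, or "unlimited").
soft=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "soft stack limit: ${soft}"
echo "hard stack limit: ${hard}"

# can_raise_to (hypothetical helper): succeeds if the requested soft limit
# ($1) does not exceed the given hard limit ($2).
can_raise_to() {
    [ "$2" = "unlimited" ] || [ "$1" -le "$2" ]
}

if can_raise_to 3000000 "$hard"; then
    echo "3000000 is within the hard limit"
else
    echo "3000000 exceeds the hard limit (${hard}); 'ulimit -s 3000000' would be refused"
fi
```

If the hard limit turns out to be below 3000000, only an administrator (or the scheduler configuration) could raise it for the job.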

I'd welcome any guidance on ways to address this.

Regards
Mike


Change History (1)

comment:1 Changed 10 years ago by lois

  • Resolution set to wontfix
  • Status changed from new to closed