Opened 12 years ago
Closed 10 years ago
#280 closed help (wontfix)
changing MPI on Eddie for HadCM3 - guidance about an MPI problem please
Reported by: | mjm | Owned by: | um_support |
---|---|---|---|
Component: | UM Model | Keywords: | |
Cc: | m.mineter@… | Platform: | |
UM Version: | 4.5 |
Description
Hi Simon/Jeff?
I'd like to see if you have any thoughts to help with a problem in getting the model to run with more recent MPI libraries.
You'll recall that the MPI version we have used on Eddie for the past 12 months requires 4 extra cores in order to use the InfiniPath fast networking.
Two newer versions of InfiniPath and of MPI are now available on Eddie and should get over this problem. One of the two new options might (hope springs eternal) also get around the other issue, a memory leak associated with the buffered I/O.
When I use the correct modules and SGE options for either new MPI, test GCOM (gcom/test/gcom/test.sh with a newly built GCOM), and build the UM with the new GCOM, I get similar errors from the worker nodes with both of the new MPIs:
With QLogic's MPI:
"Cannot start receive thread: Cannot allocate memory"
With openib (Open MPI for InfiniBand/InfiniPath):
"eddie071:1.0.Cannot start receive thread: Cannot allocate memory
(err=23)
[eddie071:20870] Open MPI failed to open a PSM endpoint: Cannot start receive thread: Cannot allocate memory"
I note that we set:
ulimit -s 2000000
(I tried increasing that to 3000000, but was told "operation not permitted". I am now trying 32 rather than 16 processors in case that helps.)
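For reference, an "operation not permitted" reply from ulimit usually means the requested soft limit is above the hard limit, which an ordinary user cannot raise. The sketch below is only a diagnostic under that assumption. It prints both stack limits and checks whether the 3000000 value tried above would fit under the hard limit. The variable names (soft, hard, requested, verdict) are illustrative, and it assumes bash on a Linux node.

```shell
#!/bin/bash
# Report the soft and hard stack-size limits (KiB, or "unlimited").
soft=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "soft stack limit: $soft"
echo "hard stack limit: $hard"

# Value tried in this ticket. A non-root user can raise the soft limit
# only as far as the hard limit, so compare the request against it.
requested=3000000
if [ "$hard" = "unlimited" ]; then
    verdict="allowed"
elif [ "$requested" -le "$hard" ]; then
    verdict="allowed"
else
    verdict="denied"
fi
echo "raising soft stack limit to $requested KiB: $verdict"
```

Running this inside the SGE job script, rather than on a login node, would show the limits the worker processes actually see. One further point, which is an assumption not confirmed anywhere in this ticket: on Linux with glibc, the soft stack limit in force when a program starts is also used as the default stack size for new threads. A very large "ulimit -s" could therefore plausibly contribute to the "Cannot start receive thread: Cannot allocate memory" error, and a run with a smaller value might be worth trying.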
I'd welcome any guidance on ways to address this.
Regards
Mike
Change History (1)
comment:1 Changed 10 years ago by lois
- Resolution set to wontfix
- Status changed from new to closed