Opened 12 years ago
Closed 12 years ago
#219 closed error (fixed)
MPI problem on hector
| Reported by: | sws04jc | Owned by: | um_support |
|---|---|---|---|
| Component: | UM Model | Keywords: | |
| Cc: | | Platform: | |
| UM Version: | 6.1 | | |
Description
Dear CMS,
I am running a UM vn6.1 LAM job with tracers, 76 levels, 1km horizontal grid length (448 by 352). Job id: xdove, user: sws04jc.
The job will not run if I attempt to use more than 32 processors, in spite of the fact that I've attached the modset hector_io. The error message begins "MPICH PtlEQPoll error (PTL_EQ_DROPPED)".
With 32 processors or fewer, the job runs fine.
I can run 12km and 4km on hector successfully with more processors, no problem.
Solving this problem is not an absolutely essential and urgent priority, but I thought you should know about it.
Thanks in advance for any advice you can offer.
Sincerely,
Jeffrey Chagnon
Change History (3)
comment:1 in reply to: ↑ description ; follow-up: ↓ 2 Changed 12 years ago by sws04jc
Willie has pointed out that the hector project webpage recommends increasing the value of the environment variable MPICH_PTL_OTHER_EVENTS.
I'm going to try increasing this (from 2048 to 4096).
Fingers crossed.
Jeffrey
comment:2 in reply to: ↑ 1 Changed 12 years ago by sws04jc
And I'd also add that I now see this information posted to the cms webpage under HPC FAQs.
Sorry to have used this space to talk to myself!
If this doesn't fix the problem, then I will get back in touch.
comment:3 Changed 12 years ago by willie
- Resolution set to fixed
- Status changed from new to closed
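For anyone reading this ticket later, here is a minimal sketch of the change tried here: raising MPICH_PTL_OTHER_EVENTS from 2048 to 4096. It assumes the variable is exported in the job script before the model executable is launched. Where that line belongs in the xdove job script is not stated in this ticket, so the placement is an assumption.

```shell
#!/bin/sh
# Sketch only: raise the event queue size named in this ticket.
# 2048 -> 4096 are the values quoted above; the placement (in the job
# script, before the model is launched) is an assumption.
MPICH_PTL_OTHER_EVENTS=4096
export MPICH_PTL_OTHER_EVENTS

# Echo the setting so it shows up in the job output for later checking.
echo "MPICH_PTL_OTHER_EVENTS=${MPICH_PTL_OTHER_EVENTS}"
```

If the PTL_EQ_DROPPED error still appears with more than 32 processors, the same line can be edited to try a larger value.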