Opened 9 years ago

Closed 8 years ago

#783 closed help (fixed)

Shared Nodes Performance Slowdown

Reported by: pliojop
Owned by: jeff
Component: UM Model
Keywords:
Cc:
Platform:

Description

Hi,

Not sure if this is the right place for this, but thought I'd start here.

I run UM4.5 on an HPC cluster at the University of Leeds called ARC1. Recently, and for the UM only, when a 16-processor job shares a node with another 16-processor job, the CPUs on the shared node slow down,

e.g.

Node 1 = 12 cores used
Node 2 = 4 cores and 4 cores used (shared by both jobs)
Node 3 = 12 cores used

Nodes 1 & 3 will run at close to 100% while node 2 will be down at 33-50%.

I was wondering if this had been encountered before during any changes to the HECTOR computers over the years and, if it had, whether a fix was applied.

Many Thanks

James Pope
eejop@…

Change History (2)

comment:1 Changed 9 years ago by jeff

  • Owner changed from um_support to jeff
  • Status changed from new to accepted

Hi James

This situation can't arise on Hector as sharing of nodes between jobs is not allowed.

The problem looks to be that the jobs sharing a node are running on the same cores instead of using separate cores. This is probably a problem with whatever program you use to launch the MPI executable, so you will need to talk to your local support people.

Jeff.
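
One way to check the diagnosis above is to compare the cores each job's processes are bound to on the shared node. The following is only a minimal sketch: the job labels and core numbers are hypothetical, not taken from ARC1, and the `current_binding` helper relies on `os.sched_getaffinity`, which is available on Linux only. If both jobs turn out to be pinned to the same cores, their processes would have to time-slice those cores, which would be consistent with the 33-50% figure reported.

```python
import os
from collections import defaultdict


def find_shared_cores(bindings):
    """Map each core claimed by more than one process to the processes claiming it."""
    owners = defaultdict(list)
    for label, cores in bindings.items():
        for core in cores:
            owners[core].append(label)
    return {core: sorted(labels) for core, labels in owners.items() if len(labels) > 1}


def current_binding(pid=0):
    """Cores the given process may run on (Linux only; pid 0 means this process)."""
    return os.sched_getaffinity(pid)


# Hypothetical bindings on the shared node: both launchers pin from core 0.
overlapping = {"jobA": {0, 1, 2, 3}, "jobB": {0, 1, 2, 3}}
# Hypothetical bindings with the two jobs placed on separate cores.
separate = {"jobA": {0, 1, 2, 3}, "jobB": {4, 5, 6, 7}}

print(find_shared_cores(overlapping))  # every core 0-3 is listed with both jobs
print(find_shared_cores(separate))     # empty: no core is claimed twice
```

On a real node the `bindings` dictionary would be filled by calling `current_binding(pid)` for each UM process ID on that node. How to obtain those IDs depends on the local scheduler and MPI launcher, so that step is left out here.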

comment:2 Changed 8 years ago by jeff

  • Resolution set to fixed
  • Status changed from accepted to closed