Opened 3 weeks ago

Last modified 3 weeks ago

#2534 new help

domain decomposition

Reported by: ggxmy
Owned by: um_support
Priority: normal
Component: UM Model
Keywords: MPI
Cc:
Platform: ARCHER
UM Version: 8.2

Description

Hi. I was wondering whether the domain decomposition for my jobs is reasonably well optimised. I assume so, because I copied Willie's job, but I wanted to make sure.

The jobs tewn? (where ? = b, f, g, h so far) are based on xlhub and use the vn8.2 limited area model at 12 km resolution. They have a 12x10 MPI domain decomposition and 24 tasks per node. It took about 24 hours to simulate 30 days. Here is a copy of the relevant lines from umuisubmit_run:

# MPP time limits
export UM_NPES=120
export NPROC_MAX=132
export NTASKS_PER_NODE=24
export NTASKS_PER_NUMANODE=12
export NTHREADS_PER_TASK=1
export FLUME_IOS_NPROC=12
export UM_ATM_NPROCX=12
export UM_ATM_NPROCY=10

Are these good? Is it possible to increase the domain decomposition and make the run faster? If you can simply answer yes or no to these questions, I can happily close this ticket. Otherwise, could I have some advice on this, please?

Thank you.
Masaru

Change History (3)

comment:1 Changed 3 weeks ago by ggxmy

I forgot to mention this. These jobs seem to be using only 6 nodes:

Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
5436670.sdb     masara   long     tewnh_run     --    6 144    --  48:00 Q   --

I thought maybe they could use more (say 12). Also, 12x10 divided by 24 (tasks per node) gives 5, not 6, yet pe_out goes up to pe131. So are 132 processors actually being used? In my previous experience these numbers were consistent, which is part of the reason I am asking.

Masaru

comment:2 Changed 3 weeks ago by willie

Hi Masaru,

There was a small investigation of performance that resulted in the current arrangement. The decomposition is 12 x 10 processors plus 12 processors for the IO servers, so 132 in all. At 24 procs/node this is 5.5 nodes, which rounds up to six.
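
As a rough sketch of that arithmetic, the processor and node counts can be checked as below. This is not taken from the UM scripts: the function name `nodes_needed` is made up for illustration, and the inputs are simply the values from umuisubmit_run quoted in the description.

```python
import math


def nodes_needed(nproc_x, nproc_y, io_procs, tasks_per_node):
    """Return (total MPI tasks, whole nodes required).

    Hypothetical helper for illustration only; it assumes every task
    (compute or IO server) occupies one core and nodes are allocated whole.
    """
    total_tasks = nproc_x * nproc_y + io_procs
    nodes = math.ceil(total_tasks / tasks_per_node)
    return total_tasks, nodes


# Values from the ticket: UM_ATM_NPROCX=12, UM_ATM_NPROCY=10,
# FLUME_IOS_NPROC=12, NTASKS_PER_NODE=24
total_tasks, nodes = nodes_needed(12, 10, 12, 24)
allocated_cores = nodes * 24

print(total_tasks, nodes, allocated_cores)  # 132 6 144
```

The 132 tasks agree with NPROC_MAX=132 and with pe_out running from pe0 to pe131. The 144 allocated cores agree with the TSK column of the queue listing, which suggests that 12 cores on the sixth node are left idle.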

It may be possible to find a faster arrangement, but there is also a cost to consider: the number of processor hours consumed in the calculation. At high processor counts, the communication between processors can dominate the cost.
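
To make the trade-off concrete, here is a minimal sketch of the cost side. It assumes that cost is proportional to allocated nodes times wall-clock hours, which may not match the actual charging scheme on ARCHER. The function names are hypothetical, and the 12-node case is only the figure floated earlier in this ticket, not a measured run.

```python
def core_hours(nodes, cores_per_node, wall_hours):
    """Core-hours consumed, assuming whole nodes are charged for the
    full wall-clock time (an assumption, not a documented ARCHER rule)."""
    return nodes * cores_per_node * wall_hours


def break_even_wall_hours(nodes_now, wall_hours_now, nodes_new):
    """Wall-clock time the larger job must beat to avoid costing more."""
    return nodes_now * wall_hours_now / nodes_new


# Current arrangement from the ticket: 6 nodes, 24 cores per node,
# about 24 hours to simulate 30 days.
current_cost = core_hours(6, 24, 24)

# Hypothetical 12-node job: how fast must it run to cost no more?
limit = break_even_wall_hours(6, 24, 12)

print(current_cost, limit)  # 3456 12.0
```

Under these assumptions, a 12-node job would need to finish 30 model days in under 12 hours to be cheaper. If communication overhead keeps it above that, the run is faster but costs more.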

Regards
Willie

comment:3 Changed 3 weeks ago by ggxmy

Hi Willie,

Thanks for the answer. So 12 processors are working only on IO… Oh, is that what this line shows?

FLUME_IOS_NPROC=12

Masaru
