Opened 8 years ago

Closed 8 years ago

#832 closed help (fixed)

problem moving from 2 nodes to 4 nodes

Reported by: jonny Owned by: willie
Component: UM Model Keywords:
Cc: Platform:
UM Version: 6.1

Description

Hi,
I have a HadGEM1.2 job on HECToR phase 3 that works fine on 2 nodes (8*8), but when I move to 4 nodes (various combinations of NS-EW), recompile and run, the job sits there for the whole duration of the allotted queue time and then terminates without producing any output fields. STASH is set to output some 6-hourly data, and I would certainly expect some output to be generated in the allotted queue time.

Here is the .leave file:
/home/n02/n02/jonny/um/umui_out/xgyuj000.xgyuj.d12087.t170237.leave

It is not very informative, though. I think the executable is simply not running. Have you seen this before? Any ideas what might be causing it?

Cheers
Jonny

Change History (3)

comment:1 Changed 8 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Jonny,

I suspect you are trying to use too many processors. Your domain size is 192 EW x 145 NS and you have requested 32 EW processors, so each processor gets only 192/32 = 6 grid points in the EW direction. That is too few to support a halo of 4 points on each side. This is discussed in UM documentation paper C71.
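To make the arithmetic concrete, here is a minimal Python sketch of the check. The function names are hypothetical, and the criterion used (local width at least twice the halo width) is only an assumption based on the wording above, not the rule as stated in C71, so treat it as a rough sanity check rather than the UM's own test.

```python
def min_local_points(n_points: int, n_procs: int) -> int:
    """Smallest number of grid points any processor gets along one
    direction, assuming the points are split as evenly as possible."""
    return n_points // n_procs


def halo_fits(n_points: int, n_procs: int, halo: int) -> bool:
    """Assumed criterion (not taken from C71): the local subdomain must be
    at least as wide as the two halos (one on each side) put together."""
    return min_local_points(n_points, n_procs) >= 2 * halo


EW_POINTS = 192  # EW domain size quoted in this ticket
HALO = 4         # halo width quoted in this ticket

# 8 EW processors is the working 2-node job; 32 is the failing request.
for ew_procs in (8, 32):
    local = min_local_points(EW_POINTS, ew_procs)
    verdict = "ok" if halo_fits(EW_POINTS, ew_procs, HALO) else "too few points"
    print(f"{ew_procs} EW processors: {local} points each, {verdict}")
```

With the numbers from this ticket, 8 EW processors leaves 24 points per processor, while 32 EW processors leaves only 6, which fails the assumed check.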

Regards,

Willie

comment:2 Changed 8 years ago by jonny

Hi Willie,
I'm a little confused about how the halo relates to a global simulation and how this affects the processor decomposition. Is this something which needs to be accounted for in both global and LAM runs?

Cheers
Jonny

comment:3 Changed 8 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed