Opened 2 months ago

Closed 4 weeks ago

#3037 closed help (answered)

Running the UM at version 10.6

Reported by: cdaleu Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: ARCHER
UM Version: 10.6

Description

Hi

I have copied the trunk from the following location: fcm:um.x-tr@vn10.6
The branch name is: branches/dev/chimenedaleu/vn10.6_um_rce_runs
and the source code is in this directory: /home/cdaleu/UM_RUNS/vn10.6_um_rce_runs

I would like to run UM simulations and test the sensitivity of my results to the convection parameterization scheme
(GA6, Kain-Fritsch, Plant-Craig, Betts-Miller).
The suite I am using is: u-bn523
u-bn523 is a copy of u-ap259 (from Jian-Feng Gu).
I have chosen that suite because it gives the option to run the UM with Kain-Fritsch (i_convection_vn=11) and Plant-Craig (i_convection_vn=12).

I have tried running the trunk with the suite u-bn523 and it fails.
The run fails for every value of i_convection_vn I have tried (6, 11, and 12).

I have even tried reducing the domain size and the length of the run, and all the runs still fail.

This is my first attempt to run the full UM and I believe that I am doing something wrong.
I would appreciate your help.

Looking forward to hearing from you.

Change History (5)

comment:1 Changed 2 months ago by ros

Hi Chimene,

Did you check that the suite ran ok before you made any source code and configuration changes?

The problem is due to an incompatibility between the processor decomposition you have set and the extended halo size. See log/job/20000101T0000Z/atmos_atmos/01/job.err:

Error message: Too many processors in the East-West direction ( 16) to support the extended halo size ( 4). Try running with 4 processors.

Regards,
Ros.

Last edited 2 months ago by ros

comment:2 Changed 2 months ago by ros

Hi Ros

I have tried running with 4 processors in the East-West direction (by setting rn03_ewproc = 4) and it FAILED.
I have also changed rn03_nsproc from 12 to 4 and the run FAILED again.
The error message looks like this:

Error code: 100
?  Error from routine: UM_SHELL
?  Error message: UM started on     192 PEs but      16 asked for. Please adjust decomposition
?  Error from processor: 25
?  Error number: 0

I don't know where the 192 PEs is coming from. However, 192 = 16*12, which are the previous values of rn03_ewproc and rn03_nsproc respectively.
For the new run, rn03_ewproc = 4 and rn03_nsproc = 4, which makes 4*4 = 16.
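
The arithmetic behind this error can be checked with a short Python sketch. The function name `check_decomposition` is hypothetical and not part of the UM; it only mirrors the comparison implied by the error message, namely that the number of PEs the job starts on should equal rn03_ewproc * rn03_nsproc.

```python
def check_decomposition(launched_pes, ewproc, nsproc):
    """Hypothetical helper: compare launched PEs with the requested decomposition.

    Returns (ok, requested), where requested = ewproc * nsproc.
    """
    requested = ewproc * nsproc
    return launched_pes == requested, requested


# Previous settings: 16 x 12 matches the 192 PEs the job is launched on.
print(check_decomposition(192, 16, 12))

# New settings: 4 x 4 asks for 16 PEs, but the job still starts on 192.
print(check_decomposition(192, 4, 4))
```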

Please help.
My domain size is only 16*16 grid points and the length of the run is 2 days, just to check that it can run before setting up a larger domain and longer runs.

Thanks for your support

Last edited 2 months ago by ros

comment:3 Changed 2 months ago by ros

Hi Chimene,

The number of nodes requested is hard-wired in the suite.rc file. In the [[{{resn["name"]}}_atmos]] section of suite.rc, you need to change the following line to reflect the number of nodes you are running on:

-l select = 8 # nodes

Additionally, you should run on fully populated nodes (a multiple of 24 PEs), so try a decomposition of 4 x 12 and set -l select = 2.

For what it's worth, I ran Jian-Feng's original settings (64x64 grid points, 48000x48000 grid spacing, 16x12 decomposition, extended halo size 4) and the job appears to run fine.

comment:4 Changed 2 months ago by ros

P.S. You'll also need to change TOTAL_MPI_TASKS in the suite.rc file to match; it is currently hard-wired to 192.
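
The two hard-wired values can be worked out from the decomposition. Below is a minimal Python sketch of that arithmetic, assuming 24 cores per node (as in the comment above) and one MPI task per core with no OpenMP threading. The function name `job_resources` is hypothetical and not part of the suite.

```python
CORES_PER_NODE = 24  # assumption: fully populated node, one MPI task per core


def job_resources(ewproc, nsproc, cores_per_node=CORES_PER_NODE):
    """Hypothetical helper: return (total_mpi_tasks, nodes) for a decomposition.

    Raises ValueError if the tasks do not fill a whole number of nodes.
    """
    total = ewproc * nsproc
    if total % cores_per_node != 0:
        raise ValueError(
            f"{total} tasks do not fill whole nodes of {cores_per_node} cores"
        )
    return total, total // cores_per_node


# Suggested decomposition of 4 x 12.
tasks, nodes = job_resources(4, 12)
print(f"TOTAL_MPI_TASKS = {tasks}")
print(f"-l select = {nodes}")
```

With the original 16 x 12 decomposition the same function gives 192 tasks on 8 nodes, which matches the values currently hard-wired in suite.rc. A 4 x 4 decomposition raises an error here because 16 tasks would leave a node partly empty.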

comment:5 Changed 4 weeks ago by ros

  • Resolution set to answered
  • Status changed from new to closed

I assume you got this working OK, so I am closing the ticket.
