Opened 13 years ago

Closed 13 years ago

#10 closed help (fixed)

queues for running the UM on HPCx

Reported by: D.J.Lunt@… Owned by: lois
Component: UM Model Keywords: HPCx
Cc: Platform:
UM Version:

Description

I am in the process of running my first UM job on HPCx. Things seem to be going OK at the moment (thanks to help from John Hughes here at Bristol), but I am wondering about the queue system.

For a start, I'm not sure about the difference between the development, capacity and capability queues. Ultimately, I will be running on 32 processors (say) for several months. I realise I will have to resubmit (do you have any scripts which do this automatically? Paul had some for newton but I'm not sure how transferable they are). It looks like the 'development' queue would be best for this, but I don't seem to have access to it: I am running with TIC code n02-bas and asked for 32 processors for 12 hours, but it said it couldn't find a suitable queue, so I guess that I can't submit to the development queue.

Secondly, do you have an idea what the optimum number of processes for a HadCM3L job is? I have inherited 8*4 (lon*lat) from John Hughes, but was wondering if you have a handle on the 'best' setup.

thanks a lot, Dan

Dan Lunt
School of Geographical Sciences, University of Bristol
University Road, Bristol BS8 1SS
Tel: +44 (0) 117 928 8186
d.j.lunt@…
http://www.bridge.bris.ac.uk

Change History (1)

comment:1 Changed 13 years ago by admin

  • Reporter changed from admin to D.J.Lunt@…
  • Resolution set to fixed
  • Status changed from new to closed

Hello Dan

There are currently 2 parts to the HPCx service:

  • the development service (192 processors for NERC users only, not currently charged for, so with an unlimited allocation)
  • the capability service (1440 processors for all users, which is charged, so each group has an allocation)

The development service is for smaller jobs of 16, 32 or 64 processors. At the moment the queues are set up so that shorter jobs (20 minutes and 1 hour) run during the working day and longer jobs (6 and 12 hours) run overnight and at the weekend. The capability service gives priority to large jobs of 256, 512 or 1024 processors, so although there are job classes for 32, 64 and 128 processor jobs, these smaller jobs have a lower priority and longer jobs are not allowed. To look at the different job classes, use the command llclass on HPCx. You will see that classes for the development part have names with parn, whereas classes on the capability service have just par.

The development service can only be accessed by using the TIC code n02-ncas. The capability service can be accessed by using other NCAS subgroup codes such as n02-bas or n02-bjob.
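As a rough illustration, the development-service rules described above can be written down as a small check. This is only a sketch of the reply's description, not of the actual LoadLeveler configuration: the function and constant names are hypothetical, and it assumes a request is rounded up to the next listed time limit, which the reply does not state.

```python
# Sketch of the development-service rules as described in this reply.
# All names are hypothetical; the real class definitions are whatever
# llclass reports on HPCx.

DEVELOPMENT_TIC = "n02-ncas"           # only code with development access
DEVELOPMENT_PROCESSORS = {16, 32, 64}  # job sizes listed for development
DAYTIME_LIMITS_MIN = (20, 60)          # 20 minutes and 1 hour
OFFPEAK_LIMITS_MIN = (6 * 60, 12 * 60) # 6 and 12 hours


def development_window(tic_code, processors, wallclock_minutes):
    """Return when a development job would run, or None if it does not fit.

    Assumption: a request fits the smallest listed limit that is at
    least as long as the requested wall-clock time.
    """
    if tic_code != DEVELOPMENT_TIC:
        return None
    if processors not in DEVELOPMENT_PROCESSORS:
        return None
    if wallclock_minutes <= max(DAYTIME_LIMITS_MIN):
        return "working day"
    if wallclock_minutes <= max(OFFPEAK_LIMITS_MIN):
        return "overnight/weekend"
    return None


if __name__ == "__main__":
    # The request from the ticket: 32 processors for 12 hours.
    print(development_window("n02-bas", 32, 12 * 60))
    print(development_window("n02-ncas", 32, 12 * 60))
```

Under these assumptions, the original request (32 processors for 12 hours under n02-bas) fits no development class, while the same request under n02-ncas would be an overnight or weekend job.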

You can use the normal UM resubmission mechanism: run an NRUN job, then change NRUN to CRUN in the submit script (see the NCAS user guide to automatic resubmission). You can also use the job stacking feature of HPCx, but we don't yet have an NCAS user guide for this. We should have one soon!

For HadCM3L, 64 processors may give you the best performance, but not necessarily the best turnaround time on the development service. Better turnaround might be achieved by sacrificing a bit of performance and using 32 processors.
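If it helps when experimenting, the possible lon*lat decompositions for a given processor count can be listed with a short script. This is a minimal sketch with a hypothetical function name; it only enumerates the factor pairs and says nothing about which one performs best for HadCM3L, which would have to be found by timing test runs.

```python
def decompositions(processors):
    """Return every (lon, lat) pair with lon * lat == processors."""
    return [
        (lon, processors // lon)
        for lon in range(1, processors + 1)
        if processors % lon == 0
    ]


if __name__ == "__main__":
    for n in (32, 64):
        print(n, decompositions(n))
```

The 8*4 layout from the question is one of the pairs listed for 32 processors.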

Let me know if I can be of any further help.

Lois
