Opened 3 months ago

Closed 3 months ago

#2913 closed help (fixed)

Very very long queueing time

Reported by: charlie
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: NEXCS
UM Version: 10.7

Description

Hi,

Sorry to bother you, but could you advise on why one of my suites is spending so long in the queue? The suite is u-bi823, an atmosphere-only configuration that I want to run to test possible reasons for a blowup in my coupled run. I am using exactly the same number of processes (EW = 36, NS = 28, OpenMP threads = 1, IO server processes = 0) as in my coupled run. The coupled run queued for only an hour or two at most last week, but this one has been stuck in the queue for 2 days: I submitted it on Wednesday morning and it has done nothing since.
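For reference, the size of the request implied by the decomposition quoted above can be worked out with a few lines of Python. This is only a rough sketch: the function name `core_request` is hypothetical, and the figure of 36 cores per node is an assumption about the machine that is not stated in this ticket, so adjust it to the real value.

```python
import math


def core_request(ew, ns, omp_threads=1, ios_procs=0, cores_per_node=36):
    """Estimate MPI tasks, cores and nodes for an atmosphere decomposition.

    cores_per_node=36 is an ASSUMED value, not taken from the ticket.
    """
    # Atmosphere MPI tasks come from the EW x NS decomposition,
    # plus any separate IO server processes.
    tasks = ew * ns + ios_procs
    # Each MPI task uses one core per OpenMP thread.
    cores = tasks * omp_threads
    # Whole nodes are allocated, so round up.
    nodes = math.ceil(cores / cores_per_node)
    return tasks, cores, nodes


# Settings quoted in the ticket: EW = 36, NS = 28, 1 thread, no IO servers.
tasks, cores, nodes = core_request(36, 28, omp_threads=1, ios_procs=0)
print(f"MPI tasks: {tasks}, cores: {cores}, nodes (assumed 36 cores/node): {nodes}")
```

With the quoted settings this gives 1008 MPI tasks on 1008 cores. The node count depends entirely on the assumed cores per node.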

Might there be something in my submission settings that I am not aware of and that is preventing it from going through? I find it hard to believe that this is simply down to other people's usage, i.e. that this week is so much worse than just a week ago.

Thanks,

Charlie

Change History (5)

comment:1 Changed 3 months ago by grenville

Charlie

If you have not changed your submission settings, then user contention is the likely cause of the wait times. The machine is very busy and we expect it to remain so for the foreseeable future, I'm afraid.

Grenville

comment:2 Changed 3 months ago by charlie

I was worried you would say that. Is there really nothing I can do, e.g. reducing my number of processes, to get it through faster? It has now, finally, submitted its first task, but has been stuck in the queue since last Wednesday morning.

Is this something we could perhaps have a chat about, maybe with other palaeo people like Dan? For long climate runs (which usually require at least 500 years), the current system is clearly just not viable.

Thanks,

Charlie

comment:3 Changed 3 months ago by grenville

Charlie

I'm not sure how to answer. Wait times inevitably increase with increasing machine workload. You probably could shorten the cycle length to get jobs into the normal queue, which may give the appearance of better throughput, but you'd then need to queue six times instead of once. You can experiment, but a solution for faster turnaround today may not work tomorrow.
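The trade-off described above can be illustrated with a very simple model in which total turnaround is the number of submissions times the wait per submission, plus the total run time. This is a sketch under stated assumptions, not a description of how the scheduler behaves: the function names are hypothetical, the 12 hour run time and the 4 hour short-queue wait are made-up figures, and only the 2 day wait and the factor of six come from this ticket.

```python
def total_turnaround(n_submissions, wait_hours_each, run_hours_total):
    """Total wall-clock hours: every submission queues once, then runs."""
    return n_submissions * wait_hours_each + run_hours_total


def break_even_wait(long_wait_hours, n_short_submissions):
    """Wait per short job below which splitting the cycle is faster overall."""
    return long_wait_hours / n_short_submissions


# One long cycle: 48 h in the queue (the 2 days reported in the ticket),
# with an ASSUMED 12 h of total run time.
one_long = total_turnaround(1, 48.0, 12.0)

# Six shorter cycles covering the same 12 h of run time,
# each with an ASSUMED 4 h wait in the normal queue.
six_short = total_turnaround(6, 4.0, 12.0)

print(f"one long job:   {one_long} h")
print(f"six short jobs: {six_short} h")
print(f"break-even wait per short job: {break_even_wait(48.0, 6)} h")
```

Under these assumed numbers, splitting helps only while each short job waits less than 8 hours. Since wait times change with machine load, the break-even point can move from one day to the next.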

Grenville

comment:4 Changed 3 months ago by charlie

Many thanks.

As I said, I think perhaps we should have a chat offline to discuss our long palaeo runs and possible alternatives. I will email you separately about that.

Charlie

comment:5 Changed 3 months ago by charlie

  • Resolution set to fixed
  • Status changed from new to closed