Opened 9 years ago

Closed 9 years ago

#671 closed help (fixed)

slow running jobs

Reported by: swr07dmm Owned by: ros
Component: UM Model Keywords:
Cc: Platform:
UM Version: 6.6.3

Description

I have 3 jobs running on MONSooN under the user dmitch, but they seem to be running very slowly and to be sitting in the queue most of the time rather than running.

Before, when I had just one job running, it would output about 2 years per day, but now it's more like 3-6 months per day.

Is it simply having 3 jobs running that makes it so slow? It seems to be less than a third of the previous speed to me, though!

Many thanks,
Dann

Change History (2)

comment:1 Changed 9 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

The amount of time the jobs spend in the queue is determined by the fairshare policy, which takes into account the entitlement and past usage of the project group (details of the fairshare policy are available on the Met Office Twiki: http://collab.metoffice.gov.uk/twiki/bin/view/Support/SubmittingJobs#FairShare). The fairshare boost for the clpredic project is currently less than that for most other projects, so this will have an impact on whether your job or one from another project gets to run first.

Having 3 jobs running may slow down how quickly a job gets to run if the machine is full, which it has been, as the 3 jobs are competing against each other for the available resources. One job may have to wait for one of your other jobs to resubmit before it can run.

I can ask the MONSooN guys to investigate if you feel your jobs are waiting an extraordinarily long time to run. I assume that when a job does get to run that it's running at the expected speed and it's just the amount of queueing time that is making turnaround slow?

Regards,
Ros.

comment:2 Changed 9 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed

This ticket is now being closed.

Additional information emailed to Dann.

Over the last 5 days the project has been overdoing it (as have some others). Ideally the project should be spreading its 2500 hours evenly across the month.
If there are spare resources available, the scheduler will let you use more than your share; however, this will count against your usage when everyone is using the system, as is the current situation. This is why you're getting penalised by fairshare at the moment and have seen an increase in the queueing time.
If the project eases off over the next few days and doesn't go over its allocation, the fairshare boost will increase as the clpredic usage history gets forgotten.
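
As a rough illustration of what "spreading the hours evenly" means, the sketch below works out how many hours a project would have used by a given day if it paced itself evenly through the month. This is only arithmetic on the 2500 hour figure above, not the scheduler's fairshare calculation: the function names, the 30-day month and the 600 hour usage figure are all assumptions made up for the example.

```python
def even_pace_hours(monthly_allocation_hours, days_in_month, days_elapsed):
    """Hours used so far if the allocation were spread evenly over the month."""
    return monthly_allocation_hours * days_elapsed / days_in_month


def is_ahead_of_pace(hours_used, monthly_allocation_hours, days_in_month, days_elapsed):
    """True if the project has used more than its even-pace share so far."""
    pace = even_pace_hours(monthly_allocation_hours, days_in_month, days_elapsed)
    return hours_used > pace


# 2500 hours is the project figure quoted above; 30 days is an assumed month length.
pace_after_5_days = even_pace_hours(2500, 30, 5)
print(round(pace_after_5_days, 1))  # 416.7

# 600 hours is a hypothetical usage figure, not taken from the ticket.
print(is_ahead_of_pace(600, 2500, 30, 5))  # True
```

A project that is well ahead of this pace while the machine is busy can expect a lower fairshare boost until its recent usage drops back.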
