Opened 11 years ago

Closed 11 years ago

#290 closed help (fixed)

Jobs not running after error-free compilation

Reported by: swr06rjk
Owned by: jeff
Component: UM Model
Keywords:
Cc:
Platform:
UM Version: 6.1

Description

I have submitted jobs xdofh and xdofe a few times since yesterday evening. Each time, the job seems to compile OK, but at the bottom of the .comp.leave file is the message "qsub: Job rejected by all possible destinations" (the job is intended to both compile and run). There is then no sign of the job in the queue, and no .leave file, so I guess this means it hasn't run!
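A quick way to confirm this symptom is to search the .comp.leave file for the rejection message. The following is only a minimal sketch: the function name `find_rejections` is hypothetical and not part of the UM or PBS tooling, and it assumes the message appears verbatim on a single line of the file.

```python
from typing import List

# Message quoted in the ticket description; assumed to appear verbatim.
REJECTION_MARKER = "qsub: Job rejected by all possible destinations"


def find_rejections(leave_text: str) -> List[int]:
    """Return the 1-based line numbers on which the qsub rejection appears."""
    return [
        number
        for number, line in enumerate(leave_text.splitlines(), start=1)
        if REJECTION_MARKER in line
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_leave.py <path to .comp.leave file>
    with open(sys.argv[1], errors="replace") as handle:
        hits = find_rejections(handle.read())
    print("rejected at lines:", hits if hits else "none")
```

An empty result would suggest the run step was submitted and the problem lies elsewhere.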

Attachments (1)

xdofe000.xdofe.d09160.t081609.comp.leave (188.6 KB) - added by swr06rjk 11 years ago.
An example .comp.leave file

Change History (7)

Changed 11 years ago by swr06rjk

An example .comp.leave file

comment:1 Changed 11 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Richard,

It might be that run lengths of 20+ days are now too long for HECToR in its current state. Try reducing the run length to 12 hours.

Regards,

Willie

comment:2 Changed 11 years ago by jeff

  • Owner changed from willie to jeff
  • Status changed from accepted to assigned

Hi

This seems to be a HECToR problem: it no longer accepts jobs of size 4 or 8 cores. I've sent a query to the HECToR helpdesk and will let you know what they say.

Jeff.

comment:3 Changed 11 years ago by jeff

Hi

Here is the response from the HECToR helpdesk:

We have had to make some PBS configuration changes to accommodate the
move to quad core next week, which has led to the problems with small
jobs that you are seeing. These problems will be addressed on the
move to quad core but we cannot make any changes while the upgrade is
taking place.

The option available at the moment is to request more processors for
this job.

So it looks like you will either have to wait until the quad-core processors come online (should be Friday 19 June) or use 16 cores for your run instead of 4. Is this possible?

Jeff.

comment:4 Changed 11 years ago by jeff

Hi

It looks like you can use 9 cores (3x3).

Jeff.
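For reference, the candidate two-dimensional processor layouts for a given core count can be listed with a few lines of Python. This is only an illustrative sketch: `grid_options` is a hypothetical helper, it assumes the layout is a simple East-West x North-South factor pair, and it says nothing about which totals the queue will actually accept.

```python
from typing import List, Tuple


def grid_options(cores: int) -> List[Tuple[int, int]]:
    """List every (east_west, north_south) pair whose product equals cores."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return [
        (east_west, cores // east_west)
        for east_west in range(1, cores + 1)
        if cores % east_west == 0
    ]


if __name__ == "__main__":
    # Core counts mentioned in this ticket.
    for total in (4, 8, 9, 16):
        print(total, grid_options(total))
```

For 9 cores the list includes the 3x3 layout mentioned above, and for 16 cores it includes 4x4.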

comment:5 Changed 11 years ago by swr06rjk

Thanks Jeff,

I've got a job now running on 16 cores, so I'm hopeful that's possible (and quicker!). I did have problems running on 16 when I first moved onto Hector, but maybe they've resolved themselves now.

Richard.

comment:6 Changed 11 years ago by lois

  • Resolution set to fixed
  • Status changed from assigned to closed