#2250 closed help (fixed)

Job stuck in queue on JASMIN

Reported by: charlie Owned by: um_support
Component: JULES Keywords:
Cc: Platform: Other

Description

Hi,

I recently contacted the JASMIN helpdesk, asking why one of my jobs (a JULES suite) is taking so long in the queue. I am requesting a runtime of only 4 hours on the highest-priority queue (short-serial); when I have done this in the past, the job has usually waited up to 24 hours in the queue. However, on this occasion I submitted it at 13:05 on Friday 11 August. It is now Friday 18 August, and it is still queueing!

I am surprised by this, because as I said I have done this several times over the past few weeks, and it has always run within a few hours, up to 24 hours. Not 7 days in the queue, like this time. I haven't done anything different.

Their response was that my job's requirement for exclusive execution (#BSUB -x) cannot be satisfied: to allocate my one CPU core, LSF needs a host with all 16 CPU cores free. They said that if my job has a short execution time and needs to run on a single CPU core, then I can submit it to the short-serial queue, but only after deleting/commenting out the exclusive execution mode in my job script file, at /home/users/cwilliams2011/cylc-run/u-am232/log/job/1/jules/01/job.

However, what they don't realise is that I am currently submitting my job from PUMA, using the Rose/Cylc infrastructure. So I am not submitting my job directly on JASMIN. I submit my job from PUMA, which then copies all the relevant files to JASMIN and submits automatically to the queue.

So when should I change this job file? It is not generated on PUMA until I run the suite, at which point it is automatically copied to JASMIN and submitted. So is there a way of stopping this, changing the job file on JASMIN as they suggest, then directly resubmitting?

Charlie

Change History (3)

comment:1 Changed 23 months ago by charlie

Hi again,

Okay, I have now resolved this issue, with the help of the JASMIN helpdesk, who got back to me with more specific instructions.

The way to get round this issue is to submit the job as usual, in order to generate a job ID (you need to wait until fcm_make has succeeded and jules is queueing). Then modify the job using bmod, to override the -x option specified in the BSUB directive (within the job script file). Specifically,

bmod -xn jobID

This, effectively, does the same as removing/commenting out the line: #BSUB -x within the job script file.

It should then run straight away - or rather, the job will stay in the queue until LSF allocates a single CPU core on a shared host, i.e. it is now in non-exclusive execution mode.
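The submit-then-modify steps above can be sketched as a small shell helper. This is only a sketch: the function name clear_exclusive, the DRY_RUN switch and the job ID 1234567 are made up for illustration. Only bmod -xn comes from the helpdesk instructions; bjobs is the standard LSF command for listing jobs. By default the helper just prints the command, so it can be tried safely away from JASMIN.

```shell
#!/bin/sh
# Sketch: clear the exclusive-execution flag (-x) on a queued LSF job.
# clear_exclusive and DRY_RUN are hypothetical names used only in this sketch.
# The command is printed, not run, unless DRY_RUN=0 is set.

clear_exclusive() {
    job_id="$1"
    if [ -z "$job_id" ]; then
        echo "usage: clear_exclusive JOB_ID" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "0" ]; then
        # Same effect as removing "#BSUB -x" from the job script
        bmod -xn "$job_id"
    else
        echo "bmod -xn $job_id"
    fi
}

# On JASMIN, list your pending jobs first to find the job ID
# (JOBID column), once fcm_make has succeeded and jules is queueing:
#   bjobs -u "$USER" -p

# 1234567 is a placeholder job ID
clear_exclusive 1234567
```

To actually modify the job, it would be run on JASMIN with DRY_RUN=0 and the real job ID.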

Just a quick thought, and question: instead of doing the above (i.e. submitting then modifying), could I achieve the same thing by simply removing the -x directive within my suite.rc before submission i.e. removing the last line below?

        [[[directives]]]
            -q = short-serial
            -W = 4:00
            -n = {{ MPI_NUM_TASKS * OMP_NUM_THREADS }}
            -x =

Charlie

Last edited 22 months ago by ros

comment:2 Changed 22 months ago by ros

Hi Charlie,

Yes, this is where -x is set, so you should be able to modify it, as required, in the suite.rc file. Try it and see.
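As a rough illustration of that edit, the sketch below comments out the -x directive with sed rather than deleting it, so it is easy to restore. It works on an inline copy of the directives quoted in this ticket; the function name comment_out_exclusive is made up for the example, and whether a commented-out -x gives the desired behaviour on JASMIN is untested here.

```shell
#!/bin/sh
# Sketch: comment out the "-x =" directive in suite.rc-style text.
# comment_out_exclusive is a hypothetical helper; it reads stdin and
# writes the edited text to stdout, leaving all other lines unchanged.

comment_out_exclusive() {
    # Keep the leading indentation and put "# " in front of the -x line
    sed -E 's/^([[:space:]]*)(-x[[:space:]]*=.*)$/\1# \2/'
}

# Demonstrate on the directives block from the suite.rc quoted above
comment_out_exclusive <<'EOF'
        [[[directives]]]
            -q = short-serial
            -W = 4:00
            -n = {{ MPI_NUM_TASKS * OMP_NUM_THREADS }}
            -x =
EOF
```

The same function could be pointed at a copy of the real file (for example, comment_out_exclusive < suite.rc > suite.rc.new) and the result checked before replacing the original.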

Cheers,
Ros.

comment:3 Changed 22 months ago by willie

  • Resolution set to fixed
  • Status changed from new to closed