Opened 3 months ago

Closed 2 months ago

#2848 closed help (fixed)

Job stuck at 'submitted' and having problems at 'postproc': u-bh050

Reported by: cwc46 Owned by: um_support
Component: UM Model Keywords:
Cc: Platform: ARCHER
UM Version: 11.0

Description

Dear Helpdesk,

I am having trouble with one of my runs: u-bh050 is currently stuck at 'submitted', and I sometimes get an error at 'postproc' when I kill the job and try to restart.

I am unable to solve the problem. In particular, I am wondering why it is not creating the required directory on my ARCHER /work, which is probably what causes it to fail at 'postproc'.

Thank you for your help!

Glen

Change History (4)

comment:1 Changed 3 months ago by ros

Hi Glen,

I don't know what has happened, but you have 7 atmos_main jobs for u-bh050 sitting in the queue on ARCHER (qstat -u glenchua).

Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
6103300.sdb     glenchua standard atmos_main    --   15 360    --  01:40 Q   -- 
6103903.sdb     glenchua standard atmos_main    --   15 360    --  01:40 Q   -- 
6103912.sdb     glenchua standard atmos_main    --   15 360    --  01:40 Q   -- 
6103919.sdb     glenchua standard atmos_main    --   15 360    --  01:40 Q   -- 
6103923.sdb     glenchua standard atmos_main    --   15 360    --  01:40 Q   -- 
6103950.sdb     glenchua standard atmos_main    --   15 360    --  01:40 Q   -- 
6103953.sdb     glenchua standard atmos_main    --   15 360    --  01:40 Q   -- 

It looks like you have been retriggering the atmos_main task and thus submitting multiple instances. postproc is failing because the atmos_main task for that cycle has not run yet; it is still in the 'submitted' state.

I would suggest stopping your suite (choosing the option to kill all running tasks). Then on ARCHER use qdel (e.g. qdel 6103953.sdb) to kill all your queued jobs in the list above.
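
The qdel step can be scripted rather than typed once per job. This is a minimal sketch, assuming the qstat -u column layout shown in the listing above (job name in column 4, state in column 10); the helper name queued_job_ids is made up for illustration and is not part of PBS.

```shell
# Hypothetical helper: read `qstat -u <user>` style output on stdin and
# print the job IDs of queued jobs (state Q) with the given job name.
# Assumes the column layout shown in the listing above.
queued_job_ids() {
    awk -v name="$1" '$4 == name && $10 == "Q" { print $1 }'
}
```

For example, `qstat -u glenchua | queued_job_ids atmos_main` should print one job ID per line. Check that list first, then pass the IDs to qdel (for instance by piping the same command into `xargs qdel`).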

Then restart the suite, retrigger the atmos_main task in the 19910801 cycle and change the status of the associated postproc task to waiting. That should get it back on track.

Regards,
Ros.

comment:2 Changed 3 months ago by ros

Hi Glen,

Overnight your suite has progressed fine: the 6103953.sdb atmos_main job finished, postproc ran successfully, and the next 3 cycles have run.

Cheers,
Ros.

comment:3 Changed 3 months ago by cwc46

Dear Ros,

Ah, thank you very much for your help!

Best wishes,
Glen

comment:4 Changed 2 months ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed