Opened 12 years ago

Closed 12 years ago

#160 closed help (fixed)

Automatic job resubmission failed on n02-ncas

Reported by: alexrap Owned by: um_support
Component: UM Model Keywords:
Cc: Platform:
UM Version:

Description

I had 2 atmosphere-only jobs (xdgmg and xdgmr) running on HPCx's 'parn16_6' queue in one-month chunks. One of them (xdgmg) hasn't done anything since 13 Aug and is still queueing at the moment:

l1f402.388659.0 alexrap 8/13 05:13 S 50 parn16_6

The other one (xdgmr) ran until 16 Aug, when, after finishing one of its chunks, it was not automatically resubmitted (although it should have been). The last lines of its '.leave' file are:

Job resubmitting itself…
llsubmit: Processed command file through Submit Filter: "~loadl/filter.pl".

No time in budget n02-ncas

llsubmit: 2512-081 Account number "n02-ncas" is not valid for user "alexrap".
llsubmit: 2512-051 This job has not been submitted to LoadLeveler.

Job ended at : Sat Aug 16 12:17:55 BST 2008


Could you please advise me what to do with these two jobs?

Thanks,
Alex.

Change History (1)

comment:1 Changed 12 years ago by lois

  • Resolution set to fixed
  • Status changed from new to closed

Hello Alex,

HPCx have said on their news-of-the-day banner (shown when you log in) that they are having problems:


18/8/08 - Following the maintenance session last week, we are experiencing
some issues with the job submission, users may notice that some jobs are
going directly into 'System Hold', We are aware of the problem and are
working with IBM towards a solution.


Several people are experiencing these problems, which will hopefully be solved soon.

HPCx are also having allocation problems, where an allocation (which for n02-ncas on HPCx is unlimited) suddenly becomes overdrawn. HPCx are aware of these problems and are seeking solutions.
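
Until this is fixed, a failed resubmission like the one in the '.leave' excerpt quoted in the description can be spotted by scanning the file for the llsubmit messages. The following is only a rough sketch in Python. The function name `find_resubmit_failures` is hypothetical, and treating the two message codes 2512-081 and 2512-051 as "resubmission failed" markers is an assumption based solely on the lines quoted in this ticket. It is not an HPCx or UM tool.

```python
import re

# LoadLeveler message codes seen in this ticket's '.leave' excerpt.
# Using them as "resubmission failed" markers is an assumption of this sketch.
FAILURE_CODES = {
    "2512-081": "account not valid for user",
    "2512-051": "job not submitted",
}

# Matches lines such as: llsubmit: 2512-081 Account number ...
LLSUBMIT_LINE = re.compile(r"^llsubmit:\s+(\d{4}-\d{3})\s")


def find_resubmit_failures(leave_text):
    """Return the known llsubmit failure codes found in the text, in order."""
    found = []
    for line in leave_text.splitlines():
        match = LLSUBMIT_LINE.match(line.strip())
        if match and match.group(1) in FAILURE_CODES:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    # Sample text copied from the end of the '.leave' file quoted above.
    sample = (
        'llsubmit: Processed command file through Submit Filter: "~loadl/filter.pl".\n'
        "No time in budget n02-ncas\n"
        'llsubmit: 2512-081 Account number "n02-ncas" is not valid for user "alexrap".\n'
        "llsubmit: 2512-051 This job has not been submitted to LoadLeveler.\n"
    )
    for code in find_resubmit_failures(sample):
        print(code, FAILURE_CODES[code])
```

If the function returns a non-empty list for the tail of a '.leave' file, the job did not resubmit itself and will need to be submitted again by hand once the budget problem is resolved.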

Lois
