Opened 10 years ago

Closed 10 years ago

#372 closed help (fixed)

qsub: Job rejected by all possible destinations

Reported by: richie Owned by: jeff
Component: UM Model Keywords:
Cc: Platform:
UM Version: <select version>

Description

Hi
I have a run, xdkzj, on Hector. It runs very nicely when I submit the first run of a series (run type NRUN), but it gives me a "job rejected" error when I change NRUN to CRUN. Is there some simple procedure I am missing here?

My output from the nrun is in

/home/n02/n02/richie/um/umui_out

and, as far as I can see, it has read permissions.
The nrun submit file is in
/home/n02/n02/richie/umui_runs/xdkzj-007120720
while the failing crun is in

/home/n02/n02/richie/umui_runs/xdkzj-015121533

Thanks for your time,

Rich

Change History (3)

comment:1 Changed 10 years ago by jeff

  • Owner changed from um_support to jeff
  • Status changed from new to accepted

Hi Rich

Your NRUN crashed after 30 days with this error:

[0] MPICH PtlEQPoll error (PTL_EQ_DROPPED): An event was dropped on the UNEX EQ handle.  Try increasing the value of env var MPICH_PTL_UNEX_EVENTS (cur size is 240000).
aborting job:
PtlEQPoll/PtlEQGet error
[NID 338]Apid 1519011: initiated application termination

Your first CRUN job ran out of CPU time. In the UMUI job you have asked for 108000 seconds, which is 30 hours. There is no queue that long, which is why your job failed to submit. The maximum queue length is 12 hours (43200 secs).
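The arithmetic behind the rejection can be checked before submitting. The following is a minimal Python sketch that assumes the 12-hour (43200 s) maximum quoted above is the only limit that matters. The function names are hypothetical and are not part of the UMUI or qsub.

```python
# Hypothetical pre-submission check. The 43200 s (12 h) limit is the maximum
# queue length quoted in this ticket; real limits vary by machine and over time.
MAX_QUEUE_SECONDS = 12 * 60 * 60  # 43200 s


def seconds_to_hours(seconds: int) -> float:
    """Convert a requested time limit in seconds to hours."""
    return seconds / 3600


def fits_longest_queue(seconds: int, max_seconds: int = MAX_QUEUE_SECONDS) -> bool:
    """True if the request is no longer than the longest available queue."""
    return seconds <= max_seconds


if __name__ == "__main__":
    # 108000 s is the request from the failing CRUN; 43200 s is the maximum.
    for request in (108000, 43200):
        verdict = "fits" if fits_longest_queue(request) else "is longer than any queue"
        print(f"{request} s = {seconds_to_hours(request):g} h: {verdict}")
```

A misplaced zero is easy to miss when reading the raw seconds, so printing the value in hours makes a mistake like this obvious.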

I noticed that your job does not include the mod $PUM_MODS61/hector_io.mf77. I would recompile your code with this mod, as it might fix the first problem and should also make the job run faster.

Jeff.

comment:2 Changed 10 years ago by richie

Hi Jeff,
Thanks for that. I have added the mod. As for the overly long time, I failed to see the last zero: I thought I was asking for 3 hours, which, as it turns out, probably wasn't enough. I have set it to 6 hours and reduced the expected run length from 6 months to 3 months.
Sorry to burden you with trivial questions,
Rich

comment:3 Changed 10 years ago by jeff

  • Resolution set to fixed
  • Status changed from accepted to closed