Opened 5 years ago

Closed 5 years ago

#1345 closed help (fixed)

Job running out of CPU time

Reported by: laurahb
Owned by: um_support
Component: UM Model
Keywords: coupled model, NEMO, hanging, time out
Cc:
Platform: MONSooN
UM Version: 8.2

Description

I'm trying to run a job that keeps timing out, even though I've doubled the requested wallclock time to 12000s (I think it's capped automatically at 10800s). The job-id is xixav, and I'm running it on MONSooN. It is a copy of xixak, which ran fine a few months ago with only 6000s of wallclock time requested. The only thing I've changed is the CO2 concentration; the two jobs are otherwise identical. I've also tried running with the CO2 concentration set back to its original value in case that was upsetting it, but I still had the same problem.
I don't know whether it's getting stuck somewhere indefinitely, or whether something on MONSooN has changed so that it now runs more slowly.

Change History (3)

comment:1 Changed 5 years ago by willie

Hi Laura,

The job is not even processing one time step. I could not find the start dump:

/nerc/lhbake/start_dumps/xixako_20851201_restart.nc

I also note that at the tail end of xixak, there was a floating point exception in the UKCA code, but the atmosphere start dump seems ok - I checked it by cumf'ing it with itself.

The parallel class on MONSooN is indeed limited to 3 hours, but even with the extended run time I don't think you should be getting near that.

I hope that helps.

Regards

Willie

comment:2 Changed 5 years ago by laurahb

Thanks - that explains it! There was an error in the path to that NEMO dump (it should be /nerc/slpec/lhbake/…). Normally the .leave file is good at telling you when a job has failed because it can't find a file, so I had assumed this was a different problem and hadn't checked the path.
Hopefully it will be fine now, though I'll have to wait until MONSooN is back up to test it.
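
Since the hang here came from a start dump path that did not exist and the failure was not reported clearly, a small pre-submission check can catch this kind of mistake early. The following is only a minimal sketch in Python, not part of the UM or NEMO tooling; the function names `missing_files` and `report` are hypothetical, and it assumes you can list the input paths yourself.

```python
import os


def missing_files(paths):
    """Return the paths that do not exist as regular files, in input order."""
    return [p for p in paths if not os.path.isfile(p)]


def report(paths):
    """Print each missing path and return True only if every file was found."""
    missing = missing_files(paths)
    for path in missing:
        print("MISSING: " + path)
    return not missing
```

For example, calling `report(["/nerc/lhbake/start_dumps/xixako_20851201_restart.nc"])` on a system where that file is absent would print the path and return `False`, so a wrapper script could stop before submitting the job.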

comment:3 Changed 5 years ago by annette

  • Keywords coupled model, NEMO, hanging, time out added
  • Resolution set to fixed
  • Status changed from new to closed