Opened 4 years ago

Closed 4 years ago

#1923 closed help (fixed)

CRUN not submitting: qsub error

Reported by: ucfaako Owned by: ros
Component: UM Model Keywords: qsub, disk space
Cc: Platform: ARCHER
UM Version: 6.6.3

Description

Hi,

This morning my CRUN (xmfck) stopped after 1 hour, although it should have run for another 16 hours (2 model years). It had worked fine previously (i.e. it resubmitted automatically). I received an email saying that my run had finished, but my .archive file is empty. I then took the opportunity to change the resubmission to 1 model year, but now the model fails on SUBMIT with:

qsub: script file:: No such file or directory
MAIN_SCR: Submit failed

I recompiled it again (including changing NRUN to CRUN etc.), but it still won't submit.

Is that down to the instability ARCHER's experiencing at the moment or might there be a more complex problem?

Many thanks,
Alex

Change History (6)

comment:1 Changed 4 years ago by ros

  • Keywords disk space added; CRUN removed
  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Alex,

The error message above and the empty files are indicators of running out of disk space: you've hit your quota on /home. The quota is set low, so I have just increased it for you; the change will take a little while to become active. You can see when it has been increased by logging into the ARCHER SAFE website. Please tidy up any files on /home that you no longer need.
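To find out what is taking up the space on /home before tidying up, something along these lines may help. It is only a rough sketch: the function names are made up for illustration, and it sums apparent file sizes, which may not match exactly what the quota system counts.

```python
from __future__ import annotations

import os
from pathlib import Path


def tree_size(root: Path) -> int:
    """Total apparent size in bytes of the files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_entries(root: Path, top: int = 10) -> list[tuple[int, str]]:
    """Return (bytes, name) for the largest immediate children of root, biggest first."""
    sizes = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            size = tree_size(Path(entry.path))
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes.append((size, entry.name))
    return sorted(sizes, reverse=True)[:top]
```

Calling `largest_entries(Path.home())` and printing the result gives a quick ranking of the directories worth looking at first.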

Regards,
Ros.

comment:2 Changed 4 years ago by ucfaako

Hi Ros,

Sorry, I hadn't checked my quota in a while and things accumulated! Thank you very much for increasing it. I have also deleted the files I no longer need, and the job now submits successfully.

Thanks,
Alex

comment:3 Changed 4 years ago by ucfaako

Hi Ros,

There still seems to be a problem with disk quota. I've deleted all ancillaries etc. that are not needed, so there should be enough space. The error messages are:

BUFFIN: Read Failed: No such file or directory

BUFFIN: Read Failed: No such file or directory

BUFFIN: Read Failed: No such file or directory

BUFFIN: Read Failed: No such file or directory

BUFFOUT: Write Failed: Disk quota exceeded

The .archive file is too large to be attached but can be found at /home/n02/n02/akncas/um/umui_out/xmfck000.xmfck.d16209.t094709.archive.

Many thanks,
Alex

comment:4 Changed 4 years ago by ros

Hi Alex,

You have generated 82GB worth of data for this run so far and have now hit your quota on /work. I have increased your disk quota on /work; this will take a little while to take effect.

Please keep an eye on the data accrual. Your job does have automatic deletion of superseded files switched on, so it should tidy up after itself.
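As a rough way of judging how much headroom is left, the arithmetic can be sketched as below. The function name and the quota and per-cycle figures in the usage note are hypothetical; only the 82GB figure comes from this run, and `per_cycle_gb` is meant as the net growth per resubmission after any automatic deletion.

```python
import math


def cycles_until_full(quota_gb: float, used_gb: float, per_cycle_gb: float) -> int:
    """Whole resubmission cycles that still fit before the quota is reached."""
    if per_cycle_gb <= 0:
        raise ValueError("per_cycle_gb must be positive")
    remaining_gb = quota_gb - used_gb
    # Never report a negative number of cycles if the quota is already exceeded.
    return max(0, math.floor(remaining_gb / per_cycle_gb))
```

For example, with an assumed quota of 100GB, 82GB already used and an assumed net growth of 5GB per cycle, `cycles_until_full(100, 82, 5)` returns 3.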

Regards,
Ros.

comment:5 Changed 4 years ago by ucfaako

Hi Ros,

Thank you, that explains a lot. xmfck/datam seems to contain a lot of output even though I turned on automatic archiving. Are these files from my "failed" runs, and are they safe to delete?

Many thanks,
Alex

comment:6 Changed 4 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed

Hi Alex,

Sorry I didn't get back to you. You should be able to tell whether files are from older runs by their timestamps and also by the model date that the files relate to.
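For the timestamp part, a minimal sketch like the following could list candidate files. The function name is invented for illustration, and the script only reports files and deletes nothing; checking the model date in each file name is left to you, since that depends on the naming used by the run.

```python
from __future__ import annotations

import os
import time


def files_older_than(root: str, days: float, now: float | None = None) -> list[str]:
    """Return sorted paths under root whose modification time is more than `days` ago."""
    reference = time.time() if now is None else now
    cutoff = reference - days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    old.append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return sorted(old)
```

Running it on the datam directory with a cutoff just before the current run started should leave a list to compare against the model dates before removing anything.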

Regards,
Ros.
