#2952 closed help (fixed)

Suites failing on submit-retrying; file not found

Reported by: ChrisWells Owned by: ros
Component: UM Model Keywords:
Cc: Platform:
UM Version:

Description

Hi,

I have a few suites running (e.g. u-bh765), and they've all failed on submit-retrying with errors like

ERROR: file not found

and the file is the job.err file in cylc-run.

I think I remember having this error before but can't remember how it was resolved. I've tried stopping the suites and running rose suite-run --restart, but the error persists.

Do you know how I can get round this?

Cheers,
Chris

Change History (6)

comment:1 Changed 12 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Chris,

The tasks have not submitted so there will be no job.out or job.err files. When tasks fail with submit-retrying you need to look in the job-activity.log file for the error. The error is:

qsub: error: [PBSInvalidProject] 'slpec' has no share allocation in 'collaboration' trustzone

This indicates that 'slpec' has no time allocation on Monsoon and so your jobs will not run. You will need to talk to your project PI and/or the Monsoon team.
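As a rough illustration of where to look, the sketch below searches a suite's run directory for job-activity.log files and lists any lines mentioning an error. The helper name find_submit_errors is hypothetical, and it assumes the logs sit somewhere under the suite's cylc-run directory (for example ~/cylc-run/u-bh765); adjust the path if your setup differs.

```python
from pathlib import Path


def find_submit_errors(suite_run_dir, needle="error"):
    """Return (log_path, line) pairs for lines in job-activity.log files
    that contain `needle` (case-insensitive)."""
    hits = []
    # rglob avoids hard-coding the cycle/task/submit-number directory layout.
    for log in sorted(Path(suite_run_dir).rglob("job-activity.log")):
        for line in log.read_text(errors="replace").splitlines():
            if needle.lower() in line.lower():
                hits.append((log, line.strip()))
    return hits
```

Calling find_submit_errors(Path.home() / "cylc-run" / "u-bh765") and printing the results should surface messages such as the qsub error above, if they are present in the logs.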

Regards,
Ros.

comment:2 Changed 12 months ago by ChrisWells

Hi Ros,

Many thanks for the info. I shouldn't be registered on project slpec; I should be on ukca-imp. I think this is an issue which arose a while ago when I asked for permission to access slpec data on MASS. Do you know how I can get re-registered solely on ukca-imp?

Cheers,
Chris

comment:3 Changed 12 months ago by ros

Hi Chris,

If you no longer need access to anything related to the project slpec, including data under /projects/slpec, then you can ask the Monsoon team to remove you from that project.

Otherwise you just need to specify which project you want a suite to run under. In "suite conf → Project Accounting" set:

Use default account: false
Account: other
Other user account: ukca-imp

The suite will then run under ukca-imp.

Cheers,
Ros.

comment:4 Changed 12 months ago by ChrisWells

Hi Ros,

Thanks for the info. If I change my DATADIR to be my ukca-imp user in my .bashrc, and continue my runs, will that cause issues for the suites I have currently running?

Cheers,
Chris

comment:5 Changed 12 months ago by ros

Hi Chris,

Yes, it will cause problems: cylc will repoint the current symbolic link in /home from slpec to ukca-imp, and the new location won't have any of the files the suite needs.
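If you want to see where the links currently point before changing anything, a small sketch like the following may help. The helper name symlink_targets is hypothetical, and it assumes the links are entries (such as share and work) directly inside the suite's run directory under ~/cylc-run.

```python
import os


def symlink_targets(run_dir):
    """Map each symbolic link directly inside run_dir to its resolved target."""
    targets = {}
    for entry in os.scandir(run_dir):
        # Only report symbolic links; ordinary files and directories are skipped.
        if entry.is_symlink():
            targets[entry.name] = os.path.realpath(entry.path)
    return targets
```

For example, symlink_targets(os.path.expanduser("~/cylc-run/u-bh765")) returns a dictionary of link names and the directories they resolve to.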

You can simply restart the suite under the ukca-imp account and continue writing the output to /projects/slpec.

If you want the data to go under /projects/ukca-imp instead, you will need to stop the suite, change the start date, start dump, etc., and start a new run. To get the data under /projects/ukca-imp you can try setting $DATADIR in your .bashrc/.bash_profile, but this doesn't always work. If not, you will need to add the following 2 lines to the top of the rose-suite.conf file:

root-dir{share}=*=/projects/ukca-imp/$USER
root-dir{work}=*=/projects/ukca-imp/$USER

See https://collab.metoffice.gov.uk/twiki/bin/view/Support/MONSooNRose#Changing%20which%20/projects%20directo
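If you would rather script the edit than do it by hand, here is a minimal sketch, assuming the two root-dir lines above are exactly what you want. The helper name prepend_root_dirs is hypothetical. It only adds lines that are missing, and it does not deal with an existing root-dir line that has a different value, so check the file afterwards.

```python
from pathlib import Path

# The two lines quoted above; $USER is written to the file literally.
ROOT_DIR_LINES = [
    "root-dir{share}=*=/projects/ukca-imp/$USER",
    "root-dir{work}=*=/projects/ukca-imp/$USER",
]


def prepend_root_dirs(conf_path, lines=ROOT_DIR_LINES):
    """Insert any missing root-dir lines at the top of rose-suite.conf.

    Returns True if the file was changed, False if nothing was needed.
    """
    conf = Path(conf_path)
    existing = conf.read_text().splitlines()
    missing = [ln for ln in lines if ln not in existing]
    if not missing:
        return False
    conf.write_text("\n".join(missing + existing) + "\n")
    return True
```

Running it twice on the same file is harmless, since the second call finds both lines already present and leaves the file alone.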

Cheers,
Ros.

comment:6 Changed 12 months ago by ChrisWells

  • Resolution set to fixed
  • Status changed from accepted to closed

Hi Ros,

Thanks - I've restarted them under ukca-imp, with the output still going to /projects/slpec.

I'll close this now.

Cheers,
Chris
