Opened 20 months ago
Closed 20 months ago
#2952 closed help (fixed)
Suite's failing on submit-retrying; file not found
Reported by: | ChrisWells | Owned by: | ros |
---|---|---|---|
Component: | UM Model | Keywords: | |
Cc: | | Platform: | |
UM Version: | | | |
Description
Hi,
I have a few suites running (e.g. u-bh765), and they've all failed on submit-retrying with errors like
ERROR: file not found
and the file is the job.err file in cylc-run.
I think I remember having this error before but can't remember how it was resolved. I've tried stopping the suites and running rose suite-run --restart, but the error persists.
Do you know how I can get round this?
Cheers,
Chris
Change History (6)
comment:1 Changed 20 months ago by ros
- Owner changed from um_support to ros
- Status changed from new to accepted
comment:2 Changed 20 months ago by ChrisWells
Hi Ros,
Many thanks for the info. I shouldn't be registered on project slpec; I should be on ukca-imp. I think this is an issue which arose a while ago when I asked for permission to access slpec data on MASS. Do you know how I can get re-registered solely on ukca-imp?
Cheers,
Chris
comment:3 Changed 20 months ago by ros
Hi Chris,
If you no longer need access to anything related to the project slpec, including data under /projects/slpec, then you can ask the Monsoon team to remove you from that project.
Otherwise you just need to specify which project you want a suite to run under. In "suite conf → Project Accounting" set:
Use default account: false
Account: other
Other user account: ukca-imp.
The suite will then run under ukca-imp
Cheers,
Ros.
comment:4 Changed 20 months ago by ChrisWells
Hi Ros,
Thanks for the info. If I change my DATADIR to be my ukca-imp user in my .bashrc, and continue my runs, will that cause issues for the suites I have currently running?
Cheers,
Chris
comment:5 Changed 20 months ago by ros
Hi Chris,
Yes, it will cause problems: cylc will replace the current symbolic link in /home so that it points to ukca-imp instead of slpec, and the suite will then not find any of the files it needs.
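Before changing $DATADIR, it can help to see where the existing links actually point. The following is only a minimal sketch: the function name describe_links is hypothetical, and which paths are symbolic links (the suite run directory itself, or its share and work subdirectories) is an assumption that depends on how the suite was set up.

```python
import os
from pathlib import Path


def describe_links(suite_run_dir, names=("", "share", "work")):
    """Return {path: resolved target} for each checked path that is a symlink.

    The empty name stands for the run directory itself; "share" and "work"
    are assumed subdirectory names and may not exist in every suite.
    """
    root = Path(suite_run_dir).expanduser()
    links = {}
    for name in names:
        path = root / name if name else root
        if path.is_symlink():
            # realpath follows the whole chain of links to the final location.
            links[str(path)] = os.path.realpath(path)
    return links


if __name__ == "__main__":
    # u-bh765 is the suite named in this ticket; substitute your own suite ID.
    for link, target in describe_links("~/cylc-run/u-bh765").items():
        print(f"{link} -> {target}")
```

If the targets printed are under /projects/slpec, the running suite's files live there, and repointing the link would hide them from the suite.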
You can simply restart the suite under the ukca-imp account and continue writing the output to /projects/slpec.
If you want the data to go under /projects/ukca-imp instead, then you will need to stop the suite, change the start date, start dump etc. and start a new run. To get the data under /projects/ukca-imp you can try setting $DATADIR in your .bashrc/.bash_profile, but this doesn't always work. If not, you will need to add the following 2 lines to the top of the rose-suite.conf file:
root-dir{share}=*=/projects/ukca-imp/$USER
root-dir{work}=*=/projects/ukca-imp/$USER
Cheers,
ros.
comment:6 Changed 20 months ago by ChrisWells
- Resolution set to fixed
- Status changed from accepted to closed
Hi Ros,
Thanks - I've restarted them under ukca-imp going to slpec.
I'll close this now.
Cheers,
Chris
Hi Chris,
The tasks have not submitted so there will be no job.out or job.err files. When tasks fail with submit-retrying you need to look in the job-activity.log file for the error. The error is:
qsub: error: [PBSInvalidProject] 'slpec' has no share allocation in 'collaboration' trustzone
This indicates that 'slpec' has no time allocation on Monsoon and so your jobs will not run. You will need to talk to your project PI and/or the Monsoon team.
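As a rough illustration of checking job-activity.log for a submit-retrying task, the sketch below collects every line mentioning "error" from any job-activity.log under a suite's run directory. The function name find_submit_errors is hypothetical, and it searches recursively rather than assuming a particular log directory layout; the ~/cylc-run/u-bh765 path is just the suite mentioned in this ticket.

```python
from pathlib import Path


def find_submit_errors(suite_run_dir, keyword="error"):
    """Return (log file, line) pairs for job-activity.log lines containing keyword."""
    root = Path(suite_run_dir).expanduser()
    if not root.is_dir():
        return []
    hits = []
    # Recursive search, so the exact cycle/task/submit-number layout is not assumed.
    for log in sorted(root.rglob("job-activity.log")):
        for line in log.read_text(errors="replace").splitlines():
            if keyword.lower() in line.lower():
                hits.append((log, line.strip()))
    return hits


if __name__ == "__main__":
    # Substitute your own suite ID for u-bh765.
    for log, line in find_submit_errors("~/cylc-run/u-bh765"):
        print(f"{log}: {line}")
```

For the suites in this ticket, a scan like this would be expected to surface the qsub PBSInvalidProject line quoted above.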
Regards,
Ros.