Opened 17 months ago
Closed 16 months ago
#3076 closed help (fixed)
I can't submit new jobs
| Reported by: | luciad | Owned by: | um_support |
|---|---|---|---|
| Component: | PUMA | Keywords: | fail submission |
| Cc: | | Platform: | ARCHER |
| UM Version: | 11.1 | | |
Description
Hello,
I am trying to run a suite on pumatest, using the short queue on ARCHER.
Since yesterday all my jobs have failed at atmos-main with a submission error.
Now I tried to submit a job again, but it fails after I type rose suite-run, with the following message:
[FAIL] ssh -oBatchMode=yes -n login2.archer.ac.uk env\ ROSE_VERSION=2019.01.0\ CYLC_VERSION=7.8.1\ bash\ -l\ -c\ \'\"$0\"\ \"$@\"\'\ rose\ suite-run\ -vv\ -n\ u-bo228\ --run=run\ --remote=uuid=c45b1956-57f5-42fc-a39e-e5ab15060da8,now-str=20191114T145133Z,root-dir=\'$DATADIR\' # return-code=247, stderr=
[FAIL] 2019-11-14T14:54:06+0000 cylc get-global-config -i [hosts][localhost]run\ directory # return-code=-9
Could you help me with this?
Best regards,
Lucia
Change History (9)
comment:1 Changed 17 months ago by dcase
comment:2 Changed 17 months ago by luciad
For your first comment, I don't think my suite can run in 20 minutes, so I will probably go back to the standard queue.
As for your ssh issues, if you type ssh login2.archer.ac.uk do you log straight into ARCHER without being prompted for a password?
Yes, I can log in directly to archer.
Lucia
comment:3 Changed 17 months ago by dcase
I've read the rose-suite-run.log : it looks like you are stuck on a different stage. Is this correct?
If you are stuck on an rsync to ARCHER, you may have an open file blocking things. If this is the start of a run, can you go to archer and rm -r cylc-run/u-bo228 ?
comment:4 Changed 17 months ago by luciad
I've read the rose-suite-run.log : it looks like you are stuck on a different stage
What stage? My suite couldn't submit any jobs yesterday. It only compiled, but then failed to submit atmos-main.
I've removed cylc-run/u-bo228 from ARCHER and tried to run the suite again from pumatest. It still gives me the same error:
[FAIL] ssh -oBatchMode=yes -n login6.archer.ac.uk env\ ROSE_VERSION=2019.01.0\ CYLC_VERSION=7.8.1\ bash\ -l\ -c\ \'\"$0\"\ \"$@\"\'\ rose\ suite-run\ -vv\ -n\ u-bo228\ --run=run\ --remote=uuid=20aba2ed-07b4-4b29-8f67-b476036e695d,now-str=20191114T164510Z,root-dir=\'$DATADIR\' # return-code=247, stderr=
[FAIL] --------------------------------------------------------------------------------
[FAIL] This is a private computing facility. Access to this service is limited to those
[FAIL] who have been granted access by the operating service provider on behalf of the
[FAIL] contracting authority and use is restricted to the purposes for which access was
[FAIL] granted. All access and usage are governed by the terms and conditions of access
[FAIL] agreed to by all registered users and are thus subject to the provisions of the
[FAIL] Computer Misuse Act, 1990 under which unauthorised use is a criminal offence.
[FAIL]
[FAIL] If you are not authorised to use this service you must disconnect immediately.
[FAIL] --------------------------------------------------------------------------------
[FAIL]
[FAIL] [FAIL] 2019-11-14T16:47:21+0000 cylc get-global-config -i [hosts][localhost]run\ directory # return-code=-9
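One possible reading of the two return codes in the log above, offered as a sketch rather than a confirmed diagnosis. It assumes that a negative code -N means the process was killed by signal N, and that an exit status is truncated to its low 8 bits. Under those assumptions, -9 and 247 would describe the same event. The function names are illustrative only.

```shell
# Sketch only: relates the return codes -9 and 247 seen in the log.
# Assumption: -N means "killed by signal N"; exit statuses keep the low 8 bits.

# Name of the signal behind a negative return code.
signal_name() {
    kill -l "$(( 0 - $1 ))"
}

# Exit status left after truncating a return code to 8 bits.
wrapped_status() {
    echo "$(( $1 & 255 ))"
}

signal_name -9      # prints KILL
wrapped_status -9   # prints 247
```

If that reading holds, both failures may come from `cylc get-global-config` being killed on the ARCHER login node, rather than from the ssh connection itself.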
comment:5 Changed 17 months ago by dcase
When you said that you can ssh into ARCHER, was it using exactly the command I said, i.e. without adding your username?
If you type a similar one, like ssh -n login6.archer.ac.uk ls do you get a list of directories from ARCHER?
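The check above can be wrapped in a small function that uses the same `-oBatchMode=yes -n` options as the failing command, so that ssh fails outright instead of prompting for a password. This is a minimal sketch: the function name `check_batch_ssh` and the `SSH` override variable are made up for illustration and are not part of Rose or Cylc.

```shell
# Sketch: check that a host accepts non-interactive, passwordless ssh.
# check_batch_ssh and the SSH variable are illustrative names.
check_batch_ssh() {
    host="$1"
    # BatchMode=yes makes ssh fail rather than ask for a password;
    # -n detaches stdin, as in the command issued by rose suite-run.
    # "true" is a harmless remote command that only tests the login.
    if "${SSH:-ssh}" -oBatchMode=yes -n "$host" true; then
        echo "OK: $host accepts non-interactive ssh"
    else
        echo "FAIL: $host rejected non-interactive ssh"
        return 1
    fi
}
```

For example, `check_batch_ssh login6.archer.ac.uk` run on pumatest should print the OK line if key-based login is working for that host.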
comment:6 Changed 17 months ago by luciad
Yes, I logged in without using my username or password.
Indeed, if I type this last command, I get the list of my directories on archer
comment:7 Changed 17 months ago by dcase
When you run the suite, could you try rose suite-run --new
There's a small chance that this may clear up a problem
comment:8 Changed 17 months ago by luciad
The suite is running now, after doing rose suite-run --new.
I would like to keep this ticket open a while longer, to see if there is another failed submission later on in the process.
comment:9 Changed 16 months ago by ros
- Resolution set to fixed
- Status changed from new to closed
Closing ticket - 4 weeks have passed since last comment, assuming this is all working now.
The short queue is for jobs using fewer than 8 nodes and running for less than 20 minutes, which may affect you. See details: https://www.archer.ac.uk/documentation/user-guide/batch.php#sec-5.14
As for your ssh issues, if you type ssh login2.archer.ac.uk do you log straight into ARCHER without being prompted for a password?