Opened 5 months ago

Closed 4 months ago

#3076 closed help (fixed)

I can't submit new jobs

Reported by: luciad
Owned by: um_support
Component: PUMA
Keywords: fail submission
Cc:
Platform: ARCHER
UM Version: 11.1

Description

Hello,

I am trying to run a suite on pumatest, using the short queue on ARCHER.
Since yesterday all my jobs have failed at atmos-main with a submission error.

I have now tried to submit a job again, but it fails after I type rose suite-run, with the following message:

[FAIL] ssh -oBatchMode=yes -n login2.archer.ac.uk env\ ROSE_VERSION=2019.01.0\ CYLC_VERSION=7.8.1\ bash\ -l\ -c\ \'\"$0\"\ \"$@\"\'\ rose\ suite-run\ -vv\ -n\ u-bo228\ --run=run\ --remote=uuid=c45b1956-57f5-42fc-a39e-e5ab15060da8,now-str=20191114T145133Z,root-dir=\'$DATADIR\' # return-code=247, stderr=

[FAIL] 2019-11-14T14:54:06+0000 cylc get-global-config -i [hosts][localhost]run\ directory # return-code=-9

Could you help me with this?

Best regards,
Lucia

Change History (9)

comment:1 Changed 5 months ago by dcase

The short queue is for jobs that use fewer than 8 nodes and run for less than 20 minutes, which may affect you. See details: https://www.archer.ac.uk/documentation/user-guide/batch.php#sec-5.14

As for your ssh issues, if you type ssh login2.archer.ac.uk do you log straight into ARCHER without being prompted for a password?
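The [FAIL] line in the description shows that Rose connects with -oBatchMode=yes -n, so a closer test is to use the same options. The helper below is only a sketch: the function name check_batch_ssh and the 10-second ConnectTimeout are assumptions added for illustration, not part of Rose.

```shell
# Hypothetical helper: succeeds only if a non-interactive (batch-mode) login works.
check_batch_ssh() {
    host="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # -n stops ssh reading from stdin; ConnectTimeout (assumed value)
    # avoids waiting indefinitely on an unreachable host.
    if ssh -oBatchMode=yes -oConnectTimeout=10 -n "$host" true; then
        echo "OK: batch-mode ssh to $host works"
    else
        status=$?
        echo "FAIL: batch-mode ssh to $host returned $status"
        return "$status"
    fi
}
```

For example, check_batch_ssh login2.archer.ac.uk should print the OK line if key-based login is set up correctly.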

comment:2 Changed 5 months ago by luciad

For your first comment, I don't think my suite can run in 20 minutes, so I will probably go back to the standard queue.

As for your ssh issues, if you type ssh login2.archer.ac.uk do you log straight into ARCHER without being prompted for a password?

Yes, I can log in directly to archer.

Lucia

comment:3 Changed 5 months ago by dcase

I've read the rose-suite-run.log: it looks like you are stuck on a different stage. Is this correct?

If you are stuck on an rsync to ARCHER, you may have an open file blocking things. If this is the start of a run, can you go to archer and rm -r cylc-run/u-bo228 ?
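The clean-up can also be done from PUMA in one step. This is a minimal sketch, assuming the run directory sits at cylc-run/SUITE relative to the ARCHER login directory as in the command above; clean_remote_run is a hypothetical name, and the empty-name guard is an added safety check.

```shell
# Hypothetical helper: remove a suite's run directory on the remote host.
clean_remote_run() {
    host="$1"
    suite="$2"
    # Refuse an empty suite name so this never expands to "rm -r cylc-run/".
    if [ -z "$suite" ]; then
        echo "usage: clean_remote_run HOST SUITE" >&2
        return 2
    fi
    # The path is relative to the remote login directory (assumption).
    ssh -n "$host" "rm -r cylc-run/$suite"
}
```

For this ticket the call would be clean_remote_run login2.archer.ac.uk u-bo228.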

comment:4 Changed 5 months ago by luciad

I've read the rose-suite-run.log: it looks like you are stuck on a different stage

What stage? My suite hasn't been able to submit any jobs since yesterday. It only compiled, but then failed to submit atmos-main.

I've removed cylc-run/u-bo228 from ARCHER and tried to run the suite again from pumatest. It still gives me the same error:

[FAIL] ssh -oBatchMode=yes -n login6.archer.ac.uk env\ ROSE_VERSION=2019.01.0\ CYLC_VERSION=7.8.1\ bash\ -l\ -c\ \'\"$0\"\ \"$@\"\'\ rose\ suite-run\ -vv\ -n\ u-bo228\ --run=run\ --remote=uuid=20aba2ed-07b4-4b29-8f67-b476036e695d,now-str=20191114T164510Z,root-dir=\'$DATADIR\' # return-code=247, stderr=
[FAIL] --------------------------------------------------------------------------------
[FAIL] This is a private computing facility. Access to this service is limited to those
[FAIL] who have been granted access by the operating service provider on behalf of the
[FAIL] contracting authority and use is restricted to the purposes for which access was
[FAIL] granted. All access and usage are governed by the terms and conditions of access
[FAIL] agreed to by all registered users and are thus subject to the provisions of the
[FAIL] Computer Misuse Act, 1990 under which unauthorised use is a criminal offence.
[FAIL]
[FAIL] If you are not authorised to use this service you must disconnect immediately.
[FAIL] --------------------------------------------------------------------------------
[FAIL]
[FAIL] [FAIL] 2019-11-14T16:47:21+0000 cylc get-global-config -i [hosts][localhost]run\ directory # return-code=-9

comment:5 Changed 5 months ago by dcase

When you said that you can ssh into ARCHER, was it using exactly the command I said, i.e. without adding your username?

If you type a similar one, like ssh -n login6.archer.ac.uk ls do you get a list of directories from ARCHER?

comment:6 Changed 5 months ago by luciad

Yes, I logged in without using my username or password.

Indeed, if I type this last command, I get the list of my directories on ARCHER.

comment:7 Changed 5 months ago by dcase

When you run the suite, could you try rose suite-run --new?

There's a small chance that this may clear up the problem.
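As far as I understand it, --new asks Rose to remove any previous installation of the suite run directory before reinstalling, which is why it can clear leftover state. A minimal sketch of the rerun follows; the wrapper name rerun_suite_fresh and the idea of passing the suite's working copy as an argument are assumptions for illustration.

```shell
# Hypothetical wrapper: rerun a suite from its working copy with a fresh install.
rerun_suite_fresh() {
    suite_dir="$1"
    # Run in a subshell so the caller's current directory is left unchanged;
    # rose suite-run is only started if the cd succeeds.
    (cd "$suite_dir" && rose suite-run --new)
}
```

It would be called with the path to the suite's working copy on PUMA.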

comment:8 Changed 5 months ago by luciad

The suite is running now, after doing rose suite-run --new.
I would like to keep this ticket open a while longer, to see if there is another failed submission later on in the process.

comment:9 Changed 4 months ago by ros

  • Resolution set to fixed
  • Status changed from new to closed

Closing ticket: 4 weeks have passed since the last comment, so I'm assuming this is all working now.
