Opened 4 months ago

Closed 4 months ago

#3408 closed help (fixed)

Tasks getting submit-failed error

Reported by: dgalea Owned by: um_support
Component: Rose/Cylc Keywords: rose host-select
Cc: Platform: PUMA
UM Version: 11.0

Description

Hi,

I stopped a suite (u-by690) mid-run because I realised something was wrong. After fixing it, I tried to restart the suite, but fcm_make2_um, fcm_make2_pp, fcm_make2_pptransfer and install_ancil all fail with a submit-failed error.

I have checked that my ssh-agent is running and that my ARCHER key is loaded by running ssh-add -l, which returned:

4096 SHA256:oBxjIzLzL0tx7s66r039SHoBUzjKp3a5O6nBvnX4Eyg id_rsa_archerum (RSA)

I have confirmed that I can ssh from PUMA to ARCHER with ssh login.archer.ac.uk, which returns:

--------------------------------------------------------------------------------
This is a private computing facility. Access to this service is limited to those
who have been granted access by the operating service provider on behalf of the
contracting authority and use is restricted to the purposes for which access was
granted. All access and usage are governed by the terms and conditions of access
agreed to by all registered users and are thus subject to the provisions of the
Computer Misuse Act, 1990 under which unauthorised use is a criminal offence.

If you are not authorised to use this service you must disconnect immediately.
--------------------------------------------------------------------------------

PTY allocation request failed on channel 0
Comand rejected by policy. Not in authorised list 
Connection to login.archer.ac.uk closed.

I then ran rose host-select archer, which returned:

[WARN] login.archer.ac.uk: (ssh failed)
[WARN] login2.archer.ac.uk: (ssh failed)
[WARN] login3.archer.ac.uk: (ssh failed)
[WARN] login6.archer.ac.uk: (timed out)
[WARN] login5.archer.ac.uk: (ssh failed)
[WARN] login4.archer.ac.uk: (ssh failed)
[WARN] login7.archer.ac.uk: (ssh failed)
[WARN] login1.archer.ac.uk: (ssh failed)
[WARN] login8.archer.ac.uk: (ssh failed)
[FAIL] No hosts selected.

I therefore think that something is going wrong with Rose, but I am not sure what it is or how to fix it. Would you be able to help me out?

Thanks.
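
When reporting output like the rose host-select listing above, it can help to summarise which hosts failed and why. The following is a minimal sketch, assuming the "[WARN] host: (reason)" and "[FAIL] No hosts selected." line formats shown in this description; the function name summarise_host_select is hypothetical and is not part of Rose.

```python
import re
from collections import Counter

# Matches lines such as "[WARN] login6.archer.ac.uk: (timed out)".
# The format is assumed from the output pasted in this ticket.
WARN_LINE = re.compile(r"^\[WARN\]\s+(?P<host>\S+):\s+\((?P<reason>[^)]*)\)\s*$")


def summarise_host_select(output):
    """Return (per-host reasons, reason counts, whether a host was selected)."""
    reasons = {}
    no_hosts = False
    for line in output.splitlines():
        line = line.strip()
        match = WARN_LINE.match(line)
        if match:
            reasons[match.group("host")] = match.group("reason")
        elif line.startswith("[FAIL] No hosts selected"):
            no_hosts = True
    return reasons, Counter(reasons.values()), not no_hosts
```

Passing the pasted output as a string returns a host-to-reason mapping, a count of each failure reason, and a flag that is False when the "No hosts selected" line is present.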

Change History (6)

comment:1 Changed 4 months ago by ros

Hi Daniel,

rose host-select archer has not worked since ARCHER moved to 2FA. The suite host is set to login.archer.ac.uk, and the suite has submitted to ARCHER OK. The job.err files on ARCHER contain the error:

[FAIL] fcm_make2_um (key=fcm_make_um): task has no associated application.
Received signal ERR

When I run your suite I get the same error. My other suites work fine, though, so I surmise there is something wrong with your suite setup, but I can't immediately see what. What have you changed since it last ran OK?

Regards,
Ros.
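
The job.err check described in this comment can be sketched as a small parser. This is a minimal sketch, assuming the "[FAIL] task (key=app): message" line format quoted above; the function name find_task_failures is hypothetical, and the job.err contents are expected to be read into a string beforehand.

```python
import re

# Matches lines such as
# "[FAIL] fcm_make2_um (key=fcm_make_um): task has no associated application."
# The format is assumed from the error quoted in this ticket.
FAIL_LINE = re.compile(
    r"^\[FAIL\]\s+(?P<task>\S+)\s+\(key=(?P<key>[^)]+)\):\s+(?P<message>.+)$"
)


def find_task_failures(job_err_text):
    """Return a list of (task, key, message) tuples for matching [FAIL] lines."""
    failures = []
    for line in job_err_text.splitlines():
        match = FAIL_LINE.match(line.strip())
        if match:
            failures.append(
                (match.group("task"), match.group("key"), match.group("message"))
            )
    return failures
```

Lines that do not follow this format, such as "Received signal ERR", are ignored, so the result lists only the tasks that reported a keyed failure.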

comment:2 Changed 4 months ago by dgalea

Hi Ros,

I only changed the revision number in the source field. I have also tried to run my other suite, which had been running fine, but it fails with the same error even though no changes were made to it. A copy of the training suite also fails.

Regards,
Daniel

comment:3 Changed 4 months ago by dgalea

Hi,

just checking to see if you've had time to look at my issue.

Regards,
Daniel

comment:4 Changed 4 months ago by ros

Hi Daniel,

Please try again now. I think a Rose configuration line added to enable submission to ARCHER2 is for some reason interfering with ARCHER submission. I've rolled back the change for now.

Sorry for the inconvenience.
Regards,
Ros.

comment:5 Changed 4 months ago by dgalea

Hi,

all seems to be working now. Thanks for your help.

Regards,
Daniel

comment:6 Changed 4 months ago by ros

  • Keywords rose host-select added
  • Resolution set to fixed
  • Status changed from new to closed