Problem with "submit-failed"

Dear CMS helpdesk,

Since last Friday I have been unable to submit my suite to ARCHER. It stops when the suite starts to compile the model, and the GUI shows "submit-failed", but there is no error information.

I have stopped the suite and tried resubmitting several times. Most of the time it stops at the model compilation step. Sometimes it succeeds in compiling the model but then fails to submit, without giving any useful information.

Does anyone know what the problem is? I have to get it running today. Thank you very much!

By the way, my suite is u-ax173, which I upgraded from a UM10.9 suite to run UM11.0.

Best regards


comment:1 Changed 21 months ago by ros

Hi Jian-Feng,

According to the suite error file (log/suite/err), the suite is having problems connecting to ARCHER with rose host-select; sometimes it cannot find a working login node for some reason.

Please can you try running rose host-select on the PUMA command line. If it lists failed logins please try "ssh"ing to the failed nodes (e.g. ssh <username>@login3.archer.ac.uk) and follow any instructions.

If the suite still fails to submit with the same error message, then the easiest thing to do is to replace

host = $(rose host-select {{ HPC_HOST }})

with

host = login.archer.ac.uk

in the suite.rc file
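
For anyone who prefers to script this change, here is a minimal Python sketch of the substitution. It assumes the setting appears on a line of its own in suite.rc; the function name pin_host and the example [[[remote]]] snippet are illustrative and not taken from the actual suite.

```python
import re

# Matches a line of the form `host = $(rose host-select ...)`,
# capturing the leading indentation so it can be preserved.
HOST_SELECT = re.compile(
    r"^([ \t]*)host[ \t]*=[ \t]*\$\(rose host-select [^)]*\)[ \t]*$",
    re.MULTILINE,
)


def pin_host(suite_rc_text, host="login.archer.ac.uk"):
    """Replace each rose host-select host setting with a fixed host name."""
    return HOST_SELECT.sub(lambda m: f"{m.group(1)}host = {host}", suite_rc_text)


# Illustrative snippet only; the real suite.rc will contain much more.
example = "    [[[remote]]]\n        host = $(rose host-select {{ HPC_HOST }})\n"
print(pin_host(example))
```

Lines that set host to anything else are left untouched, so it should be safe to apply to the whole file, but it is worth checking the result by eye before rerunning the suite.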


comment:2 Changed 21 months ago by jfgu

Hi Ros,

I tried running rose host-select on PUMA, but it failed with:

[FAIL] No (default) hosts specified.

I then replaced host = $(rose host-select {{ HPC_HOST }}) with host = login.archer.ac.uk, and the model compiled successfully. But the suite again fails to submit when it starts the reconfiguration. The suite error file says:

2018-04-23T09:46:09Z ERROR - [job-submit cmd] cylc jobs-submit --host=login.archer.ac.uk --remote-mode -- '$HOME/cylc-run/u-ax173/log/job' 10000101T0000Z/UM_recon/01
        [job-submit ret_code] 191
        [job-submit out] 2018-04-23T10:46:08+01|10000101T0000Z/UM_recon/01|191|None
2018-04-23T09:46:09Z ERROR - [UM_recon.10000101T0000Z] -submission failed

The host is correct, so I am not sure what the problem is now.


comment:3 Changed 21 months ago by ros

Sorry, I missed the host off. It should have been:

rose host-select archer

The full error message is in log/job/10000101T0000Z/UM_recon/01/job-activity.log
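
As a rough sketch of how that path is derived: the last argument of the failed "cylc jobs-submit" command in the suite error file (cycle/task/submit number) maps onto a directory under log/job. The helper name job_activity_log is hypothetical, and the default suite directory is assumed from the $HOME/cylc-run/u-ax173 path in the error message.

```python
from pathlib import PurePosixPath


def job_activity_log(job_id, suite_dir="~/cylc-run/u-ax173"):
    """Build the job-activity.log path for a job given as cycle/task/submit number."""
    return PurePosixPath(suite_dir) / "log" / "job" / job_id / "job-activity.log"


# The job ID is the last whitespace-separated token of the [job-submit cmd] line.
cmd_line = ("cylc jobs-submit --host=login.archer.ac.uk --remote-mode -- "
            "'$HOME/cylc-run/u-ax173/log/job' 10000101T0000Z/UM_recon/01")
job_id = cmd_line.split()[-1]
print(job_activity_log(job_id))
```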

You are trying to submit it to the short queue but are requesting more than 20 minutes, which is not allowed. You will need to either change the queue to standard or reduce the wall clock time to 20 minutes or less.
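
A quick way to sanity-check a request before submitting is sketched below. It assumes the wall clock is written as HH:MM:SS; the function names are illustrative, and the only limit encoded is the 20-minute short-queue limit mentioned above.

```python
# Short-queue limit from the explanation above: 20 minutes, in seconds.
SHORT_QUEUE_LIMIT_S = 20 * 60


def walltime_seconds(walltime):
    """Convert an HH:MM:SS wall-clock string to seconds (format is an assumption)."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fits_short_queue(walltime):
    """Return True if the requested wall clock is within the short-queue limit."""
    return walltime_seconds(walltime) <= SHORT_QUEUE_LIMIT_S


for request in ("00:20:00", "00:30:00"):
    verdict = "ok for the short queue" if fits_short_queue(request) else "needs the standard queue"
    print(request, verdict)
```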


comment:4 Changed 21 months ago by ros

