Opened 6 months ago
Closed 6 months ago
#3359 closed help (answered)
UM runs stuck on ARCHER
| Reported by: | pmcguire | Owned by: | um_support |
|---|---|---|---|
| Component: | UM Model | Keywords: | UM |
| Cc: | | Platform: | ARCHER |
| UM Version: | 11.5 | | |
Description
Hi CMS Helpdesk
My UM runs seem to be stuck on ARCHER. I have 3 runs going.
For one of them, u-bw963, the Cylc GUI showed that my atmos_main run for 1992 had finished 2 days ago, but the icon was still green as if it were running. The log file said it had finished, so I changed the state manually from running to succeeded. That didn't help much: it tried to submit postproc, but the submit failed.
So I stopped the job, and then did a rose suite-run --restart, but I get an error:
[FAIL] ssh -oBatchMode=yes login7.archer.ac.uk bash --login -c \'ROSE_VERSION=2016.11.1\ rose\ suite-run\ -v\ -v\ --name=u-bw963\ --run=restart\ --remote=uuid=dc8f63a3-fe23-46ab-898d-4e76af47812d,root-dir=$DATADIR\' # return-code=255, stderr=
any suggestions?
Patrick
Change History (11)
comment:1 Changed 6 months ago by dcase
comment:2 Changed 6 months ago by pmcguire
I checked the disk space a couple of days ago.
For login7, we can't ssh from puma→login7 in normal times.
Patrick
comment:3 Changed 6 months ago by pmcguire
Yes, I just checked my disk space quota again on SAFE → ARCHER, and everything is fine there.
Patrick
comment:4 Changed 6 months ago by pmcguire
How do I check the ssh connection to login7, if we're not supposed to be able to ssh from puma→login7 anyway?
Patrick
comment:5 Changed 6 months ago by ros
Hi Patrick,
Just ssh from puma to login7.archer.ac.uk in the normal way. If your ssh isn't set up correctly you'll get a permission denied message. If it's ok you'll see the message "Command rejected - not on allowed list" or similar - I can't remember the exact message.
Cheers,
Ros.
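The check described above can be sketched as a small shell helper. This is only an illustration: `classify_ssh_output` is a hypothetical function name, and the patterns it matches are taken from the messages quoted in this ticket, not from any official list. Note that `rose suite-run` connects with `-oBatchMode=yes`, which stops ssh from prompting for a passphrase, so a key that works interactively can still be refused in batch mode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the text that ssh printed when
# connecting from puma to an ARCHER login node.
classify_ssh_output() {
    local output="$1"
    case "$output" in
        *"Permission denied"*)
            # The key was not accepted (or could not be used without a prompt).
            echo "auth-failed" ;;
        *"rejected by policy"*|*"not on allowed list"*)
            # Authentication worked; only the interactive login was refused.
            echo "auth-ok" ;;
        *)
            echo "unknown" ;;
    esac
}

# Usage on puma (not run here; needs network access to ARCHER):
#   out=$(ssh -oBatchMode=yes login7.archer.ac.uk 2>&1 </dev/null)
#   classify_ssh_output "$out"
```

An "auth-ok" result from the batch-mode command suggests the ssh setup is fine for Rose; "auth-failed" points at the key or the ssh agent.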
comment:6 Changed 6 months ago by ros
P.S. You'll see error messages in log/job/err.
comment:7 Changed 6 months ago by pmcguire
Thanks, Ros:
These are the error messages that I get:
pmcguire@puma:~> ssh login7.archer.ac.uk
Enter passphrase for key '/home/pmcguire/.ssh/id_rsa_archerum':
PTY allocation request failed on channel 0
Comand rejected by policy. Not in authorised list
Connection to login7.archer.ac.uk closed.
comment:8 Changed 6 months ago by pmcguire
Hi Ros:
So I guess that means it's OK? (see the last comment).
The error message when I do a rose suite-run --restart with ssh -oBatchMode=yes login7.archer.ac.uk is 'Permission Denied'.
Patrick
comment:9 Changed 6 months ago by pmcguire
Hi Ros:
It seems to be working now.
Not sure exactly what I changed.
But I did restart my ssh agent for archerum.
But I was getting the message
PTY allocation request failed on channel 0
Comand rejected by policy. Not in authorised list
even before I restarted my ssh agent.
Patrick
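The ticket does not establish what fixed the problem, but since restarting the ssh agent was the one change reported, a quick way to inspect the agent is sketched below. `describe_agent_status` is a hypothetical helper; it only interprets the exit status of `ssh-add -l`, which is 0 on success, 1 if listing fails (typically because the agent holds no keys), and 2 if no agent can be contacted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the exit status of `ssh-add -l` into a message.
describe_agent_status() {
    case "$1" in
        0) echo "agent has keys loaded" ;;
        1) echo "agent reachable but no keys listed" ;;
        2) echo "cannot contact an ssh agent" ;;
        *) echo "unexpected status" ;;
    esac
}

# Usage on puma (not run here):
#   ssh-add -l >/dev/null 2>&1; describe_agent_status "$?"
#   # Then repeat the non-interactive connection that Rose uses:
#   ssh -oBatchMode=yes login7.archer.ac.uk </dev/null
```

If the agent holds no keys, the interactive `ssh login7.archer.ac.uk` can still succeed by prompting for the passphrase, while the batch-mode connection fails with a permission error. That would be consistent with the symptoms reported here, though it is an assumption, not a confirmed diagnosis.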
comment:10 Changed 6 months ago by grenville
comment:11 Changed 6 months ago by grenville
- Resolution set to answered
- Status changed from new to closed
The first things that I'd check would be ssh connection and disk space (on both computers).
Presumably these are all ok?