Version 8 (modified by annette, 5 years ago)

Useful information for running with Rose


To switch versions of Rose and/or cylc

  • On puma and MONSooN, export the variables CYLC_VERSION=x.y.z and ROSE_VERSION=YYYY.MM.DD.
  • On Archer, use the module switch command.
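
As a minimal sketch of the first bullet, the exports might look like this in a bash-like shell. The values x.y.z and YYYY.MM.DD are the placeholders from the text, not real versions, so substitute the ones you need. On Archer the equivalent would be a module switch call; the module names depend on the site installation and are not given here.

```shell
# Placeholders taken from the text above: replace with real version strings.
export CYLC_VERSION=x.y.z
export ROSE_VERSION=YYYY.MM.DD

# Show what the current shell will pass on to rose and cylc.
echo "cylc: ${CYLC_VERSION}  rose: ${ROSE_VERSION}"
```

The exports only affect the current shell and its children, so they need repeating in each new terminal unless added to a shell start-up file.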


Unable to submit jobs

The suite fails straight away and the following error appears in the log/suite/err file:

Host key verification failed.
2015-01-21T14:56:23Z ERROR - [fcm_make.1] -Failed to construct job submission command
2015-01-21T14:56:23Z WARNING - Command '['ssh', '-oBatchMode=yes', '-oConnectTimeout=10', 'exvmsrose', 'mkdir -p "$HOME/cylc-run/nemovar_build" "$HOME/cylc-run/nemovar_build/log/job"']' returned non-zero exit status 255
2015-01-21T14:56:23Z ERROR - [fcm_make.1] -submission failed 

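This happens because the Cylc VM cannot ssh into the Rose VM non-interactively: the ssh command in the log runs with BatchMode=yes, so it cannot answer the host key prompt.

To solve this, log in to the Cylc VM and then back to the Rose VM, specifying the full paths, so that both hosts are added to your known_hosts file.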

  1. Check whether exvmscylc or exvmsrose already appear in your known_hosts file. If so, delete these entries, especially if you accessed the VMs before their rebuild:
    cd ~/.ssh
    mv known_hosts known_hosts.OLD
    sed '/^exvmsrose/d;/exvmscylc/d' known_hosts.OLD > known_hosts
  2. Now from exvmsrose, ssh into exvmscylc using the full path:
    This should produce output something like this:
    The authenticity of host ' (' can't be established.
    RSA key fingerprint is 98:c8:5e:b9:b3:d2:2f:c4:9c:89:78:08:d6:78:70:3a.
    Are you sure you want to continue connecting (yes/no)? 
    Type yes.
  3. Now from exvmscylc, log in to exvmsrose using the full path:
    And again type yes at the prompt.
  4. Type exit to get back to the Cylc VM, then ssh into exvmsrose again. This should succeed without any interactive prompts.
  5. Now type exit twice to get back to the original Rose terminal, and try re-submitting the Rose suite.
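
To see what the sed filter in the first step does before running it on your real file, the sketch below applies the same expression to a throwaway copy. The three host lines and their keys are made-up sample data, and the temporary directory is only there to leave ~/.ssh untouched.

```shell
# Work in a scratch directory so the real ~/.ssh/known_hosts is not modified.
tmpdir=$(mktemp -d)

# Made-up sample entries: two for the VMs, one unrelated host.
cat > "$tmpdir/known_hosts.OLD" <<'EOF'
exvmsrose ssh-rsa AAAAexamplekey1
exvmscylc ssh-rsa AAAAexamplekey2
otherhost ssh-rsa AAAAexamplekey3
EOF

# Same filter as in the step above: drop the lines for the two VMs, keep the rest.
sed '/^exvmsrose/d;/exvmscylc/d' "$tmpdir/known_hosts.OLD" > "$tmpdir/known_hosts"

# Only the unrelated host should remain.
cat "$tmpdir/known_hosts"
```

Once all the steps are done, one way to confirm the fix is to run ssh -oBatchMode=yes exvmsrose true from the Cylc VM. This uses the same BatchMode option that appears in the log, so it should return without prompting if the host key is now known.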

No gcylc window

When submitting a job, no gcylc window appears.

Sometimes the GUI is slow to load. If it does not appear at all, however, check that X11 forwarding is set up on each ssh connection, both from your initial location and from the lander.

To do so, ssh with the -Y option or, alternatively, append the following lines to your ~/.ssh/config file:

Host *
ForwardX11 yes
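
A quick way to check whether forwarding reached the machine you are logged in to is to look at the DISPLAY variable, which ssh sets when X11 forwarding is active. The sketch below wraps that check in a small helper; the function name check_x11 is made up for illustration.

```shell
# Hypothetical helper: reports whether this shell has an X display to use.
check_x11() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to ${DISPLAY}; X11 forwarding appears to be active."
    else
        echo "DISPLAY is not set; reconnect with ssh -Y or check ~/.ssh/config." >&2
        return 1
    fi
}

# Run the check in the shell from which the suite will be submitted.
check_x11 || echo "No gcylc window can open from this shell."
```

Run it after every hop (initial machine, lander, Rose VM). The first hop where DISPLAY is empty is likely the one where forwarding was not requested.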

Rose suite running but can't shut down

A rose suite is supposedly running, i.e. rose suite-scan gives something like:

puma-aa046 gmslis@exvmscylc:7767 

Or trying to re-run the suite with rose suite-run gives an error:

[FAIL] Suite "puma-aa046" may still be running.
[FAIL] Host "exvmscylc" has process:
[FAIL]     9468 python /home/fcm/cylc-6.1.2/bin/cylc-run puma-aa046
[FAIL]     9469 python /home/fcm/cylc-6.1.2/bin/cylc-run puma-aa046
[FAIL] Try "rose suite-shutdown --name=puma-aa046" first? 

However, when trying to shut down the suite, rose suite-stop reports that the suite isn't running:

Really shutdown puma-aa046 at exvmscylc? [y/n] y
'ERROR, remote port file not found' 

This is due to orphaned tasks on the Cylc VM, which can occur when exvmscylc and exvmsrose cannot communicate non-interactively.

To solve this, log in to exvmscylc and run cylc scan, which should show the running tasks. To stop these, type:

cylc shutdown --now

This may report something like "Command queued", but re-running cylc scan will show that the tasks are now finished.
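
If you are unsure which host to log in to, the rose suite-scan line shown earlier already contains it. The sketch below splits that sample line into its parts using plain shell parameter expansion. It assumes the line always has the form "name user@host:port", as in the example on this page.

```shell
# Sample line copied from the rose suite-scan output earlier in this section.
line='puma-aa046 gmslis@exvmscylc:7767'

suite=${line%% *}         # text before the first space
target=${line#* }         # remainder: user@host:port
user=${target%%@*}        # text before the @
hostport=${target#*@}     # host:port
host=${hostport%%:*}      # text before the colon
port=${hostport##*:}      # text after the colon

echo "suite=$suite user=$user host=$host port=$port"
```

Here host is the machine on which to run cylc scan and the shutdown command.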