Changes between Version 15 and Version 16 of RoseCylc/Hints


Timestamp: 20/02/15 15:33:16
Author: annette
= Useful information for running with Rose =

== [[span(style=color: blue, Tips)]] ==

=== To switch versions of Rose and/or Cylc ===


For details of further customisations that can be made to the `rose edit` window, see: http://metomi.github.io/rose/doc/rose-rug-config-edit.html#customisation

== [[span(style=color: blue, Troubleshooting)]] ==

=== Unable to submit jobs ===

The suite fails immediately and the following error appears in the {{{log/suite/err}}} file:
{{{
Host key verification failed.
2015-01-21T14:56:23Z ERROR - [fcm_make.1] -Failed to construct job submission command
2015-01-21T14:56:23Z WARNING - Command '['ssh', '-oBatchMode=yes', '-oConnectTimeout=10', 'exvmsrose.monsoon-metoffice.co.uk', 'mkdir -p "$HOME/cylc-run/nemovar_build" "$HOME/cylc-run/nemovar_build/log/job"']' returned non-zero exit status 255
2015-01-21T14:56:23Z ERROR - [fcm_make.1] -submission failed
}}}
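
This happens because the Cylc VM cannot ssh into the Rose VM non-interactively. The job submission command runs ssh with {{{-oBatchMode=yes}}}, so ssh cannot ask you to accept the Rose VM's host key; if that key is missing from your known_hosts file, or is out of date, the connection fails with "Host key verification failed".

To fix this, log in to the Cylc VM and then back to the Rose VM using the fully qualified host names, so that both hosts are added to your known_hosts file.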

1. Check whether exvmscylc or exvmsrose already appear in your known_hosts file. If so, delete these entries, especially if you accessed the VMs before they were rebuilt:
{{{
cd ~/.ssh
# Keep a backup, then write a copy without any exvmsrose/exvmscylc lines
mv known_hosts known_hosts.OLD
sed '/exvmsrose/d;/exvmscylc/d' known_hosts.OLD > known_hosts
}}}

2. Now, from exvmsrose, ssh into exvmscylc using its fully qualified host name:
{{{
ssh exvmscylc.monsoon-metoffice.co.uk
}}}
  This should produce output similar to:
{{{
The authenticity of host 'exvmscylc.monsoon-metoffice.co.uk (10.168.64.4)' can't be established.
RSA key fingerprint is 98:c8:5e:b9:b3:d2:2f:c4:9c:89:78:08:d6:78:70:3a.
Are you sure you want to continue connecting (yes/no)?
}}}
  Type {{{yes}}}.

3. Now, from exvmscylc, log in to exvmsrose using its fully qualified host name:
{{{
ssh exvmsrose.monsoon-metoffice.co.uk
}}}
  Again, type {{{yes}}} at the prompt.

4. Type {{{exit}}} to return to the Cylc VM, then ssh into exvmsrose.monsoon-metoffice.co.uk again; this time it should succeed without any interactive prompts.

5. Type {{{exit}}} twice to get back to your original Rose terminal, then try re-submitting the Rose suite.
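The checks in steps 1 and 4 can be scripted. What follows is a minimal sketch, not part of Rose or Cylc: {{{list_vm_entries}}} and {{{check_batch_ssh}}} are hypothetical helper names, and {{{SSH_CMD}}} is an assumed override (default {{{ssh}}}) used only so the functions can be tried without the VMs. It reuses the options from the failing command in the log ({{{-oBatchMode=yes -oConnectTimeout=10}}}), so it should fail wherever Cylc's job submission would, including when ssh would otherwise prompt for a password.

{{{
#!sh
# Hypothetical helpers, a sketch only. SSH_CMD defaults to ssh and is
# an assumed override so the functions can be tried without the VMs.
SSH_CMD=${SSH_CMD:-ssh}

# Step 1: show known_hosts lines that mention either VM.
# Hashed entries (HashKnownHosts yes) will not match a plain grep.
list_vm_entries() {
    grep -n -E 'exvms(rose|cylc)' "${1:-$HOME/.ssh/known_hosts}"
}

# Step 4: try a login that is not allowed to prompt, using the same
# options as the failing job submission command in the log.
check_batch_ssh() {
    host=$1
    if $SSH_CMD -oBatchMode=yes -oConnectTimeout=10 "$host" true </dev/null; then
        echo "OK: $host"
    else
        echo "FAIL: $host (host key not yet accepted?)" >&2
        return 1
    fi
}

# Check every host named on the command line (no arguments: do nothing).
for h in "$@"; do
    check_batch_ssh "$h"
done
}}}

For example, after saving the sketch as {{{check_ssh.sh}}} (any name will do), running {{{sh check_ssh.sh exvmsrose.monsoon-metoffice.co.uk}}} on exvmscylc should print {{{OK: exvmsrose.monsoon-metoffice.co.uk}}} once steps 2 to 4 have been completed.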

=== No gcylc window ===

When submitting a suite, no gcylc window appears.

The GUI is sometimes slow to load. If it does not appear at all, however, check that you have X11 forwarding set up from your **initial location and the lander**.

To do so, either ssh with the {{{-Y}}} option or append the following lines to your {{{~/.ssh/config}}} file ({{{ForwardX11Trusted yes}}} gives the trusted forwarding that {{{-Y}}} requests):
{{{
Host *
  ForwardX11 yes
  ForwardX11Trusted yes
}}}
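Both checks can be sketched in a POSIX shell as below. {{{enable_x11_forwarding}}} and {{{check_display}}} are hypothetical helper names, not Rose or Cylc commands. The first appends a {{{Host *}}} stanza with {{{ForwardX11 yes}}} (plus {{{ForwardX11Trusted yes}}}, the configuration-file counterpart of {{{-Y}}}) only if no {{{ForwardX11 yes}}} line is present yet; it is a plain text check and does not interpret existing {{{Host}}} blocks. The second is meant to be run after logging in to each hop: if {{{DISPLAY}}} is empty there, X11 forwarding is not active on that hop.

{{{
#!sh
# Hypothetical helpers, a sketch only.

# Append X11 forwarding settings to an ssh client config file
# (default ~/.ssh/config) unless a "ForwardX11 yes" line already exists.
enable_x11_forwarding() {
    config=${1:-$HOME/.ssh/config}
    mkdir -p -m 700 "$(dirname "$config")"
    touch "$config"
    if grep -qiE '^[[:space:]]*ForwardX11[[:space:]]+yes' "$config"; then
        echo "already enabled in $config"
    else
        # Leading newline so the stanza never joins an unterminated last line.
        printf '\nHost *\n  ForwardX11 yes\n  ForwardX11Trusted yes\n' >> "$config"
        echo "appended to $config"
    fi
}

# Run on each hop (lander, Rose VM): an empty DISPLAY means X11
# forwarding is not active there, so gcylc cannot open a window.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY=$DISPLAY"
    else
        echo "DISPLAY is not set: X11 forwarding is not active here" >&2
        return 1
    fi
}
}}}

Passing a scratch path, e.g. {{{enable_x11_forwarding /tmp/test_config}}}, lets you inspect the result before touching your real {{{~/.ssh/config}}}.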

=== Rose suite running but can't shut down ===

A Rose suite appears to be running, i.e. {{{rose suite-scan}}} reports something like:
{{{
puma-aa046 gmslis@exvmscylc:7767
}}}
Alternatively, re-running the suite with {{{rose suite-run}}} fails with an error like:
{{{
[FAIL] Suite "puma-aa046" may still be running.
[FAIL] Host "exvmscylc" has process:
[FAIL]     9468 python /home/fcm/cylc-6.1.2/bin/cylc-run puma-aa046
[FAIL]     9469 python /home/fcm/cylc-6.1.2/bin/cylc-run puma-aa046
[FAIL] Try "rose suite-shutdown --name=puma-aa046" first?
}}}

However, when you try to shut down the suite, {{{rose suite-stop}}} reports that the suite is not running:
{{{
Really shutdown puma-aa046 at exvmscylc? [y/n] y
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
'ERROR, remote port file not found'
}}}

This happens when the suite's processes on the Cylc VM have been orphaned, which can occur when exvmscylc and exvmsrose cannot communicate non-interactively.

To fix this, log in to exvmscylc and run {{{cylc scan}}}; it should list the suite that is still running. To stop it, type the following, replacing {{{puma-aa046}}} with your suite name:
{{{
cylc shutdown --now puma-aa046
}}}
This may report something like "Command queued", but re-running {{{cylc scan}}} should show that the suite is no longer running.
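To confirm on exvmscylc that the leftover processes (the {{{cylc-run}}} lines in the {{{[FAIL] Host ... has process:}}} message) have gone, something like the following sketch may help. {{{suite_procs}}} is a hypothetical helper, not a Cylc command; it reads {{{PID COMMAND}}} lines on standard input, assumed here to come from {{{ps -o pid=,args=}}}, and prints those that run {{{cylc-run}}} with the given suite name as a whole argument.

{{{
#!sh
# Hypothetical helper, a sketch only: filter "PID COMMAND" lines (as printed
# by "ps -o pid=,args=") down to cylc-run processes for one suite.
suite_procs() {
    awk -v s="$1" '
        /cylc-run/ {
            # Match the suite name as a whole argument, not a substring,
            # so puma-aa046 does not also match puma-aa0460.
            for (i = 2; i <= NF; i++)
                if ($i == s) { print; next }
        }'
}
}}}

For example, {{{ps -u "$USER" -o pid=,args= | suite_procs puma-aa046}}} prints nothing once no {{{cylc-run}}} process for that suite is left.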