Upgrading to Cylc 8
We are still in the process of testing Cylc 8 on PUMA2 and ARCHER2. These instructions are under development.
These instructions assume you have followed the standard PUMA2 setup instructions, and that the workflow you are upgrading runs correctly under Cylc 7 on ARCHER2.
Please also refer to the Cylc 8 migration guide and the instructions for running Cylc 8 on PUMA2.
Overview
Cylc 8 has a compatibility mode that allows you to run an existing Cylc 7 workflow without fully upgrading.
Important: On PUMA2-ARCHER2 there are still several changes required to run even in compatibility mode.
These are mostly changes to the top-level runtime settings. You may not have to make all of these changes to your workflow, but do check all of the points below. We can’t cover all eventualities, so your workflow may require additional changes. Get in touch with the CMS helpdesk if you need further advice.
After making these changes, you should then fully upgrade to Cylc 8. This mainly involves syntax changes to the suite definition files, including replacing remote “hosts” with “platforms”.
1. Check your PUMA2/ARCHER2 setup
a. Check your ssh setup
It is important that the ssh-setup script is sourced in your .bash_profile and not in .bashrc or any other file.
Information: Cylc 8 job scripts are now launched in a new subshell, which only loads .bash_profile by default. The ssh-setup script must be run so that any fcm_make tasks can mirror code to ARCHER2.
In your ~/.bash_profile you should have the following:
# Ensure persistent ssh-agent
. $HOME/.ssh/ssh-setup
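As a quick way to confirm where ssh-setup is sourced, a check along the following lines could be run on PUMA2. This is a minimal sketch: the function name sources_ssh_setup is hypothetical, and it only looks for an uncommented "." or "source" line that mentions ssh-setup.

```shell
# Succeeds if the given file has an uncommented line that sources a script
# named ssh-setup using "." or "source". Missing files count as "no".
sources_ssh_setup() {
    grep -Eq '^[[:space:]]*(\.|source)[[:space:]]+[^#]*ssh-setup' "$1" 2>/dev/null
}

# Report on the usual shell start-up files.
for f in "$HOME/.bash_profile" "$HOME/.bashrc" "$HOME/.profile"; do
    if sources_ssh_setup "$f"; then
        echo "$f: sources ssh-setup"
    else
        echo "$f: does not source ssh-setup"
    fi
done
```

Only .bash_profile should be reported as sourcing the script.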
b. Check any user-specific settings
Make sure your .bash_profile on ARCHER2 has the following line:
. /work/y07/shared/umshared/bin/rose-um-env-puma2
Check that this is the puma2 version and not rose-um-env-puma. Also check any other files such as .profile or .bashrc.
If you have any user configuration files for Rose or FCM on PUMA2 or ARCHER2, these may cause incompatibilities at Cylc 8. Rose and FCM configurations are under ~/.metomi.
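The checks in this step could be scripted roughly as follows. This is only a sketch: the function name find_old_puma_env is hypothetical, and it simply reports any reference to rose-um-env-puma that is not immediately followed by "2".

```shell
# Print file:line:text for each reference to rose-um-env-puma that is not
# followed by "2" (i.e. the old PUMA script). Missing files are skipped.
find_old_puma_env() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            grep -En 'rose-um-env-puma([^2]|$)' "$f" /dev/null || true
        fi
    done
}

old_refs=$(find_old_puma_env "$HOME/.bash_profile" "$HOME/.profile" "$HOME/.bashrc")
if [ -n "$old_refs" ]; then
    echo "References to the old script:"
    echo "$old_refs"
else
    echo "No references to the old rose-um-env-puma script found."
fi

# List any user-level Rose/FCM configuration so it can be reviewed.
ls -A "$HOME/.metomi" 2>/dev/null || echo "No ~/.metomi directory found."
```

Run the first part on ARCHER2, and the ~/.metomi listing on both PUMA2 and ARCHER2.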
2. Check your workflow validates at Cylc 7
Cylc 7 supports some deprecated syntax, so you will need to upgrade this before moving to Cylc 8.
Navigate to the roses suite directory and run the following:
export CYLC_VERSION=7
rose suite-run --validate
If any warning messages appear, follow the instructions until your suite is fully Cylc 7 compliant.
3. Make Cylc 8 compatibility changes to your suite
a. Add a remote_setup task to the graph
We need a dummy task that sets up the cylc-run directory on ARCHER2 before any fcm_make mirrors start.
Information: On ARCHER2 the cylc-run directory for each workflow is symlinked from /home to /work. In Cylc 8, the symlink is not set up when the workflow starts, and so the fcm_make mirror copies data to the wrong location.
Edit the suite.rc file, and add in a new task remote_setup that runs before any fcm_make tasks, e.g.:
{%- if BUILD == true %}
remote_setup => fcm_make => fcm_make2 => \
{%- endif %}
Add the task definition in the appropriate place.
- i. If your workflow does not have a site/archer2.rc file, add this to suite.rc:
[[remote_setup]]
inherit = HPC
script = "echo 'Ensure suite dir set up correctly on remote host.'"
[[[job]]]
execution time limit = PT1M
batch system = background
- ii. If your workflow does have a site/archer2.rc file, add this to the suite.rc:
[[remote_setup]]
inherit = None, REMOTE_SETUP_RESOURCE
script = "echo 'Ensure suite dir set up correctly on remote host.'"
And this to the site/archer2.rc:
[[REMOTE_SETUP_RESOURCE]]
inherit = HPC
[[[job]]]
execution time limit = PT1M
batch system = background
b. Remove any instances of [runtime][task][remote]owner
Cylc 8 no longer supports remote usernames in the workflow definition.
Information: See here for the details of this change. Remote user names should instead be set in your .ssh/config file. If you followed the PUMA2 setup instructions, this should already be set up correctly for ARCHER2 and JASMIN.
Check your suite.rc and/or site/archer2.rc file and remove any lines like this:
owner = {{ARCHER2_USERNAME}}
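To locate any remaining owner settings, a search such as the one below could be run from the suite directory. It is a sketch only; find_owner_settings is a hypothetical helper name, and it matches any line that starts with "owner =".

```shell
# Print file:line:text for each "owner = ..." setting in the files that exist.
find_owner_settings() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            grep -En '^[[:space:]]*owner[[:space:]]*=' "$f" /dev/null || true
        fi
    done
}

find_owner_settings suite.rc site/archer2.rc
```

No output means no owner lines were found in those two files.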
c. Update the ARCHER2 Slurm flags
The --export=none flag should be removed from the ARCHER2 Slurm headers.
Information: This setting stops the required run environment from being loaded properly at Cylc 8. If it is included you will see an error like:
/work/n02/n02/annette_test/cylc-run/u-cc519-comp/run1/share/fcm_make/build-recon/bin/um-recon.exe: error while loading shared libraries: libfabric.so.1: cannot open shared object file: No such file or directory
Edit your suite.rc and/or site/archer2.rc and remove the --export=none line. It will probably be under [[HPC]] [[[directives]]].
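The directive can be located with a search like the following, run from the suite directory. This is a sketch with a hypothetical helper name (find_export_none); it matches case-insensitively and allows spaces around the equals sign.

```shell
# Print file:line:text for each --export=none directive (any letter case,
# optional spaces around "=") in the files that exist.
find_export_none() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            grep -Ein -e '--export[[:space:]]*=[[:space:]]*none' "$f" /dev/null || true
        fi
    done
}

find_export_none suite.rc site/archer2.rc
```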
d. Make sure FCM extracts from the mirror repositories
In each of your fcm_make_* apps, check that any references to e.g. fcm:moci.x are changed to fcm:moci.xm.
Information: Since Cylc 8 job scripts run under a new subshell, the gpg agent will not be available to the fcm make extract tasks, so we need to use the MOSRS mirror repositories on PUMA2.
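One way to find keywords that still point at the non-mirror repositories is sketched below. It assumes the apps live under app/ in the suite directory and that keywords have the form fcm:NAME.x, optionally followed by a suffix; the helper name find_non_mirror_keywords is hypothetical.

```shell
# Recursively print file:line:text for FCM keywords ending in ".x" rather
# than ".xm" (e.g. fcm:moci.x or fcm:moci.x_tr, but not fcm:moci.xm).
find_non_mirror_keywords() {
    for d in "$@"; do
        if [ -e "$d" ]; then
            grep -rEn 'fcm:[A-Za-z0-9_-]+\.x([^[:alnum:]]|$)' "$d" /dev/null || true
        fi
    done
}

find_non_mirror_keywords app/fcm_make*
```

Each reported line is a candidate for changing .x to .xm.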
e. Set path to Rose/Cylc libraries if needed
If you have a script that uses the Rose or Cylc Python libraries, you will need to set the path directly (since the job environment is no longer inherited). For example, the xml task for UM-XIOS uses Rose macros, so we need:
[[XML_RESOURCE]]
inherit = HPC_SERIAL
pre-script = """
export PATH=/work/y07/shared/umshared/metomi/rose-2019.01/bin:$PATH
export PYTHONPATH=/work/y07/shared/umshared/metomi/rose-2019.01/lib/python:$PYTHONPATH
"""
Note: Here we are still using the old version of Rose to run the scripts. To fully upgrade, the scripts would be updated to Python 3 and the new Rose 2 and Cylc 8 packages.
f. Change rose date to isodatetime
Instances of rose date may need to be replaced with the isodatetime command. For example,
{% set ROSEDATE = "rose date -c --calendar=" + CALENDAR %}
should become:
{% set ROSEDATE = "isodatetime --calendar=" + CALENDAR %}
Note: The rose date command will work on PUMA2, but it won’t work in a task script run on ARCHER2, so you may not need to change all instances.
4. Check Cylc 7 compatibility mode
a. Check your workflow validates at Cylc 8
Run the following, from the rose workflow directory:
export CYLC_VERSION=8
cylc validate .
If everything is OK you should get the following response:
WARNING - Backward compatibility mode ON
Valid for cylc-8.3.2
b. (Optionally) run in Cylc 8 with compatibility mode
You should be able to run your suite under Cylc 8 at this point.
cylc vip
Note that this is still using old Cylc 7 syntax and you will need to fully upgrade to Cylc 8.
5. Upgrade to a Cylc 8 workflow
a. Add in the workflow definition
First rename the suite.rc file to flow.cylc.
Then at the top of the rose-suite.conf file, replace the line:
[jinja2:suite.rc]
with
[template variables]
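These two changes could be scripted as in the sketch below. It assumes the Rose configuration file is named rose-suite.conf and sits alongside suite.rc; the function name rename_suite_definition is hypothetical. If your suite is under version control, you may prefer to do the rename with that tool's own move command.

```shell
# Rename suite.rc to flow.cylc and replace the [jinja2:suite.rc] section
# header in rose-suite.conf, for the suite in directory $1 (default: .).
rename_suite_definition() {
    dir=${1:-.}
    # Only rename if flow.cylc does not already exist.
    if [ -f "$dir/suite.rc" ] && [ ! -e "$dir/flow.cylc" ]; then
        mv "$dir/suite.rc" "$dir/flow.cylc"
    fi
    conf="$dir/rose-suite.conf"
    if [ -f "$conf" ]; then
        # Write to a temporary file first, then replace the original.
        sed 's/^\[jinja2:suite\.rc]$/[template variables]/' "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    fi
}

# Usage, from the suite directory:
# rename_suite_definition .
```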
b. Switch to platforms
For each task or family that defines a host and/or batch system, replace these settings with a platform. They might be set in the suite.rc file or the site/archer2.rc file (or both). For example:
- i. localhost
[[NCAS_NOT_SUPPORTED]]
[[[job]]]
batch system = background
becomes
[[NCAS_NOT_SUPPORTED]]
platform = localhost
- ii. ARCHER2 slurm
[[HPC]]
[[[remote]]]
host = $(rose host-select archer2)
[[[job]]]
batch system = slurm
becomes
[[HPC]]
platform = archer2
- iii. JASMIN sci node background
[[JASMIN]]
[[[remote]]]
host = sci2.jasmin.ac.uk
[[[job]]]
batch system = background
becomes
[[JASMIN]]
platform = sci-bg
The platforms should be selected from the list of supported platforms for PUMA2.
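To see which host and batch system settings still need converting, a search like this one could be run from the workflow directory. It is a sketch: the helper name find_legacy_host_settings is hypothetical, and it reports only the setting lines, not the enclosing [[[remote]]] or [[[job]]] headings, which also need removing.

```shell
# Print file:line:text for each remaining "host =" or "batch system ="
# setting in the files that exist.
find_legacy_host_settings() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            grep -En '^[[:space:]]*(host|batch system)[[:space:]]*=' "$f" /dev/null || true
        fi
    done
}

find_legacy_host_settings flow.cylc suite.rc site/archer2.rc
```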
c. Update to Cylc 8 syntax
Then run:
cylc validate .
This produces a list of warnings which describe the remaining syntax changes required to upgrade to Cylc 8.
For more details, refer to the Cylc migration summary: https://cylc.github.io/cylc-doc/stable/html/7-to-8/summary.html#upgrading-to-cylc-8
d. Run with Cylc 8
Once you can run cylc validate . with no warnings, you are ready to try running your suite at Cylc 8 with:
cylc vip
6. Other potential issues
a. Cylc variables
Cylc variables starting with CYLC_SUITE_ are now deprecated and should be replaced with the CYLC_WORKFLOW_ versions. As of Cylc version 8.3.4 the old variables still seem to work but they may not be equivalent (see below).
b. Cylc variable for pptransfer
If you are using pptransfer with archive_name=$CYLC_SUITE_NAME, at Cylc 8 this will change from e.g. u-de385 to u-de385/run1. It will still work but it may not be how you want to store your data on JASMIN GWS or ET. Consider changing this to archive_name=$CYLC_WORKFLOW_NAME, which will give u-de385.
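To find which deprecated variables a workflow still uses, something like the following could be run from the workflow directory. This is a sketch under assumptions: the helper name list_suite_variables is hypothetical, and the list of paths searched (flow.cylc, suite.rc, site, app, bin) is only a guess at a typical layout.

```shell
# Print a sorted, de-duplicated list of CYLC_SUITE_* variable names used in
# the given files or directories (searched recursively, skipping binaries).
list_suite_variables() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            grep -rhoIE 'CYLC_SUITE_[A-Z0-9_]+' "$p" || true
        fi
    done | sort -u
}

list_suite_variables flow.cylc suite.rc site app bin
```

Each name printed should be reviewed against its CYLC_WORKFLOW_ counterpart.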