Opened 3 years ago

Closed 22 months ago

#2140 closed help (answered)

Request collaboration to install latest UM (vn10.3-vn10.8, fcm, rose and cylc) at UoLeeds

Reported by: markr Owned by: um_support
Component: UM Model Keywords: install fcm, rose, cylc
Cc: C.Dearden@… Platform: Other
UM Version: 10.8


Hello CMS,
I would like to know if you can advise on a sensible way to install the latest version of the UM on ARC3. Initially we had considered working with Polaris, but I have been advised that the future of that system is uncertain. ARC3, at least, is a Leeds-specific HPC and is expected to be available over the next 5 years.

I spoke with a senior member of ARC at Leeds and they would like us to provide them with a list of foundation software and overview of the steps required to get the service running.

Currently the demand is only from 2 researchers but when it is available I can see it becoming more popular.

Many thanks,

Attachments (1)

GCOM_uoleeds_arc5_test.png (89.4 KB) - added by markr 3 years ago.
GCylc image of progress of GCOM rose stem --group=arc3_intel_test


Change History (63)

comment:1 Changed 3 years ago by grenville


I think we should have a meeting to discuss this. I'm sure it's doable with a small foundation software requirement, but doing this by email won't be efficient.

Rose stem makes later versions simpler to install - that certainly was the case for UM 10.7.

How are you set for some time in the week of April 24th?


comment:2 Changed 3 years ago by markr

Hi Grenville,
okay, that week has 3 days possible: 24, 26, 28. Or the afternoon of 25th for a short meeting.


comment:3 Changed 3 years ago by markr

Hello CMS,
after our telecon I had a look at where the queueing system is defined. In the rose jobs that I use, the "suite.rc" sets [job submission] method = pbs, and there seems to be a Python script in the cylc home directory: /home/fcm/cylc-6.11.4/lib/cylc/batch_sys_handlers/

So I presume it is a matter of converting the suite.rc PBS contexts to SGE context.
I will provide a sample SGE for the ARC3 system.

comment:4 Changed 3 years ago by ros

Hi Mark,

I was going to send my example polaris suite which has this all in, but got diverted.

You need to set, for example:

  batch system = sge
  -l h_rt = 00:01:00

There was a small bug in the cylc code for SGE at cylc-6.x which I have fixed on PUMA.


comment:5 Changed 3 years ago by markr

Hello CMS,
a little more on the account setup at Leeds, from Martin Callaghan:

Hi Mark,

The shared accounts are actually project accounts which will be owned by you, 
and the 'ear' identifier is school specific. Within reason, you can have 
anything you like after the 'ear' bit.

If you use an existing project account, we can get this set up on ARC3 very quickly. 
If you want a new one, then it's a (paper) form to fill in and get it countersigned 
by Richard Rigby. I have a small supply of these paper forms.


So I think you will have to continue using *earhum*

If you have a Polaris UM suite then it would be nice to compare to an Archer/Monsoon equivalent as, for my work, I will be converting suites from UKCA team (Mohit Dalvi).

With Juliane's project I would likely have to convert Met Office internal suites for ARC3 use (i.e. PBS to SGE and data paths for ARC3).



comment:6 Changed 3 years ago by ros

  • Status changed from new to pending

comment:7 Changed 3 years ago by markr

Progress to date:

  1. ARC have enabled the ssh access to from (using IP address)
  2. I have transferred the fcm, rose, cylc from archer umshared software but found some broken links, e.g.

lrwxrwxrwx 1 earmgr EAR 38 Dec 12 13:55 keyword.cfg -> ../../../fcm_admin/etc/fcm/keyword.cfg

Do I need a fcm_admin folder?

  3. Some folders on archer are very large and I do not want to copy 7TB of files indiscriminately.
  4. The .cylc/global.rc on arc3 appears to "work". I did it first on arc3, then realised I should do it on puma.

cylc get-site-config

stops at [batch systems?]

I am still not sure where to set the batch submission method to "sge".

The work continues…

comment:8 Changed 3 years ago by markr

Have now tried to run the "jasmin test suite"; see ~markr/roses/arc3_leeds_check

It fails like this:
markr@puma arc3_leeds_check $ rose suite-run
[INFO] create: /home/markr/cylc-run/arc3_leeds_check
[INFO] create: log.20170516T112551Z
[INFO] symlink: log.20170516T112551Z <= log
[INFO] create: log/suite
[INFO] create: log/rose-conf
[INFO] symlink: rose-conf/20170516T122551-run.conf <= log/rose-suite-run.conf
[INFO] symlink: rose-conf/20170516T122551-run.version <= log/rose-suite-run.version
[INFO] create: share
[INFO] create: share/cycle
[INFO] create: work
[INFO] export CYLC_VERSION=6.11.4
[INFO] export ROSE_ORIG_HOST=puma
[INFO] export ROSE_VERSION=2016.11.1
[INFO] install: suite.rc~
[INFO] source: /home/markr/roses/arc3_leeds_check/suite.rc~
[INFO] install: suite.rc
[INFO] 0 suite(s) unregistered.
[INFO] REGISTER arc3_leeds_check: /home/markr/cylc-run/arc3_leeds_check
[INFO] symlink: /home/markr/cylc-run/arc3_leeds_check <= /home/markr/.cylc/arc3_leeds_check
[FAIL] ssh -oBatchMode=yes earmgr@… bash --login -c \'ROSE_VERSION=2016.11.1\ rose\ suite-run\ -v\ -v\ --name=arc3_leeds_check\ --run=run\ --remote=uuid=e93dff9b-53df-485f-9eab-d5ac4ccf1d4a\' # return-code=255, stderr=
[FAIL] Host key verification failed.

comment:9 Changed 3 years ago by ros

Hi Mark,

  1. You will only really need the fcm keyword.cfg file if you are allowing code checkouts directly on the ARC3 system. However, I would recommend creating the fcm_admin/etc/fcm folder as on ARCHER but with a blank keyword.cfg file; then, as and when needed, you can populate it with any required repository keywords.
  2. The batch submission system method is set in an individual suite's or rose-stem suite's suite.rc file under
  batch system = sge

Hope that helps
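
A minimal sketch of the placeholder suggested in point 1. The function name and its argument are hypothetical; the argument should be whichever directory makes the existing ../../../fcm_admin/etc/fcm/keyword.cfg symlink resolve on ARC3.

```shell
#!/bin/sh
# Create an fcm_admin/etc/fcm folder with an empty keyword.cfg beneath the
# directory given as $1. The helper name is illustrative, not part of fcm.
make_blank_keyword_cfg() {
    root="$1"
    mkdir -p "$root/fcm_admin/etc/fcm" || return 1
    # touch leaves an existing keyword.cfg untouched and creates an empty
    # one otherwise; repository keywords can be added later as needed.
    touch "$root/fcm_admin/etc/fcm/keyword.cfg"
}

# Usage (the path is an assumption about where the software was copied to):
#   make_blank_keyword_cfg "$HOME/umshared"
```

After running it, ls -l on the broken keyword.cfg link should show that the target now exists.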

comment:10 Changed 3 years ago by markr

NOTE ssh -Y earmgr@… works passwordless

Then I fixed the suite.rc to be markr for owner and it now fails as:
[INFO] install: suite.rc
[FAIL] ssh -oBatchMode=yes bash --login -c \'ROSE_VERSION=2016.11.1\ rose\ suite-run\ -v\ -v\ --name=arc3_leeds_check\ --run=run\ --remote=uuid=b268bc47-3c26-48b7-9b95-5a32e43fea6b\' # return-code=255, stderr=
[FAIL] Warning: Permanently added the ECDSA host key for IP address '' to the list of known hosts.
[FAIL] Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password,hostbased).
markr@puma arc3_leeds_check

comment:11 Changed 3 years ago by markr

The job worked as background (hoorah!) after I changed owner back to earmgr (the arc3 identity).
Then I activated SGE as the batch method.

The submission failed. As no error or stdout logfiles had been created, I looked at submitting the job directly:

[earmgr@login2.arc3 01]$ ls
job  job.status
[earmgr@login2.arc3 01]$ qsub job
Unable to run job: must specify a value for h_rt (job runtime)

So I need to revise the suite.rc or appropriate rose component to set up the minimum settings for SGE to work with this basic case.
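
For comparison, a hand-written job file that qsub should accept might look like the sketch below. It is not the file cylc generates; the helper name and the test message are illustrative, and the one-minute limit simply reuses the h_rt value quoted in comment:4.

```shell
#!/bin/sh
# Write a minimal SGE job file to the path given as $1.
# Lines beginning with "#$" are read by qsub as embedded options.
write_sge_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#$ -cwd
#$ -l h_rt=00:01:00
echo "h_rt test job ran"
EOF
}

# Usage on an ARC3 login node (the file name is illustrative):
#   write_sge_job h_rt_test.job
#   qsub h_rt_test.job
```

If this submits cleanly, the remaining work is getting the same -l h_rt request into the directives that cylc writes for each task.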

comment:12 Changed 3 years ago by markr

Hi Annette, (I just realised I should copy this message to the ticket)

  1. On puma I have used your jasmin test suite. See ~markr/roses/arc3_leeds_check.

I have a rose.conf in ~/.metomi with the recommended settings.

markr@puma arc3_leeds_check $ cat /home/markr/.metomi/rose.conf


The suite runs okay but the files seem to be created on my arc3 account $HOME

  2. On ARC3 I set the following in my .bashrc

UMDIR=${HOME}/umshared (this will change to ~/earumsh when I get my installation working)



MCYLC=${DATADIR}/cylc-run (so I can get there quickly)

  3. The result of the run is under $HOME/cylc-run/arc3_leeds_check and there is an extra directory in "work" that is


  4. To see my account you would have to apply for an ARC3 account from here:

I did send ARC help a message several weeks ago that I would like help from your team and mentioned the polaris accounts.

Perhaps you can name me as principal investigator as Technical Head of CEMAC. Otherwise it is to support Juliane Schwendike and her WCSSP work using the Unified model.

Also you would be a "NEW USER: ARC2 and ARC3 account" (their home dirs are common).


On 23/05/17 16:58, Annette Osprey wrote:

Hi Mark,

For now those lines should go in your $HOME/.metomi/rose.conf file on PUMA, but once you are happy with the configuration we can put it in the central configuration file on PUMA.

No need to worry about ross, as I don't think you need any configuration on the HPC side. In fact ross is an old way of managing the installations and the Met Office have newer scripts, but we haven't had a chance to update yet.

If this definitely isn't working, we can take a look, but I am not sure if I can log into your system. I did have a Polaris account some time ago…?


On 23/05/17 15:50, Mark Richardson wrote:

Hi Ros, Annette.

No joy yet with trying to put "work" onto the unlimited volatile disks of arc3.

Ros, I tried creating the rose.conf but got confused about where this file should be. These did not have an effect.


Is it in ${HOME}/metomi, ${CYLC_HOME}/etc or the rose suite rose-suite.conf or another place entirely?

I read a lot of the CMS online info and wondered if I need to work with the "ross" directories as well?

The ideal configuration will work with path set to read ~earumsh/software (equivalent to umshared on archer, I think) and then work with the case in


but keep logs and small files in


I tried to follow the working directory in the cylc python source and thought it was CYLC_SUITE_WORK_DIR or something from job_conf[].



Dr. Mark Richardson
Technical Head of CEMAC
Room 10.115 School of Earth and Environment
University of Leeds

comment:13 Changed 3 years ago by annette

Hi Mark,

Just to double check are you definitely submitting a clean run, i.e. deleting the ${HOME}/cylc-run/arc3_leeds_check directory on ARC, or running with rose suite-run --new? If the cylc-run directories are already in the wrong place it won't recreate them.

Just re-reading your email thread with Ros… which option did you decide to go with?

  1. Put the whole cylc-run/<suite-id>/ directory on the fast disk (/nobackup) or
  2. Just put the cylc-run/<suite-id>/share/ and cylc-run/<suite-id>/work/ directories on the fast disk.

Option 1. is what we do on Archer and option 2. is what we do on Monsoon so I should have thought we could get it to work.

In the suite.rc file you have the line:

work sub-directory = $DATADIR

which I guess is what is creating that directory cylc-run/<suite-id>/work/nobackup/, but I am wondering if this is what you meant to do? The work sub-directory directive is just a way of creating a shared work space for multiple tasks to use (otherwise each task runs from its own <taskname> directory).
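
For option 2, the relocation is normally requested in rose.conf rather than in suite.rc. The sketch below appends the relevant settings to a rose.conf file; it assumes the root-dir{share} and root-dir{work} settings of the [rose-suite-run] section are available in the installed Rose version, and both the helper name and the host pattern in the usage line are hypothetical.

```shell
#!/bin/sh
# Append share/work relocation settings to the rose.conf given as $1,
# for hosts matching the glob given as $2.
# Assumption: this Rose version supports root-dir{share} and root-dir{work}.
append_root_dirs() {
    conf="$1"
    host_glob="$2"
    mkdir -p "$(dirname "$conf")" || return 1
    {
        printf '[rose-suite-run]\n'
        # $DATADIR is written literally so that it is expanded on the remote host.
        printf 'root-dir{share}=%s=$DATADIR\n' "$host_glob"
        printf 'root-dir{work}=%s=$DATADIR\n' "$host_glob"
    } >> "$conf"
}

# Usage on PUMA (the host pattern is an assumption; skip this if the file
# already has a [rose-suite-run] section and edit that section instead):
#   append_root_dirs "$HOME/.metomi/rose.conf" 'arc3*'
```

A clean run (rose suite-run --new) is still needed afterwards, since existing cylc-run directories are not moved.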


comment:14 Changed 3 years ago by markr

Hi Annette, Ros,
progress so far: I can now run and get "work" and "share" in the /nobackup LUSTRE file system.

I am sorry if the evidence on puma is confusing. The suite is a bit volatile while I tried some other things.

I wanted a variation of option 2 i.e. the case should run on the lustre file system

/nobackup/earmgr/cylc-run/caseID/work/ etc..

The build should be on the non-lustre disk (speed of compilation as optimisation involves creating and deleting of small files - bad thing for LUSTRE).

${HOME}/cylc-run/caseID/share/fcm_make etc

However, it looks like the prefixes for those directories have the common HOME hard-wired in


So I will go with option 2 for now.

Odd that work and share still appear in the cylc-run home.

Next I have to modify a UM suite to run on the arc3 system.

Thank you for the guidance so far…

comment:15 Changed 3 years ago by annette

Hi Mark,

Yes share/ and work/ will still be under the cylc-run directory. However there is a way to get the build to use /home (or whatever) for compilation, even though the share/ and work/ directories are on /nobackup. We do this on Monsoon, and have tested on Archer but it didn't really help performance for us. I can't remember the details off-hand but I think Ros knows (she is out today), or I will dig out the info for you.


comment:16 Changed 3 years ago by markr

Hi all,
I followed a lot of the guidance on GCOM and now I find the rose stem for that fails:

 markr@puma 01 $ more job.err
[FAIL] config-file=/home/markr/cylc-run/vn6.2_arc3_leeds/work/1/fcm_make_arc3_mpp/fcm-make.cfg:2
[FAIL] config-file= - puma:/home/markr/DevWork/GCOM/vn6.2_arc3_leeds/fcm-make/gcom.cfg
[FAIL] puma:/home/markr/DevWork/GCOM/vn6.2_arc3_leeds/fcm-make/gcom.cfg: cannot load config file
[FAIL] puma:/home/markr/DevWork/GCOM/vn6.2_arc3_leeds/fcm-make/gcom.cfg: cannot be read
[FAIL] Host key verification failed.

You can see the evidence in ~/cylc-run/vn6.2_arc3_leeds

I realise this is the 2-stage extract-mirror-build, but I have not really changed much of the site/uoleeds/suite.rc other than renaming "archer" to "arc3" and, in places, to uoleeds_arc3.
The latter because I hijacked the uoe_emps_intel_mpp machine file.


comment:17 Changed 3 years ago by markr

Can you help me understand why my basic rose job arc3_leeds is failing to submit with this message in SGE:

error reason          1:      05/26/2017 14:43:16 [256785:104944]: error: can't open output file "/home/ufaserv1_c/earmgr/cylc-run/arc3_leeds_check/log.20170526T130559Z/job/1/initialise/02/cylc-run/arc3_leeds_check/log/job/1/initialise/02/job.out": No such file or directory

It looks like the log path is being concatenated twice.

comment:18 Changed 3 years ago by annette


You confirmed above that you wanted just the share and work sub-directories on your fast disk ($DATADIR). And this is what you have specified in your rose.conf file.

Given this you do not need the following line in your suite.rc file:

initial scripting = "export HOME=$DATADIR"

This line is only needed when you are putting the whole cylc-run directory on $DATADIR, which is not what you are doing here.

Remove the line and try again.


comment:19 Changed 3 years ago by markr

Hi Annette,
I no longer have the DATADIR override. Only for work and share.
This error (quoted below) is about the make of GCOM from a branch I made: vn6.2_arc3_leeds.
Must I manually delete things to trigger a fresh build attempt?

Previous error has not yet been solved (not finding gcom.cfg) - but need to try it again.

markr@puma rose-stem $ rose stem --group=arc3_intel_build
[INFO] Source tree /home/markr/DevWork/GCOM/vn6.2_arc3_leeds added as branch
[INFO] Will run suite from /home/markr/DevWork/GCOM/vn6.2_arc3_leeds/rose-stem
[FAIL] Suite "vn6.2_arc3_leeds" has running processes on:
[FAIL] Try "rose suite-shutdown --name=vn6.2_arc3_leeds" first?
markr@puma rose-stem $ rose suite-shutdown --name=vn6.2_arc3_leeds
Really shutdown vn6.2_arc3_leeds at [y or n (default)] y
security reasons
[FAIL] cylc shutdown vn6.2_arc3_leeds --force # return-code=1
markr@puma rose-stem $

comment:20 Changed 3 years ago by annette

Hi Mark,

So does your basic test suite work OK now?

To force shutdown of a suite you sometimes have to kill rogue processes - try following the instructions here:

In reference to your gcom.cfg error in comment:16, can you try logging into puma from the command-line with a simple:

ssh puma

i) check this works, and ii) it may prompt you to add puma to your known hosts, which it can't do non-interactively.
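
A small sketch of that check, assuming OpenSSH's ssh-keygen is on the path. The helper name is illustrative; ssh-keygen -F searches a known_hosts file for a host name and returns non-zero when there is no entry.

```shell
#!/bin/sh
# Return 0 if host $1 has an entry in the known_hosts file $2
# (default: ~/.ssh/known_hosts), non-zero otherwise.
host_is_known() {
    ssh-keygen -F "$1" -f "${2:-$HOME/.ssh/known_hosts}" >/dev/null 2>&1
}

# Usage on puma:
#   host_is_known puma || ssh puma     # answer "yes" once, interactively
#   ssh -oBatchMode=yes puma true      # then confirm the non-interactive login
```

The BatchMode call mirrors the option that appears in the failing rose ssh command earlier in this ticket, so it should fail in the same way if the host key is still missing.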


comment:21 Changed 3 years ago by markr

The u-am554 case gets some way (all?) through fcm_make2 but then fails.
No clear reason as the build log seems to complete. However the build dir is empty on ARC.
It looks like the preprocessed source is in place.

comment:22 Changed 3 years ago by markr

I cannot find any fail log or message other than the colour of the Suite in Gcylc.
On arc3:


Similarly on puma:


comment:23 Changed 3 years ago by grenville


I note that u-am554 uses ncas-xc30-cce for its config (platform_config_dir) - this is for ARCHER specifically. This may not be the cause of the lack of output (prob not), but you won't manage a build with this config. You'll need to add one specific to ARC3.


comment:24 Changed 3 years ago by markr

I am now reviewing the notes from Annette and see I have skipped steps 3 and 4 i.e. build GCOM and then configure the UM for uoleeds site with a UM branch.
I have a GCOM branch and was part way through that.
I am now back to building GCOM.

Must walk before I can run. Step-by-step.


comment:25 Changed 3 years ago by markr

Now I am back at this failure that distracted me to investigate if rose-cylc task on remote was working.
However, I believe at this stage the tasks are running on the suite-host (local to puma).

[FAIL] config-file=/home/markr/cylc-run/vn6.2_arc3_leeds/work/1/fcm_make_arc3_intel_serial/fcm-make.cfg:2
[FAIL] config-file= - puma:/home/markr/DevWork/GCOM/vn6.2_arc3_leeds/fcm-make/gcom.cfg
[FAIL] puma:/home/markr/DevWork/GCOM/vn6.2_arc3_leeds/fcm-make/gcom.cfg: cannot load config file
[FAIL] puma:/home/markr/DevWork/GCOM/vn6.2_arc3_leeds/fcm-make/gcom.cfg: cannot be read
[FAIL] Host key verification failed.

[FAIL] fcm make -f /home/markr/cylc-run/vn6.2_arc3_leeds/work/1/fcm_make_arc3_intel_serial/fcm-make.cfg -C /home/markr/cylc-run/vn6.2_arc3_leeds/share/uoleeds_arc3_ifort_serial -j 4 mirror.prop{}=2 # return-code=255
Received signal ERR
cylc (scheduler - 2017-06-06T10:35:21+01): CRITICAL Task job script received signal ERR at 2017-06-06T10:35:21+01
cylc (scheduler - 2017-06-06T10:35:21+01): CRITICAL failed at 2017-06-06T10:35:21+01

What task needs to see that cfg file and where is it running?

comment:26 Changed 3 years ago by annette


Did you ssh into puma from puma as I suggested in comment:20? Please confirm whether this gets you any further.


Changed 3 years ago by markr

GCylc image of progress of GCOM rose stem --group=arc3_intel_test

comment:27 Changed 3 years ago by markr

Some progress, but now GCOM tests fail to find gcom.exe
Also there are no "build" dirs in the expected directories.

comment:28 in reply to: ↑ 26 Changed 3 years ago by markr

Replying to annette:


Did you ssh into puma from puma as I suggested in comment:20? Please confirm whether this gets you any further.


Okay, done that now. Had to answer yes.
Also moving onto GCOM build.
Not confident of site/uoleeds_arc3/suite.rc changes

comment:29 Changed 3 years ago by annette


From your logs, I don't think the builds have actually done anything, they have all completed suspiciously quickly. And you say there are no build directories in the expected places.

I will look at your changes and see if I can spot anything.


comment:30 Changed 3 years ago by annette

Hi Mark,

Looking at some of the files in your cylc-run, I don't think the mirror is working correctly.

Has the code been copied over to arc3? i.e. on arc3, do you have a directory like:


And if you look in there can you see the gcom code?

My hypothesis is that you don't have this…

And I think that you need to add these lines to your fcm-make cfg files (uoleeds_arc3_ifort_openmpi.cfg etc): = ${ROSE_TASK_MIRROR_TARGET}
mirror.prop{config-file.steps} = $REMOTE_ACTION

Sites might not have these if they are not doing a mirror (because they are submitting suites from the same system so don't need to copy the code).

Also I would just test out one thing at a time, otherwise it can be hard to see what is going on. So maybe just the intel build:

rose stem --group=arc3_intel_build

Then check that it has actually copied the code over (look in the preprocess directory above) and built something (look in build/lib), before running the tests.
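
The check in the last paragraph can be scripted roughly as follows. The helper name is hypothetical, and the assumption that extract, preprocess and build/lib all sit directly under the per-configuration share directory is inferred from this thread rather than confirmed.

```shell
#!/bin/sh
# Report which of the expected fcm make output directories exist under $1.
check_fcm_tree() {
    for sub in extract preprocess build/lib; do
        if [ -d "$1/$sub" ]; then
            echo "present: $sub"
        else
            echo "missing: $sub"
        fi
    done
}

# Usage on arc3, after the build task has finished:
#   check_fcm_tree /nobackup/earmgr/cylc-run/vn6.2_arc3_leeds/share/uoleeds_arc3_ifort_openmpi
```

A tree with extract present but preprocess and build/lib missing would suggest the mirror step copied the code but the remote build steps never ran.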


comment:31 Changed 3 years ago by markr

Hi Annette,
I appreciate the difficulty of helping "blind". No preprocess directory. BTW I copied uoe_emps_ifort_openmp.cfg for ARC3. I will have to look a bit closer there too.

This is on arc3:

[earmgr@login1.arc3 share]$ ls -ltr /nobackup/earmgr/cylc-run/vn6.2_arc3_leeds/share/uoleeds_arc3_ifort_openmpi/
total 12
drwxr-xr-x 3 earmgr EAR 4096 Jun  6 11:07 extract
-rw-r--r-- 1 earmgr EAR   22 Jun  6 11:39 fcm-make2.cfg
-rw-r--r-- 1 earmgr EAR 1014 Jun  6 11:39 fcm-make2.cfg.orig
lrwxrwxrwx 1 earmgr EAR   32 Jun  6 11:39 fcm-make2-on-success.cfg -> .fcm-make2/config-on-success.cfg
lrwxrwxrwx 1 earmgr EAR   14 Jun  6 11:39 fcm-make2.log -> .fcm-make2/log
lrwxrwxrwx 1 earmgr EAR   31 Jun  6 11:39 fcm-make2-as-parsed.cfg -> .fcm-make2/config-as-parsed.cfg

comment:32 Changed 3 years ago by annette

Hi Mark,

Looking in rose-stem for uoe, emps is a 1-step build. Please try adding in those mirror lines and retry.


comment:33 Changed 3 years ago by markr

More progress: now failing to build with an mpicc linking error.
See openmpi build log

Got to go to a meeting now.
More later.

Can I exercise the build within the directories on arc3 and see if I can get the right environment to diagnose the failure?

Some sort of command like "cylc task run build"?

comment:34 Changed 3 years ago by annette

Hi Mark,

You should be able to bypass rose/cylc entirely… Go into the uoleeds_arc3_ifort_openmpi directory, then run:

fcm make -f fcm-make2.cfg

Set any build options in fcm-make2.cfg, and you can see the build commands in fcm-make2.log.

If you don't want to bypass rose/cylc, then go into the log directory and drill down until you get to the job run script for that task. You can submit this manually to the queue or run on the command-line.
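
As a sketch, the manual route could be wrapped like this. The helper name is illustrative, and it assumes fcm is on the path and that the compiler and MPI modules the suite normally loads have already been loaded by hand.

```shell
#!/bin/sh
# Re-run the second-stage build by hand in the directory given as $1.
manual_fcm_make2() {
    dir="$1"
    if [ ! -f "$dir/fcm-make2.cfg" ]; then
        echo "no fcm-make2.cfg in $dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dir" && fcm make -f fcm-make2.cfg)
}

# Usage on arc3:
#   manual_fcm_make2 /nobackup/earmgr/cylc-run/vn6.2_arc3_leeds/share/uoleeds_arc3_ifort_openmpi
```

Running outside rose/cylc means the task's pre-script environment is not applied, so a build that fails here but not in the suite (or the reverse) usually points at a module or path difference.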


comment:35 Changed 3 years ago by markr

The make -f fcm-make2.cfg did not work because, as I just realised, I forgot to use "fcm".

Meanwhile I await this:
I realise the link line has -lmpl (I think an MVAPICH library) so I removed it from the machines files in :

Now running the rose stem again for the GCOM branch.
Will try the fcm make command after this rose stem exits.

comment:36 Changed 3 years ago by markr

Still getting this for the C code gc__abort.c:

[FAIL] /apps/developers/libraries/openmpi/2.0.2/3/intel-17.0.1/bin/mpicc -E -I./include /nobackup/earmgr/cylc-run/vn6.2_arc3_leeds/share/uoleeds_arc3_ifort_openmpi/extract/gcom/gc/gc__abort.c # rc=127
[FAIL] /apps/developers/libraries/openmpi/2.0.2/3/intel-17.0.1/bin/mpicc: error while loading shared libraries: cannot open shared object file: No such file or directory
[FAIL] process    0.0 ! gcom/gc/gc__abort.c  <- gcom/gc/gc__abort.c

comment:37 Changed 3 years ago by markr

The queues on ARC3 just got busier as they have turned off ARC1.
So the gcylc shows a failed submission where actually the job is still waiting in the queue.
Perhaps something about SGE that I have yet to configure.

Meanwhile the build failed to find a library that probably is related to the LD_LIBRARY_PATH.


comment:38 Changed 3 years ago by markr

Hi Ros,
I see that you have now got access to arc3 through the remote access gateway.
I find

is a useful site for the ARC systems. It has 24c per node and some are large memory nodes.
Also there are some K80 nvidia nodes which might be fun if we had a GPU version of the UM.

We created a user account for shared access (earumsh) and I have put what I was working with in that directory.
Let me know if it looks okay.
If you supply an ssh key then you could do work there too.

Let me know how you want to proceed. Whether to do it all independently or coordinate with me.
NOTE by default $HOME is closed even to groups. I have opened up group read/access to both earumsh and earmgr.


comment:39 Changed 3 years ago by markr

Now I am about to start working with a branch to configure the UM - I will use vn10.8.
I notice that now it uses gcom6.3. I think I will have to go back and repeat the GCOM work in a vn6.3 branch.

gcom branch: vn6.2_arc3_leeds

um branch: vn10.8_uoleeds_arc3_intel_cfg

comment:40 Changed 3 years ago by markr

Hi, I recently got back to this and, as I am installing vn10.8, I discovered that I need to install shumlib. I have done that and now the u-ap304 (basic um job) gets a little further but fails to resolve some netcdf references.

I have used the following settings and they report clearly:


###init-script = "export HOME=$DATADIR"
# expect to use intel environment but need to confirm versions of compilers
pre-script = """

module swap openmpi intelmpi
module load hdf5
module load netcdf
module list 2>&1
icc -V 2>&1
ifort -V 2>&1
"""

However the link of um-atmos.exe fails to find netcdf_mp_xx etc references.
Also I noticed that it is linking with a -lum-atmos in a temporary node specific directory.
Do I need to set TMPDIR too?

comment:41 Changed 3 years ago by ros

Further email from Mark:

Sorry, I have only just got back to this. Can you advise on how I can solve this:

The build complains of missing netcdf functions (ignore the end apostrophe due to my grepping):


Currently Loaded Modulefiles:

1) licenses 4) intelmpi/2017.1.132 7) netcdf/4.4.1
2) sge 5) user
3) intel/17.0.1 6) hdf5/1.8.17

comment:42 Changed 3 years ago by grenville


There's not much in the way of trace on PUMA to help see what's going on. Can you supply more logging info. I can't see the context in which the above messages appear.


comment:43 Changed 3 years ago by markr

Hello Grenville,
I see the logs of u-ap304 on puma are slightly out of date compared to those on ARC3.
I transferred the log (scp -p earmgr@…:TARfiles/u-ap304-log.20170920T160017Z.tar.bz2 .)
which you could extract with tar jxvf /home/markr/TAR_files/u-ap304-log.20170920T160017Z.tar.bz2

I had a look in the netcdf "lib" directory on archer and arc3 to compare and both have an entry for netcdf_mp_nf90_strerror_ (for example ) :

ON ARC3 in /apps/developers/libraries/netcdf/4.4.1/1/intel-17.0.1-intelmpi-2017.1.132/lib
nm| grep -i _strerror

U nc_strerror

000000000004e68c T netcdf_mp_nf90_strerror_
0000000000015145 T nf_strerror_
00000000002f63f0 b nf_strerror_$CSTRPTR
00000000002f63f8 b nf_strerror_$FSTRPTR

On ARCHER in /opt/cray/netcdf/

mricha@eslogin008:/opt/cray/netcdf/> nm| grep strerror

U nc_strerror

00000000000acbd0 T netcdf_mp_nf90_strerror_
0000000000012b60 T nf_strerror_
00000000002c2218 b nf_strerror_$CSTRPTR.0.2
00000000002c2210 b nf_strerror_$FSTRPTR.0.2

I am not sure why my build on ARC3 is not finding the netcdf (F) library… they all have trailing underscore too.

Many thanks for your help so far,


comment:44 Changed 3 years ago by annette

Hi Mark,

I have had a look in the generated fcm-make config file on PUMA:


And the path to the netcdf library is not being set. You should have something like: -I/path-to-netcdf/include in fc.flags and -L/path-to-netcdf/lib -lnetcdf in fc.flags-ld

So you should make sure this is being set correctly in the config branch for your site.

On ARCHER, this is not needed as it is done implicitly when you load the module. ftn is a wrapper to the fortran compiler, and when you load a module it adds the required flags. You can see the full list of flags with ftn -v when compiling.

I think most netcdf installations should have the nc-config utility which can tell you which flags are needed. This should be in /path-to-netcdf/bin where ncdump etc lives. For example, on ARCHER this gives:

$ nc-config --fflags

$ nc-config --libs
-L/opt/cray/netcdf/ -lnetcdf
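
A short sketch for collecting the same information on ARC3 once the netcdf module is loaded. The helper name is hypothetical; it only uses the two nc-config options shown above, and the optional argument exists so that a specific nc-config can be named if more than one netcdf build is installed.

```shell
#!/bin/sh
# Print the flags reported by nc-config, labelled by where they belong.
# $1 optionally names the nc-config executable to use.
netcdf_flags() {
    nc_config="${1:-nc-config}"
    echo "compile flags (fc.flags): $("$nc_config" --fflags)"
    echo "link flags (fc.flags-ld): $("$nc_config" --libs)"
}

# Usage on arc3:
#   module load netcdf
#   netcdf_flags
```

The printed values still have to be copied by hand into the site's fcm-make configuration.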



comment:45 Changed 3 years ago by markr

Excellent, I had just come to that conclusion by looking at the nci intel configs.
I will edit the inc files. However, I notice that the netcdf on ARC3 has been built so:

[earmgr@login2.arc3 ~]$ nc-config --fflags
[earmgr@login2.arc3 ~]$ nc-config --libs
-L/apps/developers/libraries/netcdf/4.4.1/1/intel-17.0.1-openmpi-2.0.2/lib -lnetcdf  

NOTE the openmpi . I have used intelmpi with GCOM and have swapped the MPI modules to match that for the build. I think there will be a problem.


comment:46 Changed 3 years ago by markr

Hello CMS,
I am a step nearer getting the UM working on ARC3 at UoLeeds.

However, the "job submission failed" signal is causing a problem.

The job is actually in the queue awaiting slots to run but GCylc says it has failed to submit. That happens immediately after the successful submission.

job-ID  prior   name       user         state submit/start at     queue  slots ja-task-ID 
 143724 0.00000 earmgr       qw    09/25/2017 15:55:01           24        

After the job completed I initiated the next task by changing the state to "success" and triggering "recon".
The new hurdle (on top of polling the queues from GCylc) is the wgdos_packing lib is not visible when the recon.exe runs.

error while loading shared libraries: cannot open shared object file: No such file or directory

My job is u-ap304, I am using ~markr/BranchUM/vn10.8_uoleeds_arc3_intel_cfg
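
While the polling problem is open, the real state of a job can be read from qstat by hand. The sketch below assumes the standard SGE column order shown in the header above (job-ID, prior, name, user, state), so the state is the fifth field; the helper name is illustrative.

```shell
#!/bin/sh
# Read qstat output on stdin and print the state column for job ID $1.
# Exits non-zero if the job ID is not listed (for example, once it has finished).
sge_job_state() {
    awk -v id="$1" '$1 == id { print $5; found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on arc3:
#   qstat -u "$USER" | sge_job_state 143724
```

A result of qw means the job is queued and waiting, which is the situation GCylc was reporting as a failed submission.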


comment:47 Changed 3 years ago by markr

  • UM Version changed from 10.7 to 10.8

comment:48 Changed 3 years ago by ros

Hi Mark.
This was that cylc problem I mentioned earlier in the summer. I started investigating but had to put on back burner. I'll get back to it when I can. Cylc is having problems interacting with SGE.


comment:49 Changed 3 years ago by markr

Okay so notwithstanding Cylc-SGE interaction I will continue and prod the task handler when I see that the job has run on ARC3. Semi manual mode.

Now for the recon job, do I need to change these files:


comment:50 Changed 3 years ago by grenville


I think not - the cpp keys can get overridden through the rose gui. I doubt they ever will be though.


comment:51 Changed 3 years ago by ros

Hello CMS support,

(seems I cannot add to the ticket at the moment)

The SGE problem has been fixed by Ros (thank you!).

The problem described alongside that is that the um-recon.exe cannot find some libraries.

I concluded that is down to LD_LIBRARY_PATH.

When I look directly at those exes and use ldd it shows the 4 libraries that require the extra library path info. When I module load netcdf 2 of the "not found" go away. When I explicitly append the ${UMDIR}/shumlib??? directories to LD_LIBRARY_PATH the remaining library references are resolved.

ON discovering the problem (in the u-ap304 job.err file for recon):

IN /nobackup/earmgr/cylc-run/u-ap304/share/fcm_make/build-recon/bin

[earmgr@login2.arc3 bin]$ ldd um-recon.exe | grep "not"
 => not found
 => not found
 => not found
 => not found


[earmgr@login2.arc3 bin]$ module load netcdf
[earmgr@login2.arc3 bin]$ ldd um-recon.exe | grep "not"
 => not found
 => not found


LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${UMDIR}/shumlib/2017.06.1/uoleeds-x86-ifort-17.0.1-icc-17.0.1/openmp/lib ldd um-recon.exe | grep found

Where am I supposed to set those?

(a) in a site specific settings (for all exes) ?

(b) in a suite specific RC file (I think not) ?

(c) some other location I have not thought of.

Many thanks for help so far, apologies that I am taking so long but I have to fit it in between other activities (until I get a new recruit).
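
The ldd check described above can be repeated for any executable with a small helper such as this sketch. The function name is hypothetical; the optional second argument is an extra library directory to append to LD_LIBRARY_PATH for the duration of the check only.

```shell
#!/bin/sh
# List shared libraries that the dynamic loader cannot resolve for $1.
# $2 is an optional extra directory appended to LD_LIBRARY_PATH for this check.
# Returns 0 if something is missing and 1 if everything resolves.
missing_libs() {
    exe="$1"
    extra="$2"
    if [ -n "$extra" ]; then
        LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$extra" ldd "$exe" 2>/dev/null
    else
        ldd "$exe" 2>/dev/null
    fi | grep "not found"
}

# Usage on arc3, in the build-recon/bin directory:
#   missing_libs um-recon.exe
#   missing_libs um-recon.exe "${UMDIR}/shumlib/2017.06.1/uoleeds-x86-ifort-17.0.1-icc-17.0.1/openmp/lib"
```

Once the second call prints nothing, that directory is a candidate for whatever permanent setting the site configuration ends up using.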



comment:52 Changed 3 years ago by ros

Hi Mark,

You need to specify the paths in the site configuration files you're adding for ARC.

See for example the Met Office files under fcm-make/meto_xc40-cce/inc and do similar for ARC.


comment:53 Changed 3 years ago by mbexgcd2


I am resuming the work started by Mark on installing the UM on ARC3. I have been attempting to replicate the same steps under my own account on puma (mbexgcd2), and I have been able to get the rose/cylc submission system working from puma to arc3.

Now I am attempting to build GCOM vn6.4, however I have encountered a problem during the build, specifically when I run:

rose stem --group=arc3_intel_build

During the extraction process (fcm-make), I receive the error:

[FAIL] config-file=/home/mbexgcd2/cylc-run/vn6.4_gcom_uoleeds_arc3_intel/work/1/fcm_make_arc3_intel_mpp/fcm-make.cfg:2
[FAIL] config-file= - puma:/home/mbexgcd2/DevWork/gcom/vn6.4_gcom_uoleeds_arc3_intel/fcm-make/gcom.cfg
[FAIL] puma:/home/mbexgcd2/DevWork/gcom/vn6.4_gcom_uoleeds_arc3_intel/fcm-make/gcom.cfg: cannot load config file
[FAIL] puma:/home/mbexgcd2/DevWork/gcom/vn6.4_gcom_uoleeds_arc3_intel/fcm-make/gcom.cfg: cannot be read
[FAIL] Permission denied (publickey,password,keyboard-interactive,hostbased).

This is similar to the failure that Mark encountered as well, although I'm not sure how to resolve it? I tried to ssh into puma from puma first as suggested, but I still get the same error relating to gcom.cfg.

Apologies for resurrecting an old ticket, I realise it's been a while since it was last updated. I can open a fresh ticket if that is preferable?

Many thanks,

comment:54 Changed 3 years ago by mbexgcd2

  • Cc C.Dearden@… added

comment:55 Changed 3 years ago by annette


So you ran ssh puma on the command line and this completes OK without any prompts? I am not sure what else to suggest.

You may get more information if you run with debugging options on:

  1. Add these lines under the [[FCM_EXTRACT]] definition in the suite.rc file:
  2. You can also try running with: rose suite-run -vv -- --debug


comment:56 Changed 3 years ago by mbexgcd2

Hi Annette,

So I can run 'ssh puma' and this completes OK without any prompts, but only if I do the following first:

exec ssh-agent $SHELL

If I just try 'ssh-add' on its own after logging in to puma, I get:
Error connecting to agent: No such file or directory

There is also a file in $HOME/.ssh called 'environment.puma', which contains output similar (but not identical to) the output from 'ssh-agent'. I'm not sure what the significance of environment.puma is, or if it could be part of the problem/solution somehow?

I also ran with the debugging option on, but it didn't produce any further output relating to the gcom.cfg error.


comment:57 Changed 3 years ago by annette


Rose/cylc won't work if you have ssh-agent running with exec in this manner.

You need to delete the environment.puma, then restart the agent by sourcing your .profile (. ~/.profile), then run ssh-add.
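
Before and after those steps, the state of the agent can be checked with a sketch like the one below. The helper name is illustrative; it relies on the documented exit status of ssh-add -l (0 when keys are loaded, 1 when the agent has no keys, 2 when no agent can be contacted).

```shell
#!/bin/sh
# Report whether an ssh-agent is reachable from this shell and holds any keys.
agent_status() {
    ssh-add -l >/dev/null 2>&1
    case $? in
        0) echo "agent running with keys loaded" ;;
        1) echo "agent running, no keys loaded" ;;
        *) echo "cannot contact an agent" ;;
    esac
}

# The recovery steps described above, for reference (run by hand on puma):
#   rm ~/.ssh/environment.puma
#   . ~/.profile
#   ssh-add
#   agent_status
```

The "Error connecting to agent" message quoted in comment:56 corresponds to the last case.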

For reference this is documented here:


comment:58 Changed 3 years ago by mbexgcd2

Hi Annette, this worked; the FCM_EXTRACT step completed successfully. I can also see a build directory on ARC3 with gcom libraries in there, so looking good…

Thanks for the help, much appreciated,

comment:59 Changed 3 years ago by mbexgcd2


I'm looking for a simple vn10.9 suite that I can use as the basis for porting the UM over to ARC3. Would taking a copy of u-at877 be a good starting point?


comment:60 Changed 3 years ago by annette

Hi Chris,

I'd recommend you take a copy of u-aa774@70453 - this is a very simple N48 atmosphere suite at UM vn10.9. (The latest version of the suite has been upgraded to vn11.0 so use the revision number mentioned for a vn10.9 suite.)

Once this is working you can move on to a more realistic setup, e.g. GA7. The suite you mention, u-at877, is a copy of the UKCA training suite, so it may not be the best example. You should probably check with Luke Abraham for a recommended UKCA suite.


comment:61 Changed 3 years ago by willie

  • Status changed from pending to new

comment:62 Changed 22 months ago by ros

  • Resolution set to answered
  • Status changed from new to closed