Frequently Asked Questions (FAQs)

Here you can find answers to some of the most frequently asked questions.

If you have a question not answered in these pages, please contact us via the Helpdesk.

Last modified on 03/05/13 12:45:36

Unified Model

How do I automatically resubmit my job? 

Details about automatic resubmission are described in the article Docs/AutomaticResubmission.

How do I convert a MONSooN job to run on ARCHER? 

See the article Faq/ConvertMonsoonJobToArcher

How do I do a UM Continuation Run? 

Often when you set a model running, it only runs for a certain length of simulation time (say a year), and then halts - even when you've set the run length to be longer. This is to allow you to check everything is running OK, and in UM parlance is known as an initial or new run (NRUN).

Once the initial run has completed, you can proceed with a continuation run (CRUN), in which the model continues from the end of the first run through to completion. Instructions for submitting the continuation run can be found here. This page also contains information on setting up the initial run and resubmission period here.

How do I get initial data for the UM? 

See Getting Initial Data

How do I use UM automatic archiving to /nerc on ARCHER? 

See the article Archer/NercArchiving

What versions of the UM are supported by NCAS-CMS? 

Version ARCHER MONSooN Polaris
UMUI 6.1 X
6.6.3 X X
6.6.6 X
7.3 X X X
7.8 X
8.2 X X
8.4 X X X
8.5 X X
8.6 X X
Rose 10.1 X X
10.2 X X
10.3 X X
10.4 X X
10.5 X X
10.6 X X
10.7 X X
10.8 X X

Note: Not all versions are available on every platform.

ARCHER

How do I get an ARCHER User Account? 

Please email Grenville, g.m.s.lister @ reading.ac.uk , to request an ARCHER account.

How is the disk space organised on ARCHER? 

There are currently two disk space areas on ARCHER.

  • HOME space
       /home/n02/n02/<userid>
    

This disk space is backed up by ARCHER, so NCAS has only limited space and each NCAS user allocation is therefore relatively small.

  • WORK space
       /work/n02/n02/<userid>
    

This disk space is NOT backed up and is the only disk space accessible to ARCHER batch jobs. NCAS has a large allocation, so an NCAS user allocation can be relatively large, up to a few TB.

Important:
There is no specific scratch filesystem, and /tmp is not accessible by users from jobs running on the compute nodes. Thus users who desire a temporary scratch directory should set the environment variable $TMPDIR to point to a valid directory within their account on the work filesystem.
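
A minimal sketch of setting this up, for example in a job script. The helper name setup_tmpdir and the tmp subdirectory are illustrative assumptions, not ARCHER conventions; the work path in the usage comment follows the layout described above.

```shell
# Hypothetical helper: create a scratch directory under the given base
# directory and point TMPDIR at it.
setup_tmpdir() {
    base="$1"
    mkdir -p "$base/tmp" || return 1
    TMPDIR="$base/tmp"
    export TMPDIR
}

# On ARCHER the base would be your work space, for example:
#   setup_tmpdir "/work/n02/n02/$USER"
```

Because nothing on the work filesystem is cleaned up automatically, remember to remove old files from this directory yourself.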

PUMA

How do I restart my ssh agent? 

Normally your ssh agent persists even when you log out of PUMA. However, from time to time it can vanish.

If you are prompted for your passphrase, this means the ssh agent has stopped for some reason. The agent should have been re-initialised when you logged into PUMA, but you will need to re-associate your ssh keys with the agent.

To do so, run the following command:

puma$ ssh-add
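
If you are unsure which state the agent is in, the exit status of ssh-add -l can tell you before you add any keys. The sketch below relies on the exit statuses documented for OpenSSH ssh-add (0 on success, 1 on failure such as no keys loaded, 2 when no agent can be contacted); the function name describe_agent_status is an illustrative assumption.

```shell
# Translate the exit status of "ssh-add -l" into a short message.
describe_agent_status() {
    case "$1" in
        0) echo "agent running with keys loaded" ;;
        1) echo "agent running but no keys loaded" ;;
        2) echo "cannot contact an agent" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# List the keys held by the agent, discarding the listing itself and
# keeping only the exit status.
status=0
ssh-add -l >/dev/null 2>&1 || status=$?
describe_agent_status "$status"
```

If the agent is running but holds no keys, running ssh-add as shown above should be all that is needed.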

How do I tidy up old ssh keys? 

If you have forgotten your passphrase you will need to regenerate your ssh keys. Before doing so, you will need to tidy up the old keys, otherwise the ssh agent can get confused.

Go to your ~/.ssh directory and look at the files. You should see something like:

puma$ ls ~/.ssh
environment.puma  id_rsa  id_rsa.pub  known_hosts  ssh-setup

Delete the public and private keys. These will normally be named id_rsa and id_rsa.pub, or id_dsa and id_dsa.pub.

You should also delete the environment.puma file.
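
The two deletion steps above can be sketched as a single helper. The function name remove_old_keys is an illustrative assumption; the file names are the ones listed above, and known_hosts and ssh-setup are deliberately left alone.

```shell
# Hypothetical helper: remove old key pairs and the agent environment
# file from the given directory (defaults to ~/.ssh).
remove_old_keys() {
    ssh_dir="${1:-$HOME/.ssh}"
    # -f means files that do not exist are silently skipped
    rm -f "$ssh_dir/id_rsa" "$ssh_dir/id_rsa.pub" \
          "$ssh_dir/id_dsa" "$ssh_dir/id_dsa.pub" \
          "$ssh_dir/environment.puma"
}

# On PUMA:
#   remove_old_keys
```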

Next check if you have an agent running:

puma$ ps -u <puma-username> | grep ssh-agent

If an agent is running, one or more lines like the following will be returned:

15658 ?        00:00:00 ssh-agent

The number in the first column is the process ID. Pass this to the kill command to stop the process, for example:

puma$ kill -9 15658
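
If several agents are running, finding and stopping them can be scripted. This is a sketch only: the function names agent_pids and stop_agents are illustrative assumptions, and it asks ps for an explicit "PID COMMAND" format so that the column positions are unambiguous. It uses a plain kill (SIGTERM) first; kill -9 as shown above remains the fallback if an agent does not stop.

```shell
# Print the PIDs of ssh-agent processes from "PID COMMAND" lines on stdin.
agent_pids() {
    awk '$2 == "ssh-agent" { print $1 }'
}

# Stop every ssh-agent owned by the current user.
# Defined here but not called; run stop_agents yourself when ready.
stop_agents() {
    for pid in $(ps -u "$(id -un)" -o pid=,comm= | agent_pids); do
        kill "$pid"
    done
}
```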

What files can I safely delete on PUMA? 

There is limited disk space available on PUMA, so all users need to clear up files in their home directories regularly to stay within their quota.

  • Extract directories:

Submitting UM jobs via the UMUI, or UM suites through Rose/Cylc, generates a lot of files, which mount up quickly. They can be safely deleted as follows:

For the UMUI, job extract directories, usually found under ~/um/um_extracts or ~/FCM_extracts, can be deleted. They will be recreated as required.

For Rose suites, the extract directory ~/cylc-run/<suite-id>/share/fcm_make_um/.fcm_make can be deleted safely. If you are running a coupled suite, there will also be a corresponding extract directory for the NEMO build (e.g. ~/cylc-run/<suite-id>/share/fcm_make_ocean/.fcm_make).

  • Working copies of branches:

Once all local changes to a branch have been committed, you should delete the working copy and reference the branch in your job/suite by its repository URL (e.g. fcm:um.xm-br/dev/marcstringer/vn10.6_fix_OpenMP_nice_use).
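
For the Rose extract directories listed above, the clean-up can be sketched as a small helper. The function name clean_suite_extracts is an illustrative assumption, and it only looks at the two build directories named in this answer (fcm_make_um and fcm_make_ocean); other suites may use different names.

```shell
# Hypothetical helper: report the size of, then remove, the .fcm_make
# extract directories under one suite's run directory.
clean_suite_extracts() {
    suite_dir="$1"
    for name in fcm_make_um fcm_make_ocean; do
        extract="$suite_dir/share/$name/.fcm_make"
        if [ -d "$extract" ]; then
            du -sh "$extract"    # show how much space is about to be freed
            rm -rf "$extract"
        fi
    done
}

# On PUMA, replacing <suite-id> with your own suite:
#   clean_suite_extracts "$HOME/cylc-run/<suite-id>"
```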

`ssh-add` gives error message: `Could not open a connection to your authentication agent.` 

This is because your ssh-agent has stopped running for some reason and is unable to restart automatically. Try removing the following file on PUMA:

puma$ rm ~/.ssh/environment.puma

Then log out of PUMA and back in again. You should then see a message similar to:

Initialising new SSH agent...

You should then be able to run ssh-add successfully.