#2547 closed help (fixed)

Several GA7.1 configuration problems on NEXCS

Reported by: cbellisario Owned by: ros
Component: UM Model Keywords: NEXCS GA7.1 configuration
Cc: Platform: NEXCS
UM Version: 11.1

Description (last modified by ros)

Dear all,

I have a few questions.

I am trying to run the GA7.1 model on NEXCS (suite u-az658).
On http://cms.ncas.ac.uk/wiki/UM/Configurations/ there is no standard suite associated with GA7.1. I took a standard GA7.0 suite (u-ax053@75199) and changed suite-conf/Model configuration/Science configuration to GA7.1, hoping this would work, using the updated UM (vn11.1), SOCRATES (um11.1) and JULES (um11.1, vn5.2).

This standard suite is defined for ARCHER. I have therefore changed the suite conf/Host Machine/Met Office/Monsoon settings to match a run on NEXCS.

However, this is not working. When I try only to build and reconfigure (tasks/ Build UM → true and Run Reconfiguration → true), I get the following error:

RUN_MAIN/recon:
[INFO] command: um-recon
[WARN] UM version (VN=x.y) defined in the environment.
[INFO] Overriding $VN to 11.0
[WARN] Using default STASHmaster as none provided "/projects/um1/vn11.0/ctldata/STASHmaster".
[WARN] Using default STASH2CF as none provided "/projects/um1/vn11.0/ctldata/STASH2CF/STASH_to_CF.txt".
[INFO] Using executable: /home/d04/chrbe/cylc-run/u-az658/share/fcm_make_um/build-recon/bin/um-recon.exe
[INFO] Using script: /home/d04/chrbe/cylc-run/u-az658/share/fcm_make_um/build-recon/bin/um-recon
[INFO] exec /opt/cray/alps/5.2.4-2.0502.9822.32.1.ari/bin/aprun -ss -n 24 -N 32 -S 16 -d 1 -j 1 /common/fcm/rose-2018.06.0/bin/rose-mpi-launch --verbose --inner /home/d04/chrbe/cylc-run/u-az658/share/fcm_make_um/build-recon/bin/um-recon.exe
Could not find PE0 output file: pe_output/az658.fort6.pe00

I have also been trying to set up the postproc transfer to JASMIN as described on this page:
http://cms.ncas.ac.uk/wiki/Docs/PostProcessingAppNexcsSetup
However, after the POSTPROC changes, I can no longer build and reconfigure the suite; it fails with the following error:

[…] *(6.4.0) [runtime][MONSOON_NOT_SUPPORTED] […]

So I would like to know if you could provide me with a working GA7.1 suite for NEXCS, as the configuration problems I am trying to fix seem too big for me.

I would also like to be able to set up an ssh key to connect to JASMIN, as described in the previous link.

As a final comment, on http://cms.ncas.ac.uk/wiki/MonsoonSshAgent
in the section 'Setting up your public key', the command
ps -flu | grep ssh-agent
should be
ps -flu <userid> | grep ssh-agent
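
As a hedged illustration of the corrected command, here is a minimal sketch that assumes a Linux ps accepting -f -l -u <userid>. The helper name agent_lines is hypothetical and not part of the wiki instructions; the [s]sh-agent pattern only keeps the grep process itself out of the results.

```shell
#!/bin/sh
# agent_lines: hypothetical helper that keeps only ssh-agent entries from
# ps-style output read on stdin. The [s] bracket stops grep matching itself.
agent_lines() {
    grep '[s]sh-agent'
}

# List the current user's processes in full/long format and filter them.
# If nothing matches (or ps is unavailable), print a short notice instead.
ps -flu "$(id -un)" 2>/dev/null | agent_lines \
    || echo "no ssh-agent found for $(id -un)"
```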

Thank you very much for your help. I have set the priority to high because I would like the model to run as soon as possible (before the end of the week), since my contract ends at the end of the year.
With best regards,

Christophe

Change History (9)

comment:1 Changed 13 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

comment:2 Changed 13 months ago by ros

Hi Christophe,

GA7.0 and GA7.1 suites are identical apart from a single switch change, so you will not find a separate list of GA7.1 standard suites anywhere. All the ARCHER suites are copied from Met Office standard suites, so for running on Monsoon/NEXCS you should always take a Met Office/Monsoon suite. The changes required to switch between these are minimal.

As detailed on the CMS configurations page you noted, the Met Office GA7.0 UM11.0 suite is u-av674. The GA7.0/GA7.1 UM11.1 suite will be u-az257, but it is still being updated so I wouldn't take that one just yet.

The only things you'll need to change to run u-av674 on NEXCS are:

  • suite conf → Host Machine: Select Monsoon
  • suite conf → Tasks: Switch off Run Development Tests and Supermeans. Switch on PP Transfer if you want it.
  • suite conf → Science Configuration: Change Science Configuration to GA7.1

The standard suites already have the postproc/pptransfer app for NEXCS added so all you should need to do is configure the following:

  • postproc → Post Processing - common settings: Set archive_command to Nexcs
  • postproc → Post Processing - common settings → Archer Archiving: Set archive_root_path to the directory you wish to archive to. This would be /projects/nexcs-n02/<yourusername>
  • postproc → JASMIN Transfer set:
    • transfer_dir - Enter the directory on JASMIN to transfer to. This should be one of the group workspaces.
    • remote_host - This is the host to which you are pushing data. Either jasmin-xfer1.ceda.ac.uk or, if you have access to the high-performance transfer machine, jasmin-xfer2.ceda.ac.uk.

I will send you instructions by email on how to set up your ssh connection from NEXCS to JASMIN.

The MonsoonSshAgent instructions are only relevant for submitting UMUI UM versions, i.e. pre-UM9.0. I will fix the rendering issue you noted though - thanks.

Regards,
Ros

comment:3 Changed 13 months ago by cbellisario

Dear Ros,

Thank you for your help. Indeed, the suite u-av674 was built and reconfigured successfully after these changes.

I will let you know shortly whether the transfer to JASMIN is working properly.

Thanks again for your help,

Best regards,

Christophe

comment:4 Changed 13 months ago by cbellisario

Dear Ros,

I finally got access to the workspace and made the changes you indicated.
However, although the suite (u-az687) is working properly, the transfer of the output to JASMIN is not.

I turned ON (true) the suite conf task Post-Processing (since PP Transfer appears to depend on Post-Processing) and the suite no longer works, failing with the following error message:

RosePopenError: bash -ec H=$(rose\ host-select\ linux);\ echo\ $H # return-code=1, stderr=
[WARN] linux: (ssh failed)
[FAIL] No hosts selected.

I did check that ssh jasmin-xfer1.ceda.ac.uk is working fine.
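
The log above shows the ssh failure against the name linux rather than against the JASMIN host, so it can help to test each name separately and non-interactively. The following is only a sketch under stated assumptions: check_host and report_hosts are hypothetical helper names, and the SSH_CMD variable is an assumption added so that the ssh binary can be swapped out for a dry run. BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# check_host: hypothetical helper. Attempts a non-interactive ssh login to the
# given host and runs `true` there; succeeds only if the login works without
# a password prompt. SSH_CMD (an assumption) overrides the ssh binary.
check_host() {
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=10 "$1" true 2>/dev/null
}

# report_hosts: print one "ok" or "FAILED" line per host name given.
report_hosts() {
    for host in "$@"; do
        if check_host "$host"; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

For example, report_hosts linux jasmin-xfer1.ceda.ac.uk would print one status line for each of the two names.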

Thank you for your help.

Best regards,

Christophe

comment:5 Changed 13 months ago by ros

Hi Christophe,

In suite conf → Host Machine try changing Extract Host to be '$ROSE_ORIG_HOST'

Regards,
Ros.

comment:6 Changed 13 months ago by cbellisario

Dear Ros,

Thank you for your help, that fixed the problem.
I then had other troubles, such as pptransfer failing (solved by running chmod 600 ~/.ssh/id_rsa_jasmin).
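
For reference, ssh refuses to use a private key that is readable by group or others, which is why the chmod helped. A minimal sketch of that check follows, assuming GNU stat on a Linux host; fix_key_perms is a hypothetical helper name, and it normalises the key to mode 600 as was done here.

```shell
#!/bin/sh
# fix_key_perms: hypothetical helper. Reports the current mode of a private
# key file and sets it to 600 (owner read/write only) if it differs.
fix_key_perms() {
    key=$1
    [ -f "$key" ] || { echo "missing: $key"; return 1; }
    mode=$(stat -c '%a' "$key")   # GNU stat syntax; assumption: Linux host
    if [ "$mode" != "600" ]; then
        chmod 600 "$key" && echo "fixed: $key ($mode -> 600)"
    else
        echo "ok: $key"
    fi
}
```

Usage would be fix_key_perms ~/.ssh/id_rsa_jasmin.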

The suite is now working and producing output, but my last (I think) problem is that only 1 file is transferred (az687a.pm1988sep.pp).

I have checked whether this is caused by a specific setting, but I could not figure out what is wrong.

Thank you in advance for your help,

Best regards,

Christophe

comment:7 Changed 13 months ago by ros

Hi Christophe,

You have only run for 1 month so there will be very little available to archive. Files are only archived when they are no longer needed (e.g. to restart the run, to calculate any means requested, etc). If you continue the run it will start archiving more.
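
To see which output files are still waiting, the local output directory can be compared with a listing of what has already arrived. This is only a sketch: pending_files is a hypothetical helper, and it assumes you have saved a plain-text listing of the transferred file names (one per line), obtained however is convenient on your side. It does not reproduce the postproc logic that decides when a file is no longer needed.

```shell
#!/bin/sh
# pending_files: hypothetical helper. Prints the names present in a local
# output directory but absent from a text file listing transferred names.
#   $1 = local output directory
#   $2 = file with one already-transferred file name per line
pending_files() {
    local_dir=$1
    transferred_list=$2
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    ls -1 "$local_dir" | grep -vxF -f "$transferred_list"
}
```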

Regards,
Ros.

comment:8 Changed 13 months ago by cbellisario

Dear Ros,

Thank you for the clarification. I was a bit concerned to see multiple files (~20) in the suite output on exvmsrose (such as az687a.pa[…], az687a.pb[…], az687a.pc[…]) but only one transferred to JASMIN.
I will have a look at what is kept and what is archived.

Thank you for your help, I think you can close the ticket.

Best regards,

Christophe

comment:9 Changed 12 months ago by ros

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from accepted to closed