Opened 5 months ago

Closed 5 months ago

#3377 closed help (fixed)

MPI library JULES

Reported by: epinnington Owned by: jules_support
Component: JULES Keywords: JULES, MPI
Cc: Platform: JASMIN
UM Version:

Description

Hi Patrick,

Thanks a lot for this, it was very helpful! Do you know when they might get the MPI stuff sorted so that we will be able to do the gridded parallel JULES runs again? It seems that /usr/local/bin/mpirun.lotus has disappeared!

Thanks again!
Ewan

Sent: 16 September 2020 10:58
To: jules-users@… <jules-users@…>
Subject: Re: [Jules-users] update to 2 JULES-on-JASMIN tutorials


Hi Jules-users
I'd also like to emphasize that this JASMIN setup is not just for running Rose/Cylc.
Rose/Cylc was just part of it.
There are other parts which are more general, like setting up the new host names in your .ssh/config file, and also setting up MOSRS password caching on JASMIN.
Patrick

On Wed, Sep 16, 2020 at 9:17 AM Patrick McGuire wrote:
Dear JULES users:

Since the hostnames are changing right now on JASMIN, I have updated a primary tutorial (originally from Kerry Day at the Met Office) for how to log into JASMIN and to use the cylc1.jasmin server to submit Rose/Cylc jobs:
https://code.metoffice.gov.uk/trac/jules/wiki/RoseJULESonJASMIN
(MOSRS password needed)

This includes my suggested changes to your local machine's .ssh/config file, as well as changes to the .bashrc and .bash_profile on JASMIN. I now suggest using some wildcards in the .ssh/config file, which differs from the previous setup. Also, previously jasmin-xfer1 was used as a proxy instead of the more standard jasmin-login1, and this has now changed to login1.jasmin, etc. This in itself may be worth looking at and updating in your own setups; otherwise a lot of JULES users will have to figure out how to change this themselves.
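For illustration, a wildcard-based .ssh/config entry of the kind described above might look roughly like the sketch below. The hostnames login1.jasmin.ac.uk and cylc1.jasmin.ac.uk follow from the hosts mentioned in this thread, but the exact patterns, username, and key file are placeholders; the RoseJULESonJASMIN tutorial linked above is the authoritative version.

```
# Sketch only -- adapt per the RoseJULESonJASMIN tutorial.
# Wildcard entry covering the new *.jasmin hostnames:
Host *.jasmin.ac.uk
    User your_jasmin_username
    ForwardAgent yes
    IdentityFile ~/.ssh/id_rsa_jasmin

# Reach internal hosts (e.g. the cylc1 server) via the login node,
# replacing the old jasmin-xfer1/jasmin-login1 proxy setup:
Host cylc1.jasmin.ac.uk
    ProxyJump login1.jasmin.ac.uk
```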

I have also changed the JULES FLUXNET (on JASMIN) tutorial, which uses Kerry's RoseJULESonJASMIN tutorial to get started.
https://research.reading.ac.uk/landsurfaceprocesses/software-examples/tutorial-rose-cylc-jules-on-jasmin/
This file in the u-al752 suite has been updated from LSF to SLURM directives:
~/roses/u-al752/site/suite.rc.CEDA_JASMIN
The MPI shared libraries for JULES still need updating.
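To give a rough idea of the LSF-to-SLURM change mentioned above, a Cylc suite.rc batch stanza converts along these lines. The task name, partition, and times here are invented placeholders, not the actual u-al752 settings in suite.rc.CEDA_JASMIN.

```
# Before (LSF directives), sketch only:
#   [[jules]]
#       [[[directives]]]
#           -q = short-serial
#           -W = 01:00

# After (SLURM directives), sketch only:
    [[jules]]
        [[[job]]]
            batch system = slurm
        [[[directives]]]
            --partition = short-serial
            --time = 01:00:00
            --ntasks = 1
```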

This is still a work in progress, so the tutorials will change further. If you see something that is wrong or needs improving, let me know.

Patrick McGuire

Change History (6)

comment:1 Changed 5 months ago by pmcguire

Hi Patrick,

I just wondered if you had managed to build JULES without MPI turned on, on the new system? I am struggling with this currently!

Thanks,
Ewan

comment:2 Changed 5 months ago by pmcguire

  • Reporter changed from pmcguire to epinnington

comment:3 Changed 5 months ago by pmcguire

Hi Ewan:
I was able to get JULES built and running with MPI turned on and with updated NetCDF libraries, using the GL7 gridded suite for SLURM. I updated that suite. (This SLURM version of the u-bb316 GL7 suite was developed by a couple of other people; I had to modify it further with input from one of them.)
The SLURM version is u-bx723.
Does that help?
Can I create a ticket at the NCAS CMS Helpdesk for this issue of yours?
Patrick

comment:4 Changed 5 months ago by pmcguire

Hi Patrick,

Ahh great, I will check this suite out and give it a go.

Thanks very much for the help Patrick!

Cheers,
Ewan

comment:5 Changed 5 months ago by pmcguire

Hi Patrick,

I have got it all working based on your suite now, so thank you very much for that! One thing I found was that the JULES runs were still failing every now and again due to an MPI error. Setting the flag "--exclusive=user" in the JULES [[[directives]]] in suite.rc seems to have fixed this; not sure if that will help in your runs at all?
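For anyone applying the same fix, the directive described above would sit in the suite.rc roughly as sketched below. The task name [[jules]] is a placeholder; only the --exclusive=user line comes from the comment itself.

```
    [[jules]]
        [[[directives]]]
            # Do not share nodes with other users' jobs, which was
            # intermittently breaking the MPI runs:
            --exclusive = user
```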

Thanks again!
Ewan

comment:6 Changed 5 months ago by grenville

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.