Opened 10 months ago

Closed 5 months ago

#3179 closed help (fixed)

Increasing Suite runtime and memory

Reported by: NoelClancy Owned by: pmcguire
Component: JULES Keywords: rose, cylc, FLUXNET, JULES, resources
Cc: Platform: JASMIN
UM Version:

Description

Patrick,

I have added variables to suites, and those runs were successful.
Sometimes, though, if I add too many variables, the suite does not request enough wallclock time, memory, etc. I am not sure how to modify the time and memory that a suite requests.

Do you know where this is specified in u-al752 and related suites?

Noel

Change History (5)

comment:1 Changed 10 months ago by pmcguire

  • Status changed from new to accepted

comment:2 Changed 10 months ago by pmcguire

  • Keywords rose, cylc, FLUXNET, JULES, resources added
  • Platform set to JASMIN

Hi Noel:
In the u-al752 suite, in the file ~/u-al752/site/suite.rc.CEDA_JASMIN, you will see these lines:

    [[JASMIN_LOTUS]]
        inherit = None, JASMIN

        [[[directives]]]
            -m = ivybridge128G
            -q = short-serial

        [[[job]]]
            batch system = lsf

    [[JULES_CEDA_JASMIN]]
        inherit = None, JASMIN_LOTUS

        [[[directives]]]
            -W = 2:00
            -n = 1

This means that the suite asks for 2 hours of wallclock time (-W = 2:00) to run the job on one (-n = 1) of CEDA JASMIN's LOTUS nodes in the short-serial queue. If your job fails because it needs more wallclock time, you can increase the -W value and re-run it.

This CEDA JASMIN webpage is useful for understanding how to specify memory requests:
https://help.jasmin.ac.uk/article/112-how-to-allocate-resources

For example, in the file ~/u-al752/site/suite.rc.CEDA_JASMIN, if you want to request 15GB of memory (if that is enough), you can expand the following section so that it reads:

    [[JULES_CEDA_JASMIN]]
        inherit = None, JASMIN_LOTUS

        [[[directives]]]
            -W = 2:00
            -n = 1
            -R = "rusage[mem=15000]"
            -M = 15000
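
As a rough illustration of how the two memory directives relate, here is a minimal Python sketch that builds both values from a size in GB. The helper name lsf_memory_directives is hypothetical, and the sketch assumes the values are given in MB with 1GB taken as 1000MB, as the 15GB example above implies; check the JASMIN help page for the units your queue actually expects.

```python
def lsf_memory_directives(gigabytes):
    """Build the -R and -M directive values for a memory request.

    Assumption: values are in MB and 1 GB is taken as 1000 MB,
    matching the 15GB -> 15000 example in this ticket.
    """
    megabytes = int(gigabytes * 1000)
    return {
        "-R": f'"rusage[mem={megabytes}]"',  # memory to reserve for the job
        "-M": str(megabytes),                # memory limit for the job
    }


# Print the lines in the form used under [[[directives]]]
for flag, value in lsf_memory_directives(15).items():
    print(f"{flag} = {value}")
```

Keeping both values derived from one number avoids requesting a reservation that differs from the limit.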

Does this help?
Patrick

comment:3 Changed 10 months ago by NoelClancy

Thanks very much Patrick,

I'm running on MONSOON, so I suppose I can do it in a similar way.

nclancy@xcslc0:~/roses/u-bm066/site> vi suite.rc.MONSOON

    [[JULES_MONSOON]]
        inherit = None, METO_XC40
        [[[directives]]]
            {#- We need different directives for the shared queue #}
            -q = shared
            -l ncpus = 2
            -l walltime = 02:00:00

I've changed the above field to "-l walltime = 03:00:00" so I will see if that works.

The error message I got was as follows:
=>> PBS: job killed: walltime 7225 exceeded limit 7200
Terminated
2020-02-09T04:06:22Z CRITICAL - failed/TERM

7200 seconds is 2 hours, so I need slightly more walltime.
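
The arithmetic for picking a new limit can be sketched in Python as below. This is only an illustration: the helper name pbs_walltime, the 25% headroom and the rounding to 30 minutes are my own assumptions, and the 7225 seconds in the PBS message is the point at which the job was killed, so the true runtime may be longer.

```python
import math


def pbs_walltime(seconds_used, headroom=0.25, round_to_minutes=30):
    """Return an HH:MM:SS walltime string covering seconds_used plus headroom.

    headroom is a fraction (0.25 = 25% extra); the result is rounded up
    to a multiple of round_to_minutes. Both defaults are assumptions.
    """
    step = round_to_minutes * 60
    needed = seconds_used * (1 + headroom)
    total = math.ceil(needed / step) * step
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# 7225 s plus 25% is about 9031 s, which rounds up to 3 hours
print(pbs_walltime(7225))  # prints 03:00:00
```

The result can be used as the value of "-l walltime" in suite.rc.MONSOON.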

I'm re-running to see if that works and I will let you know the result.

Thanks,

Noel


comment:4 Changed 10 months ago by NoelClancy

Patrick,

I made the change, re-ran the suite and it worked.

Thanks,

Ticket Closed

comment:5 Changed 5 months ago by pmcguire

  • Resolution set to fixed
  • Status changed from accepted to closed