Opened 10 months ago

Closed 9 months ago

#2694 closed help (answered)

wallclock time

Reported by: ggxmy
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: Monsoon2
UM Version: 11.0

Description

Hi,

The NRUN for my UM vn11.0 job, u-bd805, stopped, apparently because the wallclock time limit was exceeded:

Elapsed Time : 0:30:47 (1847 seconds, 103% of limit)
Parallel Time : 0:30:30 (1830 seconds, 99% of elapsed time)
Walltime Limit : 0:30:00 (1800 seconds)

However, I can't find where this is set. In rose edit, the "run initialisation and cycling" panel shows the wallclock time set to "PT1H40M", which is not consistent with the 30-minute limit above. I tried grep'ing the files and directories under /home/d03/myosh/roses/u-bd805/ but couldn't find a clue. Could you help me please?

I copied the release job u-bb202, which appears to run OK for me. Then I modified it to enable the UKCA modal dust and added lots of STASH requests in section 38, so it is not surprising to me that the computation time has increased.

Best regards,
Masaru

Change History (9)

comment:1 Changed 10 months ago by ggxmy

  • priority changed from normal to high

It looks like the wallclock time of 1800 seconds is set in the file "job" in /home/d03/myosh/cylc-run/u-bd805/log.20181207T105403Z/job/19880901T0000Z/atmos_nrun/01/ and in the equivalent file for crun. They have this line:

#PBS -l walltime=1800

So maybe I can just change this value and trigger the run for now (so I did). But I feel that is not a real solution, because these files are created when I submit the job. Unless I find where this is set in the rose GUI or the rose directory (rather than the cylc-run directory) and change it there, I might have to repeat this again and again. And I can't find it anywhere…

Masaru

comment:2 Changed 10 months ago by ggxmy

I grep'ed with more keywords and found that rose-suite.conf has this line:

NCRUN_CLOCK='PT30M'

Is this the one that sets walltime=1800?

Thanks,
Masaru
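
For reference, the arithmetic behind this question can be checked with a short Python sketch. It assumes the values are plain ISO 8601 durations; the helper name duration_to_seconds is hypothetical and not part of rose or cylc. It handles only day and time components, since a month (as in P1M) has no fixed length in seconds.

```python
import re

# Minimal parser for the ISO 8601 durations quoted in this ticket
# (PT30M, PT1H40M, PT3H, P10D). Hypothetical helper, not a rose/cylc API.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(text):
    """Convert a day/time ISO 8601 duration string to whole seconds."""
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):
        # Month and year durations such as P1M are calendar-dependent
        # and are deliberately rejected here.
        raise ValueError("unsupported duration: %r" % text)
    parts = {key: int(value) if value else 0
             for key, value in match.groupdict().items()}
    hours = parts["days"] * 24 + parts["hours"]
    return (hours * 60 + parts["minutes"]) * 60 + parts["seconds"]


limit = duration_to_seconds("PT30M")        # NCRUN_CLOCK from rose-suite.conf
elapsed = 1847                              # elapsed seconds from the job log
percent_of_limit = round(100 * elapsed / limit)
```

Under these assumptions PT30M comes to 1800 seconds, which matches the walltime=1800 directive, and 1847 seconds is 103% of that limit, as reported in the job log. PT1H40M would be 6000 seconds and PT3H would be 10800 seconds.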

comment:3 Changed 10 months ago by ggxmy

But then how is it related to "CLOCK='PT3H'", which is set on another line in rose-suite.conf? And the same question for "NCRUN_RESUB='P10D'" and "RESUB='P1M'"?

Masaru

comment:4 Changed 9 months ago by dcase

Masaru,

How are you getting along with this? You have a RESUB of a month, and I see in your log file that you have advanced one month along the simulation. Does this mean that you have set the wallclock correctly now? If not, let me know, and I'll look into it this afternoon.

Dave

comment:5 Changed 9 months ago by ggxmy

Hi Dave,
My jobs are now running, although I still don't know the differences and/or relationships between NCRUN_CLOCK and NCRUN_RESUB on the one hand and CLOCK and RESUB on the other.

Masaru

comment:6 Changed 9 months ago by dcase

Well, the values given in hours are walltimes (i.e. the "real" time that passes while the job runs). CLOCK is used in your site/MONSooN.rc file:

        [[[job]]]
            execution time limit = {{CLOCK}}

so you set it to a value large enough for your calculation to complete.
Values of the order of a month, such as RESUB, are simulation times. I believe you are running your total simulation a month at a time.

If you want to change these values for particular runs, such as the NCRUN test, you can do so in site/MONSooN-tests.rc, which will presumably override any value used when you run that particular test.

I hope that I've understood you properly and that this helps,
Dave
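
The substitution described in this comment can be imitated with a few lines of Python. This is only a rough stand-in: the real suite is rendered by Jinja2, whereas the render function below is a hypothetical helper that does plain string replacement. The CLOCK and RESUB values are the ones quoted from rose-suite.conf in this ticket.

```python
# Values quoted from rose-suite.conf in this ticket.
suite_variables = {"CLOCK": "PT3H", "RESUB": "P1M"}

# The fragment of site/MONSooN.rc quoted above.
template = (
    "[[[job]]]\n"
    "    execution time limit = {{CLOCK}}\n"
)


def render(text, variables):
    """Replace each {{NAME}} placeholder with its value.

    A simplified stand-in for Jinja2 rendering (assumption: no filters,
    expressions or whitespace inside the braces).
    """
    for name, value in variables.items():
        text = text.replace("{{" + name + "}}", value)
    return text


rendered = render(template, suite_variables)
```

With these values the rendered fragment reads "execution time limit = PT3H". The job file in this ticket suggests that the limit is then written out in seconds as the "#PBS -l walltime=..." directive.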

comment:7 Changed 9 months ago by ggxmy

I think I understand NCRUN_CLOCK and NCRUN_RESUB but I still don't understand CLOCK and RESUB. All of these are in rose-suite.conf. Thanks.

comment:8 Changed 9 months ago by dcase

RESUB is set in the rose-suite.conf and used in the suite.rc. You can see in the graph that the tasks are run repeatedly, with RESUB controlling the frequency. E.g.

        [[[ {{RESUB}} ]]]
            graph = atmos_main => postproc => housekeeping


if you want to run more simulation, and do postproc on it, as an example.

CLOCK is the 'execution time limit'; see comment 6 above.
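
To make the role of RESUB concrete, the sketch below lists the first few cycle points that a one-month recurrence would produce, starting from the 19880901T0000Z cycle point seen in the log path earlier in this ticket. It is an illustration only: the function names are hypothetical, it assumes a Gregorian calendar (the suite may use a different one), and it only works for dates that exist in every month, such as the first.

```python
from datetime import datetime


def add_months(point, months):
    """Shift a datetime forward by whole months, keeping day and time.

    Assumes the day of the month exists in the target month
    (true for the 1st, as used here).
    """
    total = point.year * 12 + (point.month - 1) + months
    return point.replace(year=total // 12, month=total % 12 + 1)


def cycle_points(start, months_per_cycle, count):
    """Return the first `count` cycle points of a monthly recurrence."""
    return [add_months(start, i * months_per_cycle) for i in range(count)]


start = datetime(1988, 9, 1)  # initial cycle point from the log path
points = [p.strftime("%Y%m%dT%H%MZ") for p in cycle_points(start, 1, 3)]
# points == ['19880901T0000Z', '19881001T0000Z', '19881101T0000Z']
```

Each cycle point runs the graph once, so the walltime set by CLOCK has to cover one RESUB-sized chunk of simulation (here one month), not the whole run.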

comment:9 Changed 9 months ago by ggxmy

  • Resolution set to answered
  • Status changed from new to closed

Thank you for the answer.
