Opened 4 years ago

Closed 3 years ago

#1800 closed help (answered)

Wallclock time exceeded MONSooN vn 8.4 UKCA release job - 1 month run

Reported by: csteadman Owned by: luke
Component: UKCA Keywords: wall clock time, processors
Cc: Platform: MONSooN
UM Version: <select version>

Description

Hello,

When running the UKCA vn8.4 release job (with minor changes) on MONSooN, my jobs often exceed the three-hour wallclock time limit. My guess is that the job needs only slightly more than three hours. For example, when running December 1999, the daily output files contain output up to 28 December, but the last two are empty (no diagnostics in xconv), and the monthly mean files are empty. See /projects/ukca-ed/clstea/xmjsr for the files.
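The reasoning above can be turned into a rough estimate of how much wallclock time the full month needs. This is a sketch under assumptions that the ticket does not confirm: 30-day months (consistent with the "30 days" mentioned later in the thread), a constant model throughput, and no extra cost for writing the monthly means at the end of the run.

```python
# Rough bounds on the wallclock time a one-month run needs, based on how far a
# job got before it was killed at the limit. Assumptions (not from the ticket):
# 30-day months, constant model throughput, no extra end-of-month cost.

def full_run_hours_range(limit_hours: float, days_done: int, days_total: int) -> tuple[float, float]:
    """Return (low, high) bounds on the wallclock hours for the whole run.

    Day `days_done` finished before the limit but day `days_done + 1` did not,
    so the throughput r (model days per hour) satisfies
    days_done / limit <= r < (days_done + 1) / limit.
    """
    if not 0 < days_done < days_total:
        raise ValueError("days_done must be between 1 and days_total - 1")
    low = limit_hours * days_total / (days_done + 1)
    high = limit_hours * days_total / days_done
    return low, high


# December 1999 run: output up to day 28 of a 30-day month, 3-hour limit.
low, high = full_run_hours_range(limit_hours=3.0, days_done=28, days_total=30)
print(f"full month needs roughly {low:.2f} to {high:.2f} h")
# -> full month needs roughly 3.10 to 3.21 h
```

Under these assumptions the job overshoots the limit by only a few percent, which fits the guess that it takes "just slightly longer than three hours".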

Would the right approach be to change the number of processors, or something else? My job xmjsr uses 12 East-West and 16 North-South processors (set under User Information and Submit Method → Job Submission Method). If I should change the number of processors, what should I change it to?

Thank you,
Claudia

Change History (3)

comment:1 Changed 4 years ago by luke

  • Owner changed from um_support to luke
  • Status changed from new to accepted

Dear Claudia,

You could try increasing the NS decomposition to 24. Note that this means the results will not bit-compare with those from a 12EW x 16NS decomposition.
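As a rough guide to what the change from 16 to 24 NS processors might buy, the sketch below uses the standard relative parallel efficiency, E = (T_old * P_old) / (T_new * P_new), rearranged to project the new wallclock time. The 3.2 h baseline and E = 0.8 are hypothetical placeholders; realistic efficiencies should be read off the scaling plots on the wiki page linked here.

```python
# Projected wallclock time after changing the processor decomposition, using
# the relative parallel efficiency E = (T_old * P_old) / (T_new * P_new).
# E = 1.0 means perfect linear scaling. The 3.2 h baseline and E = 0.8 are
# hypothetical values, not measurements from this job.

def total_tasks(ew: int, ns: int) -> int:
    """Number of MPI tasks for an EW x NS domain decomposition."""
    return ew * ns


def projected_hours(hours_old: float, tasks_old: int, tasks_new: int, efficiency: float) -> float:
    """Rearranged relative-efficiency definition: T_new = T_old * P_old / (P_new * E)."""
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive")
    return hours_old * tasks_old / (tasks_new * efficiency)


old = total_tasks(12, 16)  # 192 tasks
new = total_tasks(12, 24)  # 288 tasks
for eff in (1.0, 0.8):
    print(f"E = {eff:.1f}: {projected_hours(3.2, old, new, eff):.2f} h on {new} tasks")
# -> E = 1.0: 2.13 h on 288 tasks
# -> E = 0.8: 2.67 h on 288 tasks
```

Even with a noticeable loss of efficiency, a 1.5x increase in tasks should bring a run that only just exceeds three hours back under the limit.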

http://www.ukca.ac.uk/wiki/index.php/Vn8.4_GA4.0_Release_Candidate:_RC6.0#Scaling_.28MONSooN.29

Thanks,
Luke

comment:2 Changed 4 years ago by csteadman

Hi Luke,

Thanks, I've submitted a monthly run with 24 NS processors.

Thanks for the link; the plots for speedup and efficiency are interesting. Most of my jobs have a run length of one day. Should I be careful to switch back from 24 to 16 NS processors when I switch from a monthly run to a one-day run, or is it OK to leave it at 24?

(Also, what run length do you recommend for testing? Do you usually run a few hours, or a day?)

Thank you,
Claudia

comment:3 Changed 3 years ago by luke

  • Resolution set to answered
  • Status changed from accepted to closed

Hi Claudia,

I'm very sorry for not replying sooner!

For shorter runs, keep the same decomposition as for the longer runs. Another option would be to run only 20 days (rather than 30 days) per job with 10-day dumps, but because of the solver at this version, such runs would not bit-compare with month-long runs, even at the same decomposition.
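The 20-day option amounts to splitting a long integration into shorter resubmitted chunks that each end on a restart dump. The sketch below plans such chunks and estimates each chunk's wallclock time; the function names, the 90-day example, the linear cost assumption, and the 3.2 h per 30 days baseline are all hypothetical and only illustrate the arithmetic.

```python
# Split a long integration into resubmitted chunks (e.g. 20-day chunks with
# 10-day restart dumps) and estimate each chunk's wallclock time.
# Assumptions: every chunk must end on a restart dump, cost scales linearly
# with model days, and the 3.2 h per 30 days baseline is hypothetical.

def plan_chunks(total_days: int, chunk_days: int, dump_days: int) -> list[tuple[int, int]]:
    """Return (start_day, end_day) pairs; every chunk must end on a dump."""
    if chunk_days % dump_days != 0 or total_days % dump_days != 0:
        raise ValueError("chunk and total lengths must be multiples of the dump period")
    return [(start, min(start + chunk_days, total_days))
            for start in range(0, total_days, chunk_days)]


def chunk_hours(chunk_days: int, hours_per_30_days: float) -> float:
    """Wallclock estimate for one chunk, scaled linearly from a 30-day cost."""
    return hours_per_30_days * chunk_days / 30


chunks = plan_chunks(total_days=90, chunk_days=20, dump_days=10)
print(chunks)  # -> [(0, 20), (20, 40), (40, 60), (60, 80), (80, 90)]
print(f"{chunk_hours(20, 3.2):.2f} h per full 20-day chunk")  # -> 2.13 h per full 20-day chunk
```

A 20-day chunk costs about two thirds of a monthly run, which leaves a comfortable margin under a three-hour limit. As noted above, results from such chunked runs will not bit-compare with month-long runs at this version.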

For testing, I sometimes run for as little as 2 hours, but a day is more usual. The key point is to get through all the physical processes, including radiation.
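One way to check whether a test run is long enough to exercise radiation is to count its timesteps and radiation calls. In the sketch below, the 20-minute timestep, radiation every 3 hours, and the convention that radiation runs on the first step and every `rad_every` steps after it are hypothetical placeholders; the real values should be taken from the job's own settings.

```python
# Count how many model timesteps, and how many radiation calls, a test run
# contains. TIMESTEP_MIN, RAD_EVERY and the "first step, then every rad_every
# steps" convention are hypothetical, not values from this job.
import math


def steps_in_run(run_hours: float, timestep_minutes: float) -> int:
    """Number of model timesteps needed to cover run_hours."""
    return math.ceil(run_hours * 60 / timestep_minutes)


def radiation_calls(n_steps: int, rad_every: int) -> int:
    """Radiation on steps 1, 1 + rad_every, 1 + 2 * rad_every, ... up to n_steps."""
    return math.ceil(n_steps / rad_every)


TIMESTEP_MIN = 20  # hypothetical timestep length in minutes
RAD_EVERY = 9      # hypothetical: 9 x 20 min = radiation every 3 h

for hours in (2, 24):
    n = steps_in_run(hours, TIMESTEP_MIN)
    print(f"{hours:>2} h run: {n} steps, radiation called {radiation_calls(n, RAD_EVERY)} time(s)")
```

Under these assumptions a 2-hour run passes through radiation only once, while a one-day run exercises it several times, which is one reason a day is the more usual test length.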

I'll close this ticket now.

Thanks,
Luke
