Opened 4 years ago
Closed 3 years ago
#1800 closed help (answered)
Wallclock time exceeded MONSooN vn 8.4 UKCA release job - 1 month run
Reported by: | csteadman | Owned by: | luke |
---|---|---|---|
Component: | UKCA | Keywords: | wall clock time, processors |
Cc: | | Platform: | MONSooN |
UM Version: | | | |
Description
Hello,
When running the UKCA vn8.4 release job (with minor changes) on MONSooN, my jobs often exceed the wallclock time limit of three hours. My guess is that the job takes just slightly longer than three hours to run. For example, when running December 1999, the daily output files contain actual output up through 28 December, but the last two are empty (no diagnostics in xconv), and the monthly mean files are empty. See /projects/ukca-ed/clstea/xmjsr
Would the right approach be to change the number of processors, or something else? My job xmjsr uses a 12 East-West × 16 North-South processor decomposition (set under User Information and Submit Method → Job Submission Method). If I should change the number of processors, what should I change it to?
Thank you,
Claudia
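The "slightly longer than three hours" guess can be checked with a rough linear extrapolation from how far the failed December run got. This is a minimal sketch under two assumptions: throughput (model days per wallclock hour) is constant, and December has 30 days because the run uses the UM's 360-day calendar. The function name is hypothetical. Since part of 29 December was probably also simulated before the job was killed, dividing by 28 completed days slightly overestimates the time needed, and the extra end-of-month work (such as writing the monthly-mean files) is ignored.

```python
def required_wallclock_hours(hours_used: float, days_completed: int,
                             days_in_run: int) -> float:
    """Linearly extrapolate the wallclock needed to finish the whole run.

    Assumes a constant number of model days per wallclock hour, so start-up
    cost and end-of-run output (e.g. monthly means) are not modelled.
    """
    if days_completed <= 0:
        raise ValueError("need at least one completed model day to extrapolate")
    hours_per_day = hours_used / days_completed
    return hours_per_day * days_in_run


# Values from this ticket: a 3-hour limit, daily output up to 28 December.
# days_in_run=30 assumes the 360-day calendar (an assumption, not stated above).
needed = required_wallclock_hours(hours_used=3.0, days_completed=28, days_in_run=30)
print(f"estimated wallclock: {needed:.2f} h ({needed * 60:.0f} min)")
```

With these numbers the estimate is about 3.2 hours, consistent with the job only just missing the limit.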
Change History (3)
comment:1 Changed 4 years ago by luke
- Owner changed from um_support to luke
- Status changed from new to accepted
comment:2 Changed 4 years ago by csteadman
Hi Luke,
Thanks, I've submitted a monthly run with 24 NS processors.
Thanks for the link; the plots for speedup and efficiency are interesting. Most of my jobs have a run length of one day. Should I be careful to switch back from 24 to 16 NS processors when I go from a monthly run to a one-day run, or is it OK to leave it at 24?
(Also, what run length do you recommend for testing: do you usually run a few hours, or a day?)
Thank you,
Claudia
comment:3 Changed 3 years ago by luke
- Resolution set to answered
- Status changed from accepted to closed
Hi Claudia,
I'm very sorry for not replying sooner!
For shorter runs, keep the same decomposition as for the longer runs. Another option would be to run only 20 days (rather than 30) per job, with 10-day dumps, but at this version the results would not bit-compare with one-month runs at the same decomposition, because of the solver.
For testing I sometimes run for as little as 2 hours, but a day is more usual. The key point is to run long enough to go through all the physical processes, including radiation.
I'll close this ticket now.
Thanks,
Luke
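Luke's alternative of running 20 days per job with 10-day dumps amounts to picking the longest run length that is a whole number of dump periods and still fits inside the wallclock limit. A minimal sketch, assuming the throughput seen in the failed December run (about 28 model days in 3 hours) and a hypothetical 15% safety margin; `max_chunk_days` is an illustrative name, not a UM or UMUI setting.

```python
def max_chunk_days(hours_per_day: float, wallclock_limit_h: float,
                   dump_period_days: int, safety_margin: float = 0.15) -> int:
    """Longest run length, in model days, that is a whole number of dump
    periods and should finish inside the wallclock limit.

    safety_margin is the fraction of the limit held back for start-up, I/O
    and machine variability (a hypothetical value, not a UM setting).
    """
    usable_hours = wallclock_limit_h * (1.0 - safety_margin)
    days_that_fit = usable_hours / hours_per_day
    # Round down to whole dump periods so each job ends exactly on a dump
    # (assumed here to be what a clean restart needs).
    n_dumps = int(days_that_fit // dump_period_days)
    if n_dumps == 0:
        raise ValueError("not even one dump period fits in the wallclock limit")
    return n_dumps * dump_period_days


hours_per_day = 3.0 / 28  # assumed throughput: ~28 model days in 3 h
chunk = max_chunk_days(hours_per_day, wallclock_limit_h=3.0, dump_period_days=10)
print(f"{chunk} days per job, about {chunk * hours_per_day:.2f} h each")
```

Under these assumptions it returns 20 days (roughly 2.1 hours per job), in line with the suggestion; the bit-comparison caveat about the solver still applies.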
Dear Claudia,
You could try increasing the NS decomposition to 24 (i.e. 12 EW × 24 NS). Note that results will then not bit-compare with those from a 12 EW × 16 NS run.
http://www.ukca.ac.uk/wiki/index.php/Vn8.4_GA4.0_Release_Candidate:_RC6.0#Scaling_.28MONSooN.29
Thanks,
Luke
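To judge what the extra NS processors buy, the runtime on 12 EW × 24 NS can be projected from the 12 EW × 16 NS run using a relative parallel efficiency: measured speedup divided by the ideal speedup, which here is 288/192 = 1.5. This is a minimal sketch; the efficiency values are hypothetical placeholders, and real numbers should come from measured scaling data such as the plots on the wiki page linked above. The base runtime is the ~3.2 h linear extrapolation from the December run, itself an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decomposition:
    """Processor layout of a UM job (EW x NS)."""
    ew: int  # processors in the East-West direction
    ns: int  # processors in the North-South direction

    @property
    def total(self) -> int:
        return self.ew * self.ns


def projected_hours(base_hours: float, base: Decomposition,
                    new: Decomposition, relative_efficiency: float) -> float:
    """Project the runtime of the same job on a different decomposition.

    relative_efficiency = measured speedup / ideal speedup, where the ideal
    speedup is new.total / base.total. 1.0 means perfect scaling.
    """
    if relative_efficiency <= 0.0:
        raise ValueError("relative_efficiency must be positive")
    ideal_speedup = new.total / base.total
    return base_hours / (ideal_speedup * relative_efficiency)


old = Decomposition(ew=12, ns=16)  # 192 processors, as in job xmjsr
new = Decomposition(ew=12, ns=24)  # 288 processors
base_hours = 3.0 * 30 / 28         # ~3.21 h, extrapolated from the December run

print(f"12x16: {base_hours:.2f} h, {base_hours * old.total:.0f} processor-hours")
for eff in (1.0, 0.9, 0.8):        # hypothetical efficiencies
    hours = projected_hours(base_hours, old, new, eff)
    print(f"12x24 at efficiency {eff:.1f}: {hours:.2f} h, "
          f"{hours * new.total:.0f} processor-hours")
```

With perfect scaling the processor-hours stay the same; any efficiency below 1 makes each model day cost more processor-hours. That is the price of keeping 24 NS for short runs too, in exchange for results that bit-compare with the monthly runs.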