Opened 7 years ago

Closed 7 years ago

#1014 closed help (fixed)

UM-UKCA on Monsoon: ERROR: 0031-250

Reported by: dan2012 Owned by: um_support
Component: UM Model Keywords: UKCA
Cc: Platform: MONSooN
UM Version: 7.3

Description

Hello,

I am currently getting errors when trying to run a version (7.3) of UM-UKCA on MONSooN. It seems that the job was terminated because it ran out of time. I have tried previous revisions of my code and get the same error, although I was not getting it before: the same configuration (with the same job settings and the same code revision) now takes much longer than it did before. The job output ends with:

/projects/ukca/dapart/xhdtf/bin/qsexecute: Executing setup

/projects/ukca/dapart/xhdtf/bin/qssetup: Job terminated normally

/projects/ukca/dapart/xhdtf/bin/qsexecute: Executing dump reconfiguration program

* RCF Executable : /projects/ukca/dapart/xhdtf/bin/qxreconf *

/projects/ukca/dapart/xhdtf/bin/qsexecute: Executing model run

* UM Executable : /projects/ukca/dapart/xhdtf/bin/xhdtf.exe *

/projects/ukca/dapart/xhdtf/bin/qsserver[525]: 3539740 Terminated ERROR: 0031-250 task 61: Terminated ERROR: 0031-250 task 0: Terminated

Any idea what this error is linked to, if not my code revisions?

Many thanks, Dan

Change History (4)

comment:1 Changed 7 years ago by willie

Hi Dan,

Looking at job xhdtd, this worked, I think, completing 2160 time steps for one month. Job xhdte fails in the same manner as xhdtf. The only difference between xhdtd and xhdte (using UMUI job difference) is in the changes made to the branch vn7.3_Nenes_Activate, in going from revision 10978 to 10991. Further changes to revision 11010 (xhdtf) have not changed the situation.

To get more information, you could go to Atmosphere > Section by Section > Section 13 and select DIAG_PRN. Tick "flush buffer if print fails" and then change the printing frequency from every 24 timesteps to every timestep. You could try this with job xhdtd, where the error was first introduced, and then proceed to the later revision.

I don't understand why there are only 64 processor outputs when the UMUI setup has requested 12x16.

I hope that helps.

Regards,

Willie

comment:2 Changed 7 years ago by luke

Hi Dan,

Could you also turn off the

~mdalvi/umui_jobs/hand_edits/use_64cpu.ed

hand-edit. It makes the job use 64 cores per node. While that is useful when the queue is very full and you need to limit your node usage, it slows the job down, so turning it off should speed your job up. However, you seem to have gone from running 2160 timesteps in 2.3 hours to 902 timesteps in 3 hours in xhdte.
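To put a number on that slowdown, the figures quoted above can be converted into wall-clock seconds per timestep. This is a minimal sketch assuming the quoted hours are wall-clock time for the model run itself; `seconds_per_step` is a hypothetical helper name, not part of the UM or UMUI.

```python
# Compare model throughput using the figures quoted in the ticket:
#   earlier run: 2160 timesteps in ~2.3 hours
#   xhdte:        902 timesteps in ~3 hours (terminated)

def seconds_per_step(steps: int, hours: float) -> float:
    """Average wall-clock seconds spent per model timestep."""
    return hours * 3600.0 / steps

good = seconds_per_step(2160, 2.3)  # earlier, working configuration
slow = seconds_per_step(902, 3.0)   # xhdte, after the code change

print(f"before: {good:.2f} s/step")     # before: 3.83 s/step
print(f"xhdte:  {slow:.2f} s/step")     # xhdte:  11.97 s/step
print(f"slowdown: {slow / good:.1f}x")  # slowdown: 3.1x
```

A roughly threefold increase in cost per step is far larger than the 64-cores-per-node hand-edit alone would plausibly explain, which is consistent with suspecting the code changes.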

Other than this, nothing jumps out at me as to why the code is slowing down so much, although I am not that familiar with the routines you are changing. Are you running with radiative feedback on? If so, can you turn it off? If you are inadvertently changing a MODE diagnostic that then feeds back onto the radiation scheme, this could cause problems.

Thanks,

Luke

comment:3 Changed 7 years ago by dan2012

Hi Willie, Luke,

Thanks for your time on this. I have got it running (albeit slowly) by reducing the time before resubmission to 10 days. It now produces reasonable output without an error.
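Why a shorter resubmission interval avoids the time-out can be checked with a back-of-the-envelope estimate. This sketch rests on stated assumptions: a 30-day model month (so 2160 timesteps per month gives 72 steps per model day), the slowed rate seen in xhdte (902 timesteps in 3 hours), and a wall-clock limit of about 3 hours, inferred from where xhdte was terminated rather than read from the job settings. The constant names and `chunk_hours` are hypothetical, not UMUI settings.

```python
STEPS_PER_MONTH = 2160            # 2160 timesteps for one month (from xhdtd)
DAYS_PER_MONTH = 30               # assumption: 30-day model month
SEC_PER_STEP = 3.0 * 3600 / 902   # slowed rate observed in xhdte
WALLCLOCK_LIMIT_H = 3.0           # assumption: limit inferred from the ~3 h termination

def chunk_hours(days: int) -> float:
    """Estimated wall-clock hours to run `days` model days at the slowed rate."""
    steps = days * STEPS_PER_MONTH // DAYS_PER_MONTH
    return steps * SEC_PER_STEP / 3600.0

for days in (30, 10):
    hours = chunk_hours(days)
    fits = hours < WALLCLOCK_LIMIT_H
    print(f"{days:2d}-day chunk: {hours:.2f} h, fits in limit: {fits}")
# 30-day chunk: 7.18 h, fits in limit: False
# 10-day chunk: 2.39 h, fits in limit: True
```

Under these assumptions a full month at the slowed rate would need over 7 hours, while a 10-day chunk needs about 2.4 hours, which matches the observation that the 10-day resubmission runs without being terminated.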

I will work on optimizing the new scheme from now on.

You can now close this ticket. Many thanks again,
Dan

comment:4 Changed 7 years ago by ros

  • Resolution set to fixed
  • Status changed from new to closed