Opened 4 years ago

Closed 4 years ago

#1817 closed help (fixed)

Model crash when outputting UM-UKCA diagnostics over a regional domain

Reported by: ee10hp
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: ARCHER
UM Version: 7.3

Description

Dear CMS helpdesk,

Many thanks for your help.

For my v7.3 UM-UKCA job xmhhi I have set up the STASH panel to output diagnostics at hourly resolution (T1H) over a European domain (DTHETGEU).

As a first test, I have added 4 diagnostics to STASH with this time and domain profile. However, the model run crashes after 29 timesteps.

The .leave file for failed job xmhhi is /home/n02/n02/ee10hp/output/xmhhi000.xmhhi.d16054.t103734.leave

I couldn't see any obvious cause of the error in the .leave file, but when I copy this run (as xmhhh) and reduce the run length to 3 hours (9 timesteps), it terminates normally and produces the output I'm after. I therefore wonder whether this is a memory problem. Could you advise on whether that seems likely and how to proceed?

The .leave file for working job xmhhh is /home/n02/n02/ee10hp/output/xmhhh000.xmhhh.d16057.t140416.leave

Ultimately, I would like a one-month simulation with 45 diagnostics output at hourly resolution over a European domain (approx 40x40 deg lon/lat on a N96 grid).

Many thanks,
Hana

Change History (6)

comment:1 Changed 4 years ago by luke

Dear Hana,

It appears to be PE86 that is causing the crash:

%PE85 OUTPUT%
  NUDGING_CALC_DIAGS: Leaving Routine
  NUDGING_MAIN: Performed nudging routines
  Leaving NUDGING_MAIN
  Before call to UKCA_MAIN1
 START OF UKCA_MAIN
 STEVE:  Just about to call UKCA_MODE_EMS_UM.
 AFTER CALL TO UKCA_EMISSION_CTL OFF UKCA_MAIN
 mype= 85 ENTERED asad_flux_put_stash
 mype= 85 LEFT asad_flux_put_stash
 END OF UKCA_MAIN
%PE86 OUTPUT%
 RWDYN:  0.,  1.33632623615974934E-5,  1.53565680748585777E-6
 FAQ:  0.99999997617027636,  0.99999998401689949,  0.99999998045048732
 DDRAIN:  0.,  5.56802608358223172E-5,  6.39857014769501197E-6
 CCRAIN:  0.,  6.58319377364063483E-4,  9.02666228724233232E-5
 error water content: imode= 1  ,ibox= 8346
 mds= 96.697469703897696,  6*0.,  2.37280080141696187E-2,  0.
 nd= 2.33783895412982819E-8 mdwat= NaN
 drydp= 1.32274365291598997E-9 wetdp= NaN
 rhopar= NaN
 wvol= NaN dvol= 3.18934733900433431E-27
%PE87 OUTPUT%
  NUDGING_CALC_DIAGS: Leaving Routine
  NUDGING_MAIN: Performed nudging routines
  Leaving NUDGING_MAIN
  Before call to UKCA_MAIN1
 START OF UKCA_MAIN
 STEVE:  Just about to call UKCA_MODE_EMS_UM.
 AFTER CALL TO UKCA_EMISSION_CTL OFF UKCA_MAIN
 mype= 87 ENTERED asad_flux_put_stash
 mype= 87 LEFT asad_flux_put_stash
 END OF UKCA_MAIN

Looking through the output from this processor, though, I'm not sure why it is failing.

Do you have a copy of this job that has ever run for more than 29 timesteps, or is this the first time that you have increased the run length?
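For a long .leave file, a quick way to find a processor like PE86 is to scan each per-PE section for NaN values. The following is a minimal sketch only, assuming the sections are delimited by `%PEnn OUTPUT%` lines as in the excerpt above; the function name `pes_with_nan` and the sample text are illustrative and not part of the UM.

```python
import re


def pes_with_nan(leave_text):
    """Return the sorted PE numbers whose output section contains a NaN token.

    Assumes each section starts with a line of the form '%PE<n> OUTPUT%',
    as seen in the .leave excerpt in this ticket.
    """
    header = re.compile(r"^%PE(\d+) OUTPUT%$")
    nan_token = re.compile(r"\bNaN\b")
    current_pe = None
    flagged = set()
    for line in leave_text.splitlines():
        match = header.match(line.strip())
        if match:
            # A new per-PE section begins here.
            current_pe = int(match.group(1))
        elif current_pe is not None and nan_token.search(line):
            flagged.add(current_pe)
    return sorted(flagged)


# Illustrative sample, shortened from the excerpt above.
sample = """%PE85 OUTPUT%
 END OF UKCA_MAIN
%PE86 OUTPUT%
 nd= 2.33783895412982819E-8 mdwat= NaN
 rhopar= NaN
%PE87 OUTPUT%
 END OF UKCA_MAIN
"""

print(pes_with_nan(sample))
```

To use it on a real run, read the .leave file into a string and pass it to `pes_with_nan`; any PE it reports is worth inspecting by hand.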

Thanks,
Luke

comment:2 Changed 4 years ago by ee10hp

Hi Luke,

Thanks for your help. I hadn't noticed those NaNs, but they've helped me locate a typo in the branch revision I was using for nitrate-extended GLOMAP. I will resubmit with my most recent branch revision and see if that sorts it.

I have other jobs that have run for longer without the new European domain STASH requests but they've run from a different branch revision so can't be directly compared.

I'll let you know how this goes.

Many thanks,
Hana

comment:3 Changed 4 years ago by ee10hp

Hi Luke,

That typo was definitely causing my original problem - now fixed!
My job xmhhg (a copy of xmhhi with branch typo fixed) now runs for 51 timesteps but crashes with the following error:

 ERROR detected in routine STWORK: stop model
 : No. of output fields (= 4097 ) exceeds no. of reserved PP headers for unit  69
 STASH    : Error processing diagnostic section  34 , item  137 , code  4
   STWORK   : NO. OF FIELDS EXCEEDS RESERVED HEADERS

xmhhg .leave file: /home/n02/n02/ee10hp/output/xmhhg000.xmhhg.d16060.t115628.leave

STASH item 34-137 is one of my new diagnostics on stream UPJ with the hourly output over a European domain and I would like to add approx. 40 more of these.

In the umui, under Post processing >> Initialisation and processing of mean and standard PP files, I see that an override of 16000 has already been set for UPMEAN.

Would it be possible to solve this by increasing the override size on UPJ? What would you recommend?
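As a rough guide to how large such an override would need to be, the number of fields written to a stream can be estimated from the number of diagnostics, levels and output times. The sketch below is back-of-envelope arithmetic only. It assumes each diagnostic writes one field per model level per output time, that the default limit is 4096 headers (inferred from the error being raised at 4097 fields), and a hypothetical 60 levels and a 30-day month. None of these values are taken from the job itself.

```python
def fields_written(n_diags, n_levels, n_output_times):
    """Estimate PP fields written to one stream.

    Assumption: one field per diagnostic, per level, per output time.
    """
    return n_diags * n_levels * n_output_times


def max_output_times(reserved_headers, n_diags, n_levels):
    """Number of output times that fit within the reserved PP headers."""
    return reserved_headers // (n_diags * n_levels)


N_LEVELS = 60          # hypothetical number of model levels
HOURS_IN_MONTH = 720   # assumes a 30-day month of hourly output

# Target set-up from the ticket: 45 hourly diagnostics for one month.
print(fields_written(45, N_LEVELS, HOURS_IN_MONTH))

# Hourly output times that fit before the file must be reinitialised,
# for the inferred default limit and for the UPMEAN-style override.
print(max_output_times(4096, 45, N_LEVELS))
print(max_output_times(16000, 45, N_LEVELS))
```

Under these assumptions, a single file covering the whole month would need far more headers than either limit, so the estimate is mainly useful for comparing a larger override against splitting the output across more files.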

Many thanks,
Hana

comment:4 Changed 4 years ago by luke

Hi Hana,

Please see UKCA tutorial 3 ('What is STASH?') here:

http://www.ukca.ac.uk/wiki/index.php/UKCA_Chemistry_and_Aerosol_Tutorial_3

specifically the solution to Task 3.1:

http://www.ukca.ac.uk/wiki/index.php/Solution_to_UKCA_Chemistry_and_Aerosol_Tutorial_3_Task_3.1

Either of the two options would work, but I prefer increasing the output frequency.

Thanks,
Luke


comment:5 Changed 4 years ago by ee10hp

Hi Luke,

That's fixed it!

Thank you,
Hana

comment:6 Changed 4 years ago by ros

  • Resolution set to fixed
  • Status changed from new to closed