#2311 closed help (answered)

Suite failing due to STWORK routine u-ar871

Reported by: s.varma13 Owned by: jeff
Component: UM Model Keywords: STWORK
Cc: Platform: Monsoon2
UM Version: 10.8

Description

Hi, I ran suite u-ar871 successfully and then made the following amendments. I added a new time profile, T1H, outputting every hour with no time processing, and replaced TDMPMN with T1H for 12 STASH requests. I also changed the usage profile for those diagnostics to UPA, which has file ID pp0, and increased the reserved headers for pp0 from 16000 to 32000. I submitted this for a test run of one month. It failed with the following error message:

? Error code: 4
? Error from routine: STWORK
? Error message: STWORK: Number of fields exceeds reserved headers for unit 11
? Error from processor: 245
? Error number: 45

Can I increase the reserved headers further? Otherwise, what can I do? How do I work out the correct number of headers?

Many thanks

Sunil

Change History (12)

comment:1 Changed 22 months ago by s.varma13

I also changed the reinit_step from 30 days to 1 day

comment:2 Changed 22 months ago by s.varma13

Hi,

I increased the number of headers for pp0 to 34000 and the suite did not fail at atmos_main as it did before; it is currently in the postproc stage. Was that the best way of dealing with this?

Also, I wonder if you could help me with a connected issue. I do not want to output files for every STASH request, only 12. Could you let me know how I should deal with that in the GUI? At the moment it outputs everything. Also, could you let me know how to start outputting files some time after the start dump date? For example, the dump start date is 01091988 and I would like to output from 01.12.1995. What should I put in the istr section of my new T1H time profile when my unt3 is hours? Do I need to calculate the number of hours from 01091988 to 01.12.1995? I presume there is a better way.

Many thanks

Sunil

comment:3 Changed 22 months ago by jeff

Hi Sunil

Increasing the number of headers for pp0 was the correct thing to do.

If you want to disable some diagnostics, go to the STASH requests panel and "untick" those you don't want (the Incl? column). A quick way to turn off almost every diagnostic is to click on the packages button and select "Disable all packages" and reselect the ones you want.

To use a date for the start time in T1H change iopt to "Regular intervals start/stop dates" and enter the dates. See time profile TSTEPGI for an example of using this option.

Jeff.
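For reference, if an hour offset were ever needed instead of the start/stop dates option, the count depends on which calendar the suite runs with. The sketch below is only an illustration: it assumes the dates in the question read as day-month-year (1 September 1988 and 1 December 1995), and the function names are hypothetical helpers, not part of the UM or Rose.

```python
from datetime import datetime


def hours_gregorian(start, end):
    """Whole hours between two datetimes in the real (Gregorian) calendar."""
    return int((end - start).total_seconds() // 3600)


def hours_360day(start, end):
    """Whole hours between two (year, month, day) tuples in a 360-day
    calendar, where every month is treated as exactly 30 days long."""
    def day_number(year, month, day):
        return (year * 12 + (month - 1)) * 30 + (day - 1)

    return (day_number(*end) - day_number(*start)) * 24


# Assumed reading of the dates in the question: DDMMYYYY.
print(hours_gregorian(datetime(1988, 9, 1), datetime(1995, 12, 1)))  # 63528
print(hours_360day((1988, 9, 1), (1995, 12, 1)))                     # 62640
```

The two calendars differ by several hundred hours over this period, which is one reason entering dates directly is less error-prone than entering an offset.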

comment:4 Changed 22 months ago by s.varma13

Hi Jeff

Thanks for your reply and the information.

The issue with the STASH requests seems to be that if I untick the ones I do not need, the suite fails. This did not happen in the UMUI, where I could just output the diagnostics I needed. I guess the easiest thing to do is save all my chosen diagnostics to pp0, as all the other diagnostics are saved to other streams.

Is that the easiest solution?

Thanks, Sunil

comment:5 Changed 22 months ago by jeff

Hi Sunil

You should be able to turn off most of the diagnostics, although those with usage profile UPUKCA shouldn't be turned off. If you can point me to a file with the error in I might be able to see why it doesn't work, otherwise you could just put all your diagnostics in one stream.

Jeff.

comment:6 Changed 22 months ago by s.varma13

Hi Jeff

I did not know about the UPUKCA requirement - thanks for letting me know.

So I ran my suite with UPUKCA and my 12 STASH requests outputting every hour (T1H) and it worked.

I then ran the suite with new Gregorian and nudging settings with just UPUKCA STASH on and it worked. However, when I added my 12 T1H diagnostics it failed again in the routine STWORK

: no. of output fields (=48001) exceeds no. of reserved PP headers for unit 11
STWORK: Error when processing diagnostic section 0, item 12, code 4

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 4
? Error from routine: STWORK
? Error message: STWORK: Number of fields exceeds reserved headers for unit 11
? Error from processor: 0
? Error number: 36
????????????????????????????????????????????????????????????????????????????????:

I keep increasing the headers but it keeps failing. Now that I am using nudging, should this be a problem? The non-nudged and non-Gregorian suite was fine with the number of headers being 34000. I now have it set to 100000.

Could you please let me know what you think the problem may be? Is there a limit to the size and number of diagnostics a suite can run at one time? If I split my 12 diagnostics between 2 streams rather than one, will that help?

Thanks again

Sunil

comment:7 Changed 22 months ago by jeff

  • Owner changed from um_support to jeff
  • Status changed from new to accepted

Hi Sunil

Are you still running this as u-ar871 or do you have a new suite id for this run?

Jeff.

comment:8 Changed 22 months ago by s.varma13

Hi Jeff, the new suite is u-ar907. I copied u-ar871 and have just run u-ar907 with the 12 diagnostics and the UPUKCA ones. It has just failed again; an extract from the job stderr is below. All 12 worked before I converted the suite to nudging and the 365 day calendar. Each of the 12 has time profile T1H, usage profile UPK and output stream pp10, and UPK is only assigned to the T1H requests. By the way, this nudged suite works if I remove the 12 and just leave the UPUKCA diagnostics.

Many thanks

Sunil

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
Rank 221 [Tue Nov 7 13:50:44 2017] [c3-0c0s10n0] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 221
? Error code: 4
? Error from routine: STWORK
? Error message: STWORK: Number of fields exceeds reserved headers for unit 13
? Error from processor: 247
? Error number: 36
????????????????????????????????????????????????????????????????????????????????

[247] exceptions: An non-exception application exit occured.
[247] exceptions: whilst in a serial region
[247] exceptions: Task had pid=53706 on host nid00616
[247] exceptions: Program is "/home/d04/suvar/cylc-run/u-ar907/share/fcm_make_um/build-atmos/bin/um-atmos.exe"
Warning in umPrintMgr: umPrintExceptionHandler : Handler Invoked
Rank 219 [Tue Nov 7 13:50:44 2017] [c3-0c0s10n0] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 219

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 4
? Error from routine: STWORK
? Error message: STWORK: Number of fields exceeds reserved headers for unit 13
? Error from processor: 216
? Error number: 36
????????????????????????????????????????????????????????????????????????????????

[216] exceptions: An non-exception application exit occured.
[216] exceptions: whilst in a serial region
[216] exceptions: Task had pid=53675 on host nid00616
[216] exceptions: Program is "/home/d04/suvar/cylc-run/u-ar907/share/fcm_make_um/build-atmos/bin/um-atmos.exe"
Warning in umPrintMgr: umPrintExceptionHandler : Handler Invoked
Rank 247 [Tue Nov 7 13:50:44 2017] [c3-0c0s10n0] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 247
Rank 216 [Tue Nov 7 13:50:44 2017] [c3-0c0s10n0] application called MPI_Abort(MPI_COMM_WORLD, 9) - process 216

comment:9 Changed 22 months ago by jeff

Hi Sunil

If you look in rose edit at Model Output Stream pp10 (or any other stream) you will see a little hand icon next to reinit_unit. Click on this and you can see that it sets reinit_unit=4 (Real Months) when the configuration is set to gregorian. This is an optional configuration override, and it means you are using Real Months for your reinit unit and hence do not have enough reserved headers. If you want to use Days for the reinit unit you will need to edit the file u-ar907/app/um/opt/rose-app-gregorian.conf and either set reinit_unit=2 under [namelist:nlstcall_pp(pp10)] or delete that section altogether.

Jeff.
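To see roughly why the reinit unit matters so much for the header count, here is a back-of-envelope sketch. It assumes each file needs about one header per field written between reinitialisations, that is requests × levels × output times. The function name and the figure of 85 levels per request are hypothetical, chosen only for illustration; the real total depends on the domain profile of each request, so treat the result as a rough guide rather than the model's exact count.

```python
import math


def estimate_headers(n_requests, levels_per_request,
                     output_interval_hours, reinit_days):
    """Rough number of fields written to one output file before it is
    reinitialised: requests x levels x output times in the period."""
    outputs_per_file = math.ceil(reinit_days * 24 / output_interval_hours)
    return n_requests * levels_per_request * outputs_per_file


# 12 hourly requests, hypothetically 85 levels each.
daily = estimate_headers(12, 85, output_interval_hours=1, reinit_days=1)
monthly = estimate_headers(12, 85, output_interval_hours=1, reinit_days=31)

print(daily)    # 24480
print(monthly)  # 758880
```

Under these assumed numbers a file reinitialised daily fits within a few tens of thousands of headers, while one reinitialised once per real month needs roughly 30 times as many, so raising the reserved headers to 100000 would still not be enough.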

comment:10 Changed 22 months ago by s.varma13

Hi Jeff

Thank you - that is great. I will amend as you have suggested, rerun and let you know how I get on.

Sunil

comment:11 Changed 22 months ago by s.varma13

Hi Jeff

It worked - thank you.

Sunil

comment:12 Changed 22 months ago by jeff

  • Resolution set to answered
  • Status changed from accepted to closed