Opened 12 days ago

Last modified 12 days ago

#3100 accepted help

Post-processing error running UKESM on MONSOON during 5-month PI control run

Reported by: gmann
Owned by: ros
Component: UKESM
Keywords: MOOSE
Cc: earfw
Platform: Monsoon2
UM Version: 11.2

Description

Dear NCAS-CMS helpdesk,

Wuhu Feng is getting a problem with post-processing on MONSOON when running the copy of the UKESM Pre-Industrial control.

He has successfully run a 20-year copy of the UKESM Pre-Industrial control on MONSOON (bc694); that is his suite bp061, which works fine and archives to MASS OK.

We have now progressed to running a 5-month step through from 1st Jan 1991 to end-May 1991, so that we can then run the UKESM volc-pinatubo ensemble for VolMIP.

To do that, Wuhu has made 3 changes (that is his bp286):

1) start time to 1st Jan 1991 and run-length to 5 months.

2) 5 restart dumps changed to selected year's initial conditions from the aw310 Pre-Industrial control (to give the required ENSO and NAO phase during the 1st post-eruption winter).

3) post-processing -> atmosphere -> archiving: dump frequency changed to monthly from yearly

4) recycling-period (resubmission pattern) from 3-month in bc694 to 1-month

(the 4th one he only did in some runs and then changed it back).

The 5-month job runs the 1st month OK and part-way through the 2nd month, and then fails with this error message:

[WARN] file:atmospp.nl: skip missing optional source: namelist:archer_arch
[WARN] file:nemocicepp.nl: skip missing optional source: namelist:archer_arch
[WARN] file:pptransfer.nl: skip missing optional source: namelist:archer_arch
[WARN] file:pptransfer.nl: skip missing optional source: namelist:pptransfer
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
[WARN] file:nemocicepp.nl: skip missing optional source: namelist:script_arch
[FAIL] check_directory: Exiting - Directory does not exist: /home/d03/wufen/cylc-run/u-bp286/work/19910301T0000Z/coupled
[FAIL] Terminating PostProc?
[FAIL] main_pp.py atmos # return-code=1
2019-11-27T19:58:51Z CRITICAL - failed/EXIT

This error message has appeared on several attempts at re-submitting, so it is not a one-off sporadic failure. It is a consistent error that seems to be specific to this suite configuration.

Wuhu has sent me the file-path to the log files (see below). He is also puzzled that the model seems to store only some of these files for the period it is running: it seems semi-random which directories are retained within the bp286/work/ directory.

Please can you advise what is causing this post-processing error?

This is quite urgent, because these runs are for the CMIP6 VolMIP submission of UKESM, and we're nearly ready to submit the 27-member ensemble for the volc-pinatubo run, but these frustrating problems with post-processing are delaying us making progress.

Cheers
Graham

wufen@xcslc0:~/cylc-run/u-bp286/log/job/19910301T0000Z> ls -lrt /home/d03/wufen/cylc-run/u-bp286/work/
total 24
drwxr-xr-x 2 wufen mo_users 4096 Nov 27 16:22 19910201T0000Z
drwxr-xr-x 9 wufen mo_users 4096 Nov 27 19:49 19910301T0000Z
drwxr-xr-x 5 wufen mo_users 4096 Nov 27 20:08 19910111T0000Z
drwxr-xr-x 36 wufen mo_users 4096 Nov 27 20:08 19910101T0000Z
drwxr-xr-x 8 wufen mo_users 4096 Nov 27 20:08 19910106T0000Z
drwxr-xr-x 8 wufen mo_users 4096 Nov 27 20:08 19910116T0000Z


National Centre for Atmospheric Science
School of Earth and Environment, University of Leeds, Leeds, LS2 9JT
Tel: +44 113 343 3438
http://homepages.see.leeds.ac.uk/~earfw/

Change History (1)

comment:1 Changed 12 days ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Graham, Wuhu,

For cycle 19910301T0000Z the coupled model hasn't even run, so that is why postproc is failing. From the log files I can see a mishmash of cycles, including some that are only 5 days long (e.g. 19910101T0000Z/ 19910106T0000Z/ 19910111T0000Z/ 19910116T0000Z/ 19910301T0000Z/). I would first of all suggest doing a clean run (rose suite-run --new) to remove any old files left over from previous attempts to run this suite. If it then fails again we will be better placed to see what is going on.
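To make the mix of cycle lengths easier to see, here is a minimal Python sketch that takes the directory names from the work/ listing in the ticket and prints the gap between neighbouring cycle points. It is only an illustration and is not part of Rose, Cylc or postproc: the function names day_number_360 and cycle_gaps are hypothetical, and it assumes the suite uses a 360-day calendar (12 months of 30 days), which should be checked against the suite settings.

```python
import re

# Cycle-point directory names as listed under cylc-run/<suite>/work/,
# e.g. "19910301T0000Z". Anything else (such as "coupled") is ignored.
CYCLE_POINT = re.compile(r"^(\d{4})(\d{2})(\d{2})T\d{4}Z$")


def day_number_360(name):
    """Return a day count for a cycle point, or None if the name does not match.

    ASSUMPTION: a 360-day calendar, i.e. every month has 30 days.
    """
    match = CYCLE_POINT.match(name.rstrip("/"))
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return year * 360 + (month - 1) * 30 + (day - 1)


def cycle_gaps(names):
    """Sort the cycle points and return (earlier, later, gap_in_days) tuples."""
    points = sorted(
        (day_number_360(name), name.rstrip("/"))
        for name in names
        if day_number_360(name) is not None
    )
    return [
        (earlier, later, later_day - earlier_day)
        for (earlier_day, earlier), (later_day, later) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    # Directory names copied from the ls output in the ticket description.
    work_dirs = [
        "19910201T0000Z",
        "19910301T0000Z",
        "19910111T0000Z",
        "19910101T0000Z",
        "19910106T0000Z",
        "19910116T0000Z",
    ]
    for earlier, later, gap in cycle_gaps(work_dirs):
        print(f"{earlier} -> {later}: {gap} days")
```

Gaps that differ from the configured recycling period suggest the work/ directory holds leftovers from runs made with different cycling settings, which is what a clean run would clear out.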

Regards,
Ros.
