Opened 2 months ago

Closed 7 weeks ago

#2541 closed help (fixed)

Running nesting suite with dm_RUN_MODE=4 on ARCHER

Reported by: mbexgcd2
Owned by: um_support
Priority: normal
Component: UM Model
Keywords: nesting suite
Cc:
Platform: ARCHER
UM Version: 10.9

Description

Hi,

I am attempting to run a version of the nesting suite on ARCHER (suite u-az250) using output from a driving model that has already been run (i.e. from start dumps and LBC creation files on disk, from an existing LAM run). I have set the paths to 'dm_ic_file' and 'dm_lbc_files' in the 'Driving Model setup' panel in Rose, but when I come to run the job I encounter the following error from cylc:

[FAIL] cylc validate -v --strict u-az250 # return-code=1, stderr=
[FAIL] WARNING: deprecated items were automatically upgraded in 'suite definition':
….
[FAIL] ERROR, bad graph node format:
[FAIL] install_glm_lbcdata_000 => \
[FAIL] install_glm_lbcdata_001 => \
[FAIL] install_glm_lbcdata_002 => \
[FAIL] Correct format is NAME(<PARAMS>)([CYCLE-POINT-OFFSET])(:TRIGGER-TYPE)
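
One way to read the log above: every listed graph line ends with an arrow and a line-continuation backslash, including the last one, so the chain may be ending on an arrow with no task after it. The sketch below is a simplified check for that pattern in a graph string. It is only an illustration and not how cylc itself parses graphs; the function name `find_dangling_arrows` is made up for this example.

```python
def find_dangling_arrows(graph_text):
    """Return logical graph lines that end in '=>' with no node after it.

    Simplified illustration only; this is NOT cylc's own graph parser.
    """
    logical_lines = []
    buffer = ""
    for raw in graph_text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Backslash continuation: join this line with the next one.
            buffer += line[:-1].strip() + " "
        else:
            buffer += line
            if buffer.strip():
                logical_lines.append(buffer.strip())
            buffer = ""
    if buffer.strip():
        # The text ended while a continuation was still open.
        logical_lines.append(buffer.strip())
    return [entry for entry in logical_lines if entry.endswith("=>")]


# Graph fragment shaped like the one in the error message.
broken = (
    "install_glm_lbcdata_000 => \\\n"
    "install_glm_lbcdata_001 => \\\n"
    "install_glm_lbcdata_002 => \\\n"
)
# Same chain, but terminated by a task name (hypothetical task 'next_task').
complete = broken + "next_task\n"

dangling_in_broken = find_dangling_arrows(broken)
dangling_in_complete = find_dangling_arrows(complete)
```

With the fragment from the log, the check reports one chain ending in a bare arrow; once a task name closes the chain, it reports nothing.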

I have run a similar suite successfully on Monsoon (u-ay877), so I am wondering whether this is an ARCHER-specific problem. I've had a look in the suite.rc file, but I can't spot anything obvious that might explain the error message. Has anyone run the nesting suite in this manner before on ARCHER?

Thanks,
Chris.

Change History (10)

comment:1 Changed 2 months ago by willie

Hi Chris,

These suites are derived from the 2018 Nesting suite u-av356, which hasn't been ported to ARCHER yet. You could use my u-ao230, the 2017 Nesting Suite UM 10.6, which has.

Regards
Willie

comment:2 Changed 2 months ago by mbexgcd2

Hi Willie,

The 2018 nesting suite (u-av356) is the one that Stu Webster recommended I use on ARCHER, so I was under the impression it had been ported already? In fact I've been able to run a copy of this suite on ARCHER using the global model to drive a 4.4km domain, and in turn, a 1 km domain. I only encounter a problem when I try to use the existing 1 km LAM output as the driving model for a 200m resolution domain.

Since I encountered this problem, Stu has upgraded u-av356 to vn11.1. I was planning to take a fresh copy of this suite to see if the problem goes away at vn11.1.

Apart from making sure I pick up the correct .rc file in the site directory, do you know if there are any other steps I should be aware of when porting suites?

Thanks,
Chris.

comment:3 Changed 2 months ago by willie

Hi Chris,

I haven't ported u-av356 to ARCHER. I have ported the previous June 2017 Nesting Suite (u-ao230) to ARCHER and it was a fairly complex process. It's not just about the site file. You could look at what I did to u-ao230 for a rough idea of what to look at.

If you have a previous version running on ARCHER perhaps that could be modified for use?

Regards
Willie

comment:4 Changed 2 months ago by mbexgcd2

Hi Willie,

Stu Webster did the testing of u-av356 on ARCHER; details below:

https://code.metoffice.gov.uk/trac/roses-u/ticket/168#comment:7 (successfully tested vn10.9 version of u-av356 on Archer) and
https://code.metoffice.gov.uk/trac/roses-u/ticket/168#comment:27 (successfully tested vn11.1 of u-av356 on Archer (without ancil generation))

As I say, I have been able to run a copy of this suite myself (u-az251), but only when the global model is used as the driving model. Setting the value of dm_RUN_MODE=4 in the 'Driving Model setup' panel causes the suite to fail with the Cylc error message as originally reported.

I will try to see if the problem goes away when using a different UM version - I'll report back with the results when I have them.

Chris.

comment:5 Changed 2 months ago by mbexgcd2

Hi Willie,

I've created suite u-az863, which is a copy of the 2017 nesting suite u-ao230. I set 'dm_RUN_MODE'=4 ('Use start dumps and LBC creation files on disk') and encountered exactly the same error as at vn10.9.

So it seems like this issue has been around for a while, but most likely undiscovered until now? It's quite important for my work as I need to be able to nest down to sub-km resolutions, which require separate science settings compared to the parent nests.

Chris.

comment:6 Changed 7 weeks ago by mbexgcd2

Just a quick update on this - the same problem exists at vn11.1 of the nesting suite as well (see my rose suite u-ba025). It would be good to know if someone is able to reproduce the same error, just so I can rule out any issues relating to my own account / environment. As far as I can tell, the error seems to be generated by cylc, and I don't even get as far as seeing the Gcylc window appear to check the status of each individual task/app, so it's happening very early on in the job submission process.

Chris.

comment:7 Changed 7 weeks ago by ros

Hi Chris,

Willie is on leave this week. I think this is a cylc version problem, although at the moment I can't see what is actually wrong with that part of the graph for cylc-6.11.4 to reject it.

I tried running your u-ay877 on Monsoon (cylc-7.7.2) but I still get a cylc failure:

[FAIL] 2018-07-31T13:05:11Z ERROR - install_glm_lbcdata_002:succeed
[FAIL] 'Illegal graph node: \\'

Do you not get this message?

And if I run your u-ba025 on a local server here with cylc-7.7.1, I get the same "Illegal graph node: \\" message as on Monsoon.

Ros.

comment:8 Changed 7 weeks ago by mbexgcd2

Hi Ros,

Thanks for looking into this. I also tried to re-run u-ay877 on Monsoon yesterday and got the same cylc failure as you reported above, which I wasn't expecting since I have been able to run this suite before on Monsoon.

I had a suggestion from Stu Webster to try setting FREE_RUN to FALSE in the Cycling Options rose panel. I did that this morning and I'm glad to report that this fixes the problem in both my Archer and Monsoon suites. I had it set to FALSE originally, but at some point in the last few weeks I must have turned it to TRUE. With FREE_RUN set to TRUE, the start and end times for cycling need to be different, but mine were the same and cylc doesn't like this, hence the error.
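
For anyone who hits the same thing, here is a minimal sketch of a pre-flight check for this combination of settings. It is not part of the nesting suite: the function name, the argument names and the cycle-point format string are all assumptions for illustration. Only FREE_RUN and the "start and end times must differ" rule come from the discussion above.

```python
from datetime import datetime


def check_free_run_window(free_run, start, end, fmt="%Y%m%dT%H%MZ"):
    """Return a list of problems with the cycling settings (empty if none).

    Hypothetical helper: 'start' and 'end' are cycle points given as
    strings such as '20180731T0000Z' (the format is an assumption).
    """
    start_dt = datetime.strptime(start, fmt)
    end_dt = datetime.strptime(end, fmt)
    problems = []
    if end_dt < start_dt:
        problems.append("end cycle point is earlier than start cycle point")
    if free_run and start_dt == end_dt:
        # The combination reported in this ticket.
        problems.append(
            "FREE_RUN is TRUE but start and end cycle points are identical"
        )
    return problems


# The situation described above: FREE_RUN switched on, identical times.
same_times = check_free_run_window(True, "20180731T0000Z", "20180731T0000Z")
# Either of the two ways out: switch FREE_RUN off, or use different times.
free_run_off = check_free_run_window(False, "20180731T0000Z", "20180731T0000Z")
different_times = check_free_run_window(True, "20180731T0000Z", "20180801T0000Z")
```

Running something like this before `rose suite-run` would flag the settings before cylc validation reports the less obvious graph error.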

Now that I understand the reason for the failure, I think it is OK to close this ticket now.

Thanks,
Chris.

comment:9 Changed 7 weeks ago by ros

Hi Chris,

Thanks for letting us know. I will close this ticket now.

Cheers,
Ros.

comment:10 Changed 7 weeks ago by ros

  • Resolution set to fixed
  • Status changed from new to closed