Opened 6 weeks ago

Closed 4 weeks ago

#3068 closed help (fixed)

install_glm_startdata failing

Reported by: anmcr Owned by: ros
Component: UM Reconfiguration Keywords:
Cc: Platform: Monsoon2
UM Version: 11.1

Description

Hello Helpdesk,

My job is u-bk265, which is on MONSooN2.

The issue is that 'install_glm_startdata' is failing. See attachment. The run is set up to use ERA-Interim fields, and I have definitely given the correct file location. The .out and .err files don't seem that helpful. I have copied the output from the .err file below. I'm sure that this is relatively simple to fix, but I have been unable to do so, so would greatly appreciate your help.

Many thanks,

Andrew

Traceback (most recent call last):
  File "/home/d01/amworr/cylc-run/u-bk265/bin/install_startdata", line 173, in <module>
    main()
  File "/home/d01/amworr/cylc-run/u-bk265/bin/install_startdata", line 163, in main
    link_to_file(srcfile, destfile)
  File "/home/d01/amworr/cylc-run/u-bk265/bin/install_startdata", line 119, in link_to_file
    os.symlink(src, dest)
OSError: [Errno 17] File exists
2019-11-05T18:14:23Z CRITICAL - failed/EXIT
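
For context on the traceback above: os.symlink raises "[Errno 17] File exists" whenever the destination path is already present, for example a link left behind by an earlier run of the same cycle. The real install_startdata code is not shown in this ticket, so the following is only a minimal sketch, assuming a helper with the same name as the one in the traceback, of how such a link step could be made safe to re-run:

```python
import os


def link_to_file(src, dest):
    """Create a symlink dest -> src, replacing a stale link if present.

    Hypothetical stand-in for the helper named in the traceback; the
    suite's actual implementation may differ.
    """
    if os.path.islink(dest):
        # A link left by an earlier run: remove it so symlink() can succeed.
        os.remove(dest)
    elif os.path.exists(dest):
        # A real file or directory: refuse to delete it silently.
        raise FileExistsError(f"{dest} exists and is not a symlink")
    os.symlink(src, dest)
```

Starting the suite from a clean run directory avoids the stale link altogether, without changing any code.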

Attachments (2)

for_ncas.PNG (104.0 KB) - added by anmcr 6 weeks ago.
Screen shot
for_willie.PNG (77.3 KB) - added by anmcr 6 weeks ago.
screen shot of failure


Change History (16)

Changed 6 weeks ago by anmcr

Screen shot

comment:1 Changed 6 weeks ago by willie

Hi Andrew,

I did a fresh run of u-bk265 and install_glm_startdata succeeded for all three cycles. It may be a good idea to do a rose suite-run --new.

Willie

comment:2 Changed 6 weeks ago by anmcr

Hi Willie,

Thanks for the reply. I did 'rose suite-run --new' and it got past the install_glm_startdata problem, but it's now complaining that the glm has not produced any LBCs. See attachment and error message below. I'm not sure why this should be the case, as I have used this model setup before.

Best wishes,

Andrew

ls: cannot access "/home/d01/amworr/cylc-run/u-bk265/share/cycle/20150101T0000Z/glm/um/*_cb*": No such file or directory
2019-11-06T15:55:56Z CRITICAL - failed/EXIT

Changed 6 weeks ago by anmcr

screen shot of failure

comment:3 Changed 5 weeks ago by ros

Hi Andrew,

It looks like there is a fault in the graph: the create LBCs task needs to run after the glm forecast has run, which it currently doesn't.
Is this enough information to enable you to fix this? I'm currently involved in running a training course but can try to look more later if required.

Cheers,
Ros.

comment:4 Changed 5 weeks ago by anmcr

Hi Ros,

Thanks for the reply. I'm not quite sure how to fix it. I tried 'rose suite-run --new', which didn't help. Would recompiling the executable be a solution?

I appreciate that you are busy, so this can wait till next week.

Best wishes,

Andrew

comment:5 Changed 5 weeks ago by ros

Hi Andrew,

The graph is specified in the suite.rc file and defines the order in which all the tasks are run; it's this that needs editing. I will try to look a little later, otherwise on Monday.

Cheers,
Ros.
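
To illustrate why a missing dependency in the graph produces the "No such file or directory" failure above, here is a small Python sketch. It is not the suite's graph or cylc's scheduler: the task name createbc_4p0km is hypothetical, and the dependencies are simplified to three tasks. Each task maps to the set of tasks it must wait for, and a task becomes runnable once all of those have finished.

```python
def runnable(graph, finished):
    """Return the tasks whose prerequisites have all finished."""
    return {
        task
        for task, needs in graph.items()
        if task not in finished and needs <= finished
    }


# Simplified, illustrative graphs (createbc_4p0km is a hypothetical name).
broken = {
    "install_glm_startdata": set(),
    "glm_um_forecast_000": {"install_glm_startdata"},
    # Missing edge: createbc does not wait for the glm forecast.
    "createbc_4p0km": {"install_glm_startdata"},
}
fixed = {
    "install_glm_startdata": set(),
    "glm_um_forecast_000": {"install_glm_startdata"},
    "createbc_4p0km": {"glm_um_forecast_000"},
}

done = {"install_glm_startdata"}
print(sorted(runnable(broken, done)))
print(sorted(runnable(fixed, done)))
```

In the broken graph the createbc task is released at the same time as the forecast, so it can start before the forecast output it needs exists. In the fixed graph only the forecast is released at that point.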

comment:6 Changed 5 weeks ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Andrew,

You said you've used this setup before - can you point me to a suite that you say has worked with this setup please? I looked at u-bk259 but that only has one resolution in the nested region.

I've managed to get the suite to run by manually triggering the glm forecast before the 4p0km createbc task; I now need to figure out what's wrong with the jinja that's creating the graph.

Cheers,
Ros.

comment:7 Changed 5 weeks ago by anmcr

Hi Ros,

Thanks for looking at this. This job was initially working fine when it only had one nested domain (at 1.5 km), which was a couple of weeks ago. However, I then added an additional domain at 4 km. I didn't connect doing this with the error, as initially it was a problem with 'install_glm_startdata'.

I do have another run with 2 nested domains, which from memory I think is u-bo132. I will check when I am back in the office tomorrow.

Best wishes,

Andrew

comment:8 Changed 5 weeks ago by ros

Hi Andrew,

That's got it, much better than my hacked version! ;-)

There's a slight difference in the graph generation code in u-bo132.
Please copy the suite-graph-all-cycles.rc file from u-bo132 into ~/roses/u-bk265 and that will fix the dependency graph to run the glm_um_forecast_000 before the 4p0km createbc task.

Regards,
Ros.

comment:9 Changed 4 weeks ago by anmcr

Hi Ros,

Your suggestion solved the issue. Thank you very much for your help.

A kind of separate issue is that the run has lots of 'submit' failures, which require re-triggering all the time. But this is not an issue on some of my other current runs. Is there a reason for this?

Thanks again,

Andrew

comment:10 Changed 4 weeks ago by ros

Hi Andrew,

It's the annoying _mkstemp error; see the log/suite/log. With the move away from exvmsrose, the system does not like logging in from the xcs to the xcs to submit a task (and it's not necessary).

In the suite.rc file change host = $(rose host-select xcs-c) in the [[HOST_HPC]] section to host = localhost and reload.

Hopefully that will fix the submitting issue.

Cheers,
Ros.

comment:11 Changed 4 weeks ago by anmcr

Hi Ros,

Thanks for looking at this again.

I can't find 'host = $(rose host-select xcs-c)' in the [[HOST_HPC]] section of the suite.rc file (/home/d01/amworr/roses/u-bk265/suite.rc). The only occurrence of [[HOST_HPC]] in that file was:

[[HOST_HPC]]
      # Tasks that the run on HPC host or group.
      [[[job submission]]]
          retry delays = 3*PT30S

Could you please clarify?

Many thanks,

Andrew

comment:12 Changed 4 weeks ago by ros

Hi Andrew,

Sorry I was looking in the processed suite.rc. It's generated from the site/monsoon-cray-xc40/suite-adds.rc file.

[[HOST_HPC]]
      [[[ environment ]]]
          UMDIR=/projects/um1
      [[[remote]]]
          host = $(rose host-select {{HPC_HOST}})     <= change this line to `host = localhost`
      ...

Cheers,
Ros.

comment:13 Changed 4 weeks ago by anmcr

Hi Ros,

Thanks for all your help with this ticket. The run now seems to be running/submitting fine, so please close this ticket.

Best wishes,

Andrew

comment:14 Changed 4 weeks ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed

Thanks for letting us know.

Regards,
Ros.
