Opened 3 years ago

Closed 3 years ago

#2113 closed help (fixed)

Send message connection failed

Reported by: mattjbr123
Owned by: ros
Component: Rose
Keywords:
Cc:
Platform: MONSooN
UM Version: 10.3

Description

Hi,

Attempting to get suite u-ak617 running on xcs-c (rather than xcm).

The only changes I've made are to set the COMPUTE_HOST variable to xcs, xcs-c, or xcslc0 (with the same problem for each) and to change the cores per node to 36 in the relevant macro in suite.rc.

When the first task submits (usually install_ancil), the status in the cylc gui stays as submitted and only updates to running when you manually poll for changes. Once updated, I get the following messages in the job.out log:

Send message: try 1 of 7 failed: connection failed

retry in 5.0 seconds, timeout is 30.000000

I suspect the tasks may be running, just failing to communicate with the cylc monitor thing (there's a name for it I can't remember) on exvmscylc.
Is there a setting I've forgotten to change, or something not quite right with xcs yet?

Cheers,
Matt

Change History (3)

comment:1 Changed 3 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Matt,

Sorry I missed this one. If you haven't already tried to submit this suite again, please try again now. The problem you describe above was due to the pyro communications port being blocked and this has now been rectified.

Cheers,
Ros.
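
For anyone hitting the same "connection failed" messages: a quick way to tell a blocked port apart from a suite configuration problem is to test, from the compute host, whether a TCP connection to the suite host can be opened at all. The following is only a minimal sketch. The function name port_reachable is made up for illustration, and the port number is not given anywhere in this ticket, so it has to be supplied by whoever runs the check.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for illustration; it only checks that the port accepts
    a connection, not that a cylc suite is actually listening on it.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run on the compute host, something like port_reachable("exvmscylc", 7766) would return False while the port is blocked; 7766 here is a placeholder value, not a number taken from this ticket.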

comment:2 Changed 3 years ago by mattjbr123

Yes - seems to be fixed, thanks.

Now having a 'um-recon not found' error which I can't fathom… Probably something obvious I've missed… Anyway I've submitted another ticket at #2122

Matt

comment:3 Changed 3 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed