Opened 15 months ago

Last modified 2 months ago

#2281 assigned help

rose/cylc communication error messages

Reported by: pmcguire
Owned by: pmcguire
Priority: normal
Component: Rose/Cylc
Keywords: jasmin, rose/cylc
Cc:
Platform: Other
UM Version:

Description

I often run suites directly on jasmin-sci1 (instead of submitting them from PUMA to run on jasmin-sci1).
When I do, I often get messages like the following in stderr. How can I configure my suites so that these communication errors no longer occur?
Thanks.

Send message: try 1 of 7 failed: Connection timeout: https://jasmin-sci1.ceda.ac.uk:43005/message/put?priority=NORMAL&message=started+at+2017-09-21T13%3A59%3A25%2B01&task_id=make_plots.1: HTTPSConnectionPool(host='jasmin-sci1.ceda.ac.uk', port=43005): Max retries exceeded with url: /message/put?priority=NORMAL&message=started+at+2017-09-21T13%3A59%3A25%2B01&task_id=make_plots.1 (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x1852f50>, 'Connection to jasmin-sci1.ceda.ac.uk timed out. (connect timeout=30.0)'))

retry in 5.0 seconds, timeout is 30.0
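
For reference, the URL in the message above can be decoded to see which host and port the task tried to reach and what it was trying to report. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the URL is copied from the stderr output above.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the stderr message above
url = ("https://jasmin-sci1.ceda.ac.uk:43005/message/put"
       "?priority=NORMAL"
       "&message=started+at+2017-09-21T13%3A59%3A25%2B01"
       "&task_id=make_plots.1")

parts = urlsplit(url)
# parse_qs returns a list per key; each key appears once here, so take the first value
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.hostname, parts.port)  # host and port the task tried to contact
print(query["task_id"])            # task that sent the message
print(query["message"])            # percent-decoded message text
```

Decoded this way, the failing request is the "started" notification from task make_plots.1 to port 43005 on jasmin-sci1.ceda.ac.uk, and the failure is a connection timeout rather than an error inside the task itself.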

Change History (14)

comment:1 Changed 15 months ago by willie

Hi Patrick,

Is it a particular suite or all suites? Let us know the suite id.

Regards
Willie

comment:2 Changed 15 months ago by pmcguire

I am not sure whether I get this error message for every suite that needs to report an error.
But one suite that had this problem while reporting errors during the debug stage was u-aq202.
Patrick

comment:3 Changed 12 months ago by pmcguire

I sometimes still have problems with this.
Patrick

comment:4 Changed 12 months ago by willie

  • Resolution set to answered
  • Status changed from new to closed

comment:5 Changed 12 months ago by pmcguire

Unfortunately, I sometimes still have problems with this. Do you have any suggestions for what I should do to figure out the cause of the problem? Can you reopen the case?
Patrick

comment:6 Changed 11 months ago by ros

  • Platform set to Other
  • Resolution answered deleted
  • Status changed from closed to reopened
  • UM Version <select version> deleted

Hi Patrick,

As you are submitting directly from the JASMIN VMs, this looks like an intermittent communication issue within the JASMIN domain. Unfortunately, we don't know what is causing it. Does it happen only with specific tasks (e.g. only the build step)? I would suggest contacting the CEDA helpdesk. Rose/Cylc is maintained on JASMIN by CEDA/Met Office, so I would hope that they would be able to help or investigate; perhaps they are aware of others who have experienced similar…

Regards,
Ros
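
One way to gather evidence for the helpdesk is to probe the suite host's port repeatedly and record how often the connection attempt fails. The sketch below is an assumption about how such a check might look, not part of Rose/Cylc: the function name `probe` and its parameters are hypothetical, and the defaults simply mirror the "try 1 of 7", "retry in 5.0 seconds" and "timeout is 30.0" values in the error message. Unlike the task messaging, it records every attempt instead of stopping at the first success, so that intermittent failures show up.

```python
import socket
import time


def probe(host, port, tries=7, timeout=30.0, delay=5.0):
    """Attempt a TCP connection `tries` times.

    Returns a list of (ok, elapsed_seconds, error_text) tuples, one per attempt.
    """
    results = []
    for attempt in range(1, tries + 1):
        start = time.monotonic()
        try:
            # Open and immediately close a plain TCP connection (no HTTPS handshake).
            with socket.create_connection((host, port), timeout=timeout):
                ok, err = True, None
        except OSError as exc:  # covers timeouts, refused connections, DNS failures
            ok, err = False, str(exc)
        elapsed = time.monotonic() - start
        results.append((ok, elapsed, err))
        status = "ok" if ok else "FAILED"
        print(f"try {attempt} of {tries}: {status} in {elapsed:.2f}s {err or ''}")
        if attempt < tries:
            time.sleep(delay)
    return results
```

It could be called from the machine where the task runs, for example as `probe("jasmin-sci1.ceda.ac.uk", 43005)`. The port is the one reported in the error message for a particular running suite, so it will differ from run to run. This only tests whether a TCP connection can be opened within the timeout; it says nothing about the cause of a failure.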

comment:7 Changed 11 months ago by pmcguire

I haven't noticed a pattern for this happening on specific tasks. I will contact the CEDA Helpdesk about this. Thanks for your help!
Patrick

comment:8 Changed 10 months ago by willie

  • Resolution set to fixed
  • Status changed from reopened to closed

comment:9 Changed 10 months ago by pmcguire

This problem has not been fixed yet. I am currently working with Annette Osprey (CMS), Alan Iwi (CEDA), and Ag Stephens (CEDA?) on this. There is a whole Slack discussion going on right now about it, under #rose-cylc-jasmin.

comment:10 Changed 10 months ago by pmcguire

The latest information from CEDA is that we should be using the jasmin-cylc virtual machine instead of jasmin-sci*. Then those HTTPS communication errors go away. But jasmin-cylc doesn't currently support the Rose/Cylc GUIs, and its Python setup differs from that on jasmin-sci*.

comment:11 Changed 10 months ago by pmcguire

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:12 Changed 6 months ago by pmcguire

The last I checked, the people at CEDA were still discussing how to resolve this issue.

comment:13 Changed 6 months ago by pmcguire

  • Owner changed from um_support to pmcguire
  • Status changed from reopened to assigned

comment:14 Changed 2 months ago by pmcguire

This issue is still unresolved.
