Opened 2 years ago

Closed 8 months ago

#2281 closed help (fixed)

rose/cylc communication error messages

Reported by: pmcguire Owned by: pmcguire
Component: Rose/Cylc Keywords: jasmin, rose/cylc
Cc: Platform: Other
UM Version:

Description

I often run suites directly on Jasmin-sci1 (instead of submitting from PUMA to run on Jasmin-sci1).
I often get error messages in stderr like the following. How can I configure my suites properly so that these communication error messages don't happen?
Thanks.

Send message: try 1 of 7 failed: Connection timeout: https://jasmin-sci1.ceda.ac.uk:43005/message/put?priority=NORMAL&message=started+at+2017-09-21T13%3A59%3A25%2B01&task_id=make_plots.1: HTTPSConnectionPool(host='jasmin-sci1.ceda.ac.uk', port=43005): Max retries exceeded with url: /message/put?priority=NORMAL&message=started+at+2017-09-21T13%3A59%3A25%2B01&task_id=make_plots.1 (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x1852f50>, 'Connection to jasmin-sci1.ceda.ac.uk timed out. (connect timeout=30.0)'))

retry in 5.0 seconds, timeout is 30.0
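
For anyone trying to narrow down a message like the one above, the following is a minimal diagnostic sketch in Python, not part of Rose/Cylc itself. It pulls the host, port, task ID and message text out of the URL in the error, and offers a plain TCP probe with the same 30-second timeout. The function names parse_message_url and can_connect are hypothetical helpers introduced here for illustration; only the URL is taken from the log above.

```python
import socket
from urllib.parse import urlsplit, parse_qs


def parse_message_url(url):
    """Split a message URL copied from stderr into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # decodes '+' as space and %XX escapes
    return {
        "host": parts.hostname,
        "port": parts.port,
        "task_id": query.get("task_id", [None])[0],
        "message": query.get("message", [None])[0],
    }


def can_connect(host, port, timeout=30.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False


# URL copied from the error message in this ticket.
url = ("https://jasmin-sci1.ceda.ac.uk:43005/message/put"
       "?priority=NORMAL"
       "&message=started+at+2017-09-21T13%3A59%3A25%2B01"
       "&task_id=make_plots.1")
info = parse_message_url(url)

# To probe the port from the host where the task runs (needs network access):
# print(can_connect(info["host"], info["port"]))
```

Running the probe from the task host while the suite is still running would at least show whether the suite port is reachable at the TCP level, which may help separate a network problem from a Rose/Cylc configuration problem.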

Change History (16)

comment:1 Changed 2 years ago by willie

Hi Patrick,

Is it a particular suite or all suites? Let us know the suite id.

Regards
Willie

comment:2 Changed 2 years ago by pmcguire

I am not sure if I get this error message for every suite that needs to report an error or not.
But one suite that had this problem when it was reporting errors during the debug stage was u-aq202.
Patrick

comment:3 Changed 23 months ago by pmcguire

I sometimes still have problems with this.
Patrick

comment:4 Changed 22 months ago by willie

  • Resolution set to answered
  • Status changed from new to closed

comment:5 Changed 22 months ago by pmcguire

Unfortunately, I sometimes still have problems with this. Do you have any suggestions for what I should do to figure out the cause of the problem? Can you reopen the case?
Patrick

comment:6 Changed 22 months ago by ros

  • Platform set to Other
  • Resolution answered deleted
  • Status changed from closed to reopened
  • UM Version <select version> deleted

Hi Patrick,

As you are submitting directly from the JASMIN VMs, this looks like an intermittent communication issue within the JASMIN domain. Unfortunately, we don't know what is causing it. Does this only happen with specific tasks (e.g. only the build step)? I would suggest contacting the CEDA helpdesk. Rose/Cylc is maintained on JASMIN by CEDA/Met Office, so I would hope that they would be able to help/investigate - perhaps they are aware of others that have experienced similar…

Regards,
Ros

comment:7 Changed 22 months ago by pmcguire

I haven't noticed a pattern for this happening on specific tasks. I will contact the CEDA Helpdesk about this. Thanks for your help!
Patrick

comment:8 Changed 21 months ago by willie

  • Resolution set to fixed
  • Status changed from reopened to closed

comment:9 Changed 21 months ago by pmcguire

This problem has not been fixed yet. I am currently working with Annette Osprey (CMS), Alan Iwi (CEDA), and Ag Stephens (CEDA?) on this. There is a whole Slack discussion going on right now about it, under #rose-cylc-jasmin.

comment:10 Changed 21 months ago by pmcguire

The latest information from CEDA is that we should be using jasmin-cylc as the virtual machine instead of jasmin-sci*. Those HTTPS communication errors then go away. But jasmin-cylc doesn't currently support the Rose/Cylc GUIs, and its Python setup is different from that of jasmin-sci*.

comment:11 Changed 21 months ago by pmcguire

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:12 Changed 16 months ago by pmcguire

The last I checked, the people at CEDA were still discussing how to resolve this issue.

comment:13 Changed 16 months ago by pmcguire

  • Owner changed from um_support to pmcguire
  • Status changed from reopened to assigned

comment:14 Changed 12 months ago by pmcguire

This issue is still unresolved.

comment:15 Changed 8 months ago by pmcguire

Ros tells me that the people at CEDA have agreed to put GUI access on jasmin-cylc.

comment:16 Changed 8 months ago by pmcguire

  • Resolution set to fixed
  • Status changed from assigned to closed

Azin Wright and I have both switched from jasmin-sci* to jasmin-cylc for running Rose/Cylc on JASMIN. Everything works fine so far: the cylc GUI appears on jasmin-cylc, and no communication errors like those above have occurred. However, the 'display x.pdf' command doesn't work on jasmin-cylc, so a separate session on jasmin-sci* is needed to display PDFs. The cylc GUI from rose sgc or rose suite-run doesn't work on jasmin-sci*, though the rose edit and rosie go GUIs do work. Ag Stephens from CEDA tells me that, as far as he knows, nothing has been changed on jasmin-sci*.

I am closing this ticket for now. Will reopen if something happens.
