Opened 2 years ago

Closed 2 years ago

#2310 closed help (fixed)

gcylc disconnects frequently from running suite

Reported by: marcus
Owned by: um_support
Component: Rose/Cylc
Keywords:
Cc: luke.abraham@…
Platform: NEXCS
UM Version: 10.6

Description

Hi, I am currently running three suites, u-ar155, u-ar286 and u-ar292. For about two weeks these jobs have experienced lengthy queue times (about 5 hours, whereas previously they ran almost instantly), and the gcylc window disconnects frequently.

Based on advice I received in #2296, I am resubmitting the job with rose suite-run --restart, but I am concerned about how often this occurs: I need to restart the job at least once a day. Colleagues at MONSooN have suggested it might be due to a low fair share quota. Would this affect the suite running on the cylc server?

Many thanks, Marcus

Change History (4)

comment:1 Changed 2 years ago by grenville

You have suffered from an intermittent problem (you can see this in /home/d03/makoe/cylc-run/u-ar155/log/suite logs)

2017-11-03T09:12:45Z CRITICAL - Traceback (most recent call last):
  File "/data/local/fcm/cylc-7.5.0/lib/cylc/scheduler.py", line 236, in start
    self.run()
  File "/data/local/fcm/cylc-7.5.0/lib/cylc/scheduler.py", line 1325, in run
    self.process_queued_task_operations()
  File "/data/local/fcm/cylc-7.5.0/lib/cylc/scheduler.py", line 1171, in process_queued_task_operations
    self.suite_db_mgr.process_queued_ops()
  File "/data/local/fcm/cylc-7.5.0/lib/cylc/suite_db_mgr.py", line 192, in process_queued_ops
    self.pri_dao.execute_queued_items()
  File "/data/local/fcm/cylc-7.5.0/lib/cylc/rundb.py", line 396, in execute_queued_items
    table.get_insert_stmt(), table.insert_queue)
  File "/data/local/fcm/cylc-7.5.0/lib/cylc/rundb.py", line 446, in _execute_stmt
    self.conn.executemany(stmt, stmt_args_list)
OperationalError: disk I/O error

This is a Cylc/file system issue which has caused headaches all around. For now, restarting the model with --restart is a good workaround. Developments are planned which should solve this.
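To check whether a suite has hit this particular error before restarting it, the suite log can be scanned for the pattern shown in the traceback. The following is only a rough sketch, not part of Cylc: the function name find_db_io_errors is hypothetical, and it assumes each traceback opens with a timestamped "CRITICAL - Traceback" line and ends with an OperationalError line mentioning "disk I/O error", as in the excerpt above.

```python
import re

# Assumed format: a timestamped CRITICAL line opens each traceback in the log.
CRITICAL_RE = re.compile(r"^(\S+Z) CRITICAL - Traceback")


def find_db_io_errors(lines):
    """Return timestamps of CRITICAL tracebacks that end in a disk I/O error.

    Hypothetical helper for illustration; `lines` is any iterable of log lines.
    """
    hits = []
    current = None  # timestamp of the traceback currently being read, if any
    for line in lines:
        match = CRITICAL_RE.match(line)
        if match:
            current = match.group(1)
        elif current and "OperationalError" in line and "disk I/O error" in line:
            hits.append(current)
            current = None
    return hits
```

It could be used by opening whichever file under the suite's log/suite directory contains the traceback and passing the open file object to find_db_io_errors; a non-empty result suggests the suite stopped on the database error and needs the --restart workaround.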

comment:2 Changed 2 years ago by marcus

Ah, okay thank you Grenville. I'll do what you suggest.
Many thanks,
Marcus

comment:3 Changed 2 years ago by grenville

Marcus

Please report these errors to us & Monsoon - thanks

Grenville

comment:4 Changed 2 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed