Opened 10 months ago

Closed 10 months ago

Last modified 10 months ago

#3175 closed help (answered)

u-bq683 failure

Reported by: jonathan Owned by: um_support
Component: UM Model Keywords:
Cc: Platform:
UM Version:

Description

After a few days of not progressing because of the slowness of pptransfer, my HadGEM3 job on NEXCS has failed. Supposing it might just have timed out in some way, I tried to restart it unchanged, but got the following error messages:

xcslc0$ pwd
/home/d01/hadsa/roses/u-bq683
xcslc0$ rose suite-restart
[FAIL] cylc restart u-bq683 # return-code=1, stderr=
[FAIL] Traceback (most recent call last):
[FAIL] File "/common/fcm/cylc-7.8.3/bin/cylc-restart", line 25, in <module>
[FAIL] main(is_restart=True)
[FAIL] File "/common/fcm/cylc-7.8.3/lib/cylc/scheduler_cli.py", line 134, in main
[FAIL] scheduler.start()
[FAIL] File "/common/fcm/cylc-7.8.3/lib/cylc/scheduler.py", line 237, in start
[FAIL] self.suite_db_mgr.restart_upgrade()
[FAIL] File "/common/fcm/cylc-7.8.3/lib/cylc/suite_db_mgr.py", line 524, in restart_upgrade
[FAIL] pri_dao.vacuum()
[FAIL] File "/common/fcm/cylc-7.8.3/lib/cylc/rundb.py", line 1031, in vacuum
[FAIL] return self.connect().execute("VACUUM")
[FAIL] sqlite3.OperationalError: disk I/O error

Do you know what this means?

Thanks

Jonathan

Change History (3)

comment:1 Changed 10 months ago by grenville

Jonathan

Do you see anything after typing 'cylc gscan' on NEXCS?

I don't know what the trace means - please try 'rose suite-run --restart' (from '/home/d01/hadsa/roses/u-bq683')

Grenville

comment:2 Changed 10 months ago by jonathan

Dear Grenville

Thanks. No, cylc gscan shows nothing. There's a similar error from suite-run --restart:

xcslc0$ pwd
/home/d01/hadsa/roses/u-bq683
xcslc0$ rose suite-run --restart
[INFO] export CYLC_VERSION=7.8.3
[INFO] export ROSE_ORIG_HOST=xcslc0
[INFO] export ROSE_SITE=
[INFO] export ROSE_VERSION=2019.01.2
[INFO] delete: log/rose-suite-run.conf
[INFO] symlink: rose-conf/20200205T124439-restart.conf <= log/rose-suite-run.conf
[INFO] delete: log/rose-suite-run.version
[INFO] symlink: rose-conf/20200205T124439-restart.version <= log/rose-suite-run.version
[INFO] chdir: log/
[FAIL] cylc restart u-bq683 # return-code=1, stderr=
[FAIL] Traceback (most recent call last):
[FAIL] File "/common/fcm/cylc-7.8.3/bin/cylc-restart", line 25, in <module>
[FAIL] main(is_restart=True)
[FAIL] File "/common/fcm/cylc-7.8.3/lib/cylc/scheduler_cli.py", line 134, in main
[FAIL] scheduler.start()
[FAIL] File "/common/fcm/cylc-7.8.3/lib/cylc/scheduler.py", line 237, in start
[FAIL] self.suite_db_mgr.restart_upgrade()
[FAIL] File "/common/fcm/cylc-7.8.3/lib/cylc/suite_db_mgr.py", line 524, in restart_upgrade
[FAIL] pri_dao.vacuum()
[FAIL] File "/common/fcm/cylc-7.8.3/lib/cylc/rundb.py", line 1031, in vacuum
[FAIL] return self.connect().execute("VACUUM")
[FAIL] sqlite3.OperationalError: disk I/O error
[FAIL] ERROR: command terminated by signal 1: ssh -oBatchMode=yes -oConnectTimeout=8 -oStrictHostKeyChecking=no -n xcslc1 env CYLC_VERSION=7.8.3 bash --login -c "'"'exec "$0" "$@"'"'" cylc restart u-bq683 --host=localhost

Cheers

Jonathan

comment:3 Changed 10 months ago by grenville

  • Resolution set to answered
  • Status changed from new to closed

see #3182
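
For anyone hitting the same traceback: the call that fails is a plain SQLite VACUUM on the suite's run database, so one way to narrow things down might be to open that file directly, outside cylc. The following is only a minimal sketch, assuming Python 3 is available; `check_sqlite_db` is a hypothetical helper (not part of cylc or rose), and the database location used in the usage line is an assumption that this ticket does not confirm. The file is opened read-only, so the check itself should not modify the database.

```python
import sqlite3
from pathlib import Path


def check_sqlite_db(path):
    """Open an SQLite file read-only and run PRAGMA integrity_check.

    Hypothetical helper for illustration only. Returns (ok, message),
    where message is "ok" on success or the SQLite error text otherwise.
    """
    # Build a file: URI so that mode=ro can be passed; this avoids
    # creating a new empty database when the path does not exist.
    uri = Path(path).expanduser().resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            row = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # Covers OperationalError (e.g. "disk I/O error") and
        # DatabaseError (e.g. "file is not a database").
        return False, str(exc)
    return row[0] == "ok", row[0]


if __name__ == "__main__":
    # ASSUMPTION: the path below is a guess at where the suite database
    # lives; substitute the real location before running.
    print(check_sqlite_db("~/cylc-run/u-bq683/.service/db"))
```

If the same "disk I/O error" comes back from this check, that would suggest the problem lies with the file or the filesystem underneath it rather than with cylc itself.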

Last edited 10 months ago by ros