#3262 answered Suite won't restart um_support charlie


Sorry to bother you, but one of my suites (which had been perfectly stable until now) appears to have failed over the weekend. I have tried shutting it down and restarting it, but I get the error below. Can you advise on what has happened? Might it be a storage issue (i.e. I am using too much space) on NEXCS and/or JASMIN, as I haven't done any tidying or archiving for several days? Or is there a problem with the machine itself?



cwilliams@xcslc1:~/roses/u-br871> rose suite-run --restart
[INFO] export CYLC_VERSION=7.8.3
[INFO] export ROSE_ORIG_HOST=xcslc1
[INFO] export ROSE_SITE=
[INFO] export ROSE_VERSION=2019.01.2
[INFO] delete: log/rose-suite-run.conf
[INFO] symlink: rose-conf/20200512T105028-restart.conf <= log/rose-suite-run.conf
[INFO] delete: log/rose-suite-run.version
[INFO] symlink: rose-conf/20200512T105028-restart.version <= log/rose-suite-run.version
[INFO] chdir: log/
[FAIL] cylc restart u-br871 # return-code=1, stderr=
[FAIL] Traceback (most recent call last):
[FAIL]   File "/common/fcm/cylc-7.8.3/bin/cylc-restart", line 25, in <module>
[FAIL]     main(is_restart=True)
[FAIL]   File "/common/fcm/cylc-7.8.3/lib/cylc/", line 134, in main
[FAIL]     scheduler.start()
[FAIL]   File "/common/fcm/cylc-7.8.3/lib/cylc/", line 237, in start
[FAIL]     self.suite_db_mgr.restart_upgrade()
[FAIL]   File "/common/fcm/cylc-7.8.3/lib/cylc/", line 524, in restart_upgrade
[FAIL]     pri_dao.vacuum()
[FAIL]   File "/common/fcm/cylc-7.8.3/lib/cylc/", line 1031, in vacuum
[FAIL]     return self.connect().execute("VACUUM")
[FAIL] sqlite3.OperationalError: database is locked
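
The traceback ends in SQLite refusing a `VACUUM` because something else holds a lock on the suite database. As a rough way to confirm that before retrying the restart, the sketch below tries to take an exclusive lock and reports whether it succeeds. It is only an illustration: the helper name `is_locked` is made up here, the database path must be supplied by the caller (nothing in the log above says where it lives), and it says nothing about which process holds the lock.

```python
import sqlite3


def is_locked(db_path, timeout=0.1):
    """Return True if an exclusive lock on the SQLite file cannot be taken.

    Hypothetical helper for illustration; not part of cylc or rose.
    """
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/ROLLBACK below are the only transaction statements sent.
    conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
    try:
        conn.execute("BEGIN EXCLUSIVE")  # fails if another connection holds a lock
        conn.execute("ROLLBACK")         # release immediately, nothing is changed
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return True
        raise  # some other problem (e.g. unreadable file): do not hide it
    finally:
        conn.close()
```

If this returns True while no suite process is obviously running, a leftover process from the failed run (or a lock that the file system did not release) is a plausible suspect, but that is a guess to be checked rather than a diagnosis.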
#3261 fixed Disk quota exceeded ros s1895566

I was trying to run suite u-bu226 on Monsoon and it failed at fcm_make_um with the following error:

[FAIL] ftn -obin/um-atmos.exe o/um_main.o -L/scratch/jtmp/pbs.3171918.xcs00.x8z/RfNz1PItrN -lum-atmos -h omp -L/projects/um1/lib/cce-8.3.4/gcom/gcom-6.6/haswell/meto_xc40_cray_mpp/build/lib -lgcom -h omp -L/projects/um1/lib/cce-8.3.4/grib_api/grib_api-1.26.0/ivybridge/lib -lgrib_api_f90 -lgrib_api -L/projects/um1/lib/cce-8.3.4/shumlib/shumlib-2018.06.1/haswell/openmp/lib -lshum_wgdos_packing -lshum_string_conv -lshum_latlon_eq_grids -lshum_horizontal_field_interp -lshum_spiral_search -lshum_constants -L/opt/cray/lustre-cray_ari_s/default/lib64/ -llustreapi # rc=1
[FAIL] /opt/cray/hdf5/1.8.13/CRAY/83/lib/libhdf5.a(H5PL.o): In function `H5PL__open$$CFE_id_56395c9c_01603595':
[FAIL] /home/users/ulib/hdf5/1.8.13/rpm/BUILD/cray-hdf5-1.8.13-cce1-serial/src/H5PL.c:535: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
[FAIL] /opt/cray/cce/8.3.4/cray-binutils/x86_64-unknown-linux-gnu/bin/ld: final link failed: Disk quota exceeded
[FAIL] link      674.3 ! um-atmos.exe         <- um/src/control/top_level/um_main.F90
[FAIL] ! um-atmos.exe        : update task failed

[FAIL] fcm make -f /working/d01/edenau/cylc-run/u-bu226/work/19790101T0000Z/fcm_make_um/fcm-make.cfg -C /home/d01/edenau/cylc-run/u-bu226/share/fcm_make_um -j 6 # return-code=2
2020-05-11T22:45:13Z CRITICAL - failed/EXIT

Is it something that can be fixed with a larger quota? Thanks.
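
Before asking for more quota it may help to see which directories are taking the space. The following is a minimal sketch, assuming plain Python 3 is available on the machine; the function names are invented for this example, and it sums apparent file sizes, which can differ from the block counts a quota system actually charges.

```python
import os


def dir_usage_bytes(root):
    """Sum the apparent sizes of all files below root (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable: skip it
    return total


def largest_subdirs(root, top=5):
    """Return up to `top` (size_in_bytes, name) pairs for the biggest subdirectories."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_usage_bytes(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]
```

Calling `largest_subdirs` on the directory that holds the suite run directories would list the heaviest ones first, which are the natural candidates for tidying or archiving.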

#3260 fixed No connection to ARCHER ros mtodt


I have been unable to submit a suite or log in to ARCHER since this afternoon. The connection just times out. Is that a general problem today or specific to my account? Thanks a lot for your help!

Cheers Markus

