Opened 3 months ago

Closed 8 weeks ago

#3253 closed help (answered)

cylc gscan

Reported by: NoelClancy
Owned by: pmcguire
Component: JULES
Keywords: restart
Cc:
Platform: JASMIN
UM Version:

Description

Hi Patrick,

Do you know why cylc gscan is not working now?
Also, my global suites u-bt273 and u-bt558 have been running through spin-up, but at the moment they are paused or still in the submitted state between spin-up cycles.

Do you know whether this is due to scheduled maintenance or is unrelated?

Noel

Attachments (1)

Capture3.PNG (23.1 KB) - added by NoelClancy 3 months ago.

Change History (15)

comment:1 Changed 3 months ago by pmcguire

Hi Noel
I don't normally use cylc gscan. I just tried it and it doesn't work for me either.

Have you tried bjobs -u nmc or bqueues? What do you see if you do that?
Patrick

comment:2 Changed 3 months ago by NoelClancy

(base) [nmc@jasmin-cylc ~]$ bjobs -u nmc
No unfinished job found
(base) [nmc@jasmin-cylc ~]$ bqueues
QUEUE_NAME   PRIO  STATUS       MAX   JL/U  JL/P  JL/H  NJOBS   PEND    RUN   SUSP
cpom-comet   35    Open:Active  128   -     -     -     260     187     73    0
rsg-general  35    Open:Active  472   -     -     -     31      19      12    0
rsgnrt       35    Open:Active  40    -     -     -     70      30      40    0
sst_cci      35    Open:Inact   192   -     -     -     0       0       0     0
par-multi    35    Open:Active  800   256   -     -     837     384     453   0
admin_test   35    Open:Active  -     -     -     -     0       0       0     0
eustace      35    Open:Active  60    60    -     -     0       0       0     0
copy         30    Open:Active  -     -     -     -     0       0       0     0
short-serial 30    Open:Active  4000  2000  -     -     206935  202936  3999  0
par-single   30    Open:Active  800   256   -     -     40      32      8     0
high-mem     30    Open:Active  96    48    -     -     41      0       41    0
lotus_ssd    30    Open:Active  240   240   -     -     0       0       0     0
lotus_gpu    30    Open:Active  -     -     -     -     0       0       0     0
workshop     30    Open:Inact   240   -     -     -     0       0       0     0
test         30    Open:Active  48    8     -     -     0       0       0     0
long-serial  25    Open:Active  800   256   -     -     67      0       67    0
(base) [nmc@jasmin-cylc ~]$

comment:3 Changed 3 months ago by NoelClancy

I know that nothing was happening because when I run
cd /work/scratch/nmc/u-bt273
ls -ltr

there are no new files.
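
The same check can be scripted. The following is only a minimal sketch: the helper name recently_modified and the one-hour window are assumptions for illustration, and only the directory path comes from this ticket.

```python
import os
import time


def recently_modified(directory, within_seconds=3600):
    """Return names of regular files in `directory` modified within the
    last `within_seconds` seconds, oldest first (like `ls -ltr`)."""
    now = time.time()
    recent = []
    for entry in os.scandir(directory):
        if entry.is_file():
            mtime = entry.stat().st_mtime
            if now - mtime <= within_seconds:
                recent.append((mtime, entry.name))
    # Sort by modification time so the newest file comes last.
    return [name for _, name in sorted(recent)]


if __name__ == "__main__":
    # Suite output directory from this ticket; skipped if it does not exist.
    suite_dir = "/work/scratch/nmc/u-bt273"
    if os.path.isdir(suite_dir):
        files = recently_modified(suite_dir)
        print(files if files else "no new files in the last hour")
```

An empty result only says that nothing was written in the chosen window; it does not say whether the suite has finished, stalled or crashed.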

comment:4 Changed 3 months ago by pmcguire

Hi Noel
The output that you show for your bjobs -u nmc command means that you don't have any jobs running now.

The output of the bqueues command shows you how many jobs (for everybody) are currently pending or are running, etc. How many jobs are running for your queue? If there are a lot running, then maybe the queue is still active.
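
As a rough sketch, the pending and running counts for a single queue can be pulled out of saved bqueues output like this. The helper name queue_load is hypothetical, and the code assumes the 11-column layout shown in comment:2; other LSF configurations may print different columns.

```python
def queue_load(bqueues_text, queue_name):
    """Return (njobs, pend, run, susp) for one queue, or None if the
    queue is not listed. Assumes the 11-column layout from comment:2."""
    for line in bqueues_text.splitlines():
        fields = line.split()
        # Data rows have 11 whitespace-separated fields, queue name first.
        if len(fields) == 11 and fields[0] == queue_name:
            njobs, pend, run, susp = (int(x) for x in fields[7:11])
            return njobs, pend, run, susp
    return None


# Two rows copied from the listing in comment:2, plus the header.
SAMPLE = """\
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
short-serial 30 Open:Active 4000 2000 - - 206935 202936 3999 0
par-single 30 Open:Active 800 256 - - 40 32 8 0
"""

if __name__ == "__main__":
    print(queue_load(SAMPLE, "short-serial"))
```

A large RUN count for the queue suggests that the queue itself is still dispatching jobs, so the problem is more likely to be on the suite side.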

Have you looked at the log files for u-bt273? What do they say?
Patrick

comment:5 Changed 3 months ago by NoelClancy

2020-04-23T11:09:48+01:00 WARNING - Message send failed, try 1 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

retry in 5.0 seconds, timeout is 30.0

2020-04-23T11:09:53+01:00 WARNING - Message send failed, try 2 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

retry in 5.0 seconds, timeout is 30.0

2020-04-23T11:09:58+01:00 WARNING - Message send failed, try 3 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

retry in 5.0 seconds, timeout is 30.0

2020-04-23T11:10:03+01:00 WARNING - Message send failed, try 4 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

retry in 5.0 seconds, timeout is 30.0

2020-04-23T11:10:09+01:00 WARNING - Message send failed, try 5 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

retry in 5.0 seconds, timeout is 30.0

2020-04-23T11:10:14+01:00 WARNING - Message send failed, try 6 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

retry in 5.0 seconds, timeout is 30.0

2020-04-23T11:10:19+01:00 WARNING - Message send failed, try 7 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
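
All seven warnings above report the same failure: the SSL certificate check fails when the job tries to send a message to jasmin-cylc.ceda.ac.uk:43007. A minimal sketch for summarising such a log is given below. The function name summarise_send_failures is hypothetical, and the pattern is assumed from the lines above rather than taken from any cylc documentation.

```python
import re

# Pattern assumed from the warning lines quoted in this comment.
SEND_FAILED = re.compile(
    r"^(?P<time>\S+) WARNING - Message send failed, "
    r"try (?P<attempt>\d+) of (?P<total>\d+): (?P<reason>.*)$"
)


def summarise_send_failures(log_text):
    """Summarise 'Message send failed' warnings, or return None if
    the log contains none."""
    attempts = []
    for line in log_text.splitlines():
        match = SEND_FAILED.match(line.strip())
        if match:
            attempts.append(match)
    if not attempts:
        return None
    last = attempts[-1]
    return {
        "first": attempts[0]["time"],
        "last": last["time"],
        "attempts": len(attempts),
        # True when the final logged try was the last one allowed.
        "exhausted": int(last["attempt"]) == int(last["total"]),
        "certificate_error": all(
            "certificate verify failed" in a["reason"] for a in attempts
        ),
    }


# First and last warnings copied from the log above.
SAMPLE_LOG = """\
2020-04-23T11:09:48+01:00 WARNING - Message send failed, try 1 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
retry in 5.0 seconds, timeout is 30.0
2020-04-23T11:10:19+01:00 WARNING - Message send failed, try 7 of 7: Cannot connect: https://jasmin-cylc.ceda.ac.uk:43007/put_messages: [Errno 1] _ssl.c:492: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
"""

if __name__ == "__main__":
    print(summarise_send_failures(SAMPLE_LOG))
```

When every retry is used up with the same certificate error, the task cannot report its status back to the suite, which would be consistent with tasks appearing stuck in the submitted state.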

comment:6 Changed 3 months ago by NoelClancy

How do you re-trigger the suite with the cylc GUI if it has just stopped?

comment:7 Changed 3 months ago by pmcguire

Hi Noel
When you do this:
cd ~/roses/u-bt273
rose sgc
Do you see anything?

If not, then the job has finished or maybe it has crashed.

If a cylc GUI window showing stopped jobs opens when you run rose sgc, then you can right-click on the task and retrigger it.
Patrick

Changed 3 months ago by NoelClancy

comment:8 Changed 3 months ago by NoelClancy

The cylc GUI appears after the rose sgc command.

comment:9 Changed 3 months ago by NoelClancy

See the attachment.

Do I just Connect Now?

comment:10 Changed 3 months ago by pmcguire

Hi Noel
I can't immediately figure out what went wrong with your run from the log files. It looks like the spinup_07 completed its 20 year cycle just fine, but that something happened after that and

I am not entirely familiar with the Met Office GL7 JULES suite, and its variants. I have done some work with it and its variants, but I am more familiar with the University of Reading GL6R JULES suite, which you are not currently using.

My basic advice, given this background, and without a whole lot of investigation and recoding on my part, is to take the last dump file

/work/scratch/nmc/u-bt273/JULES-ES.1p0.vn5.4.50.CRUJRA2.TRENDYv8.365.spinup_07.dump.17200101.0.nc

and then copy the suite ~/roses/u-bt273 to a new suite number. With this copy, work out what you need to change so that the suite runs with this dump file as the start dump (AINITIAL) file. There are probably more sophisticated ways of doing this, but this is how I would do it, unless you want to start a whole new run from scratch.

The more sophisticated ways to try are given here:
https://metomi.github.io/rose/2019.01.2/html/cheat-sheet.html
But I can't guarantee that those will work either, so it might require some experimentation.
You might want to make a backup copy of your output files before you do anything.

Patrick

comment:11 Changed 3 months ago by NoelClancy

I've restarted both suites, as they had not been running for long.

TICKET CLOSED

comment:12 Changed 8 weeks ago by pmcguire

  • Status changed from new to accepted

comment:13 Changed 8 weeks ago by pmcguire

  • Keywords restart added
  • Platform set to JASMIN

comment:14 Changed 8 weeks ago by pmcguire

  • Resolution set to answered
  • Status changed from accepted to closed