Opened 3 years ago

Closed 3 years ago

#1982 closed help (fixed)

vn10.4 Rose suite on ARCHER: tasks are only ready or waiting

Reported by: luke Owned by: um_support
Component: Rose Keywords: GA7
Cc: Platform: ARCHER
UM Version: 10.4

Description (last modified by luke)

Hello,

I am attempting to run a vn10.4 GA7+StratTrop Rose suite on ARCHER (u-ag308). The tasks never run on the HPC, though; they are just listed as ready or waiting.

I have attached a screenshot showing the state of the suite.

Am I doing something silly, or do I need to change something to get it to run?

Many thanks,
Luke

Attachments (1)

Screenshot 2016-09-21 11.56.54.png (159.8 KB) - added by luke 3 years ago.
Screenshot of gcylc showing state of u-ag308


Change History (13)

Changed 3 years ago by luke

Screenshot of gcylc showing state of u-ag308

comment:1 Changed 3 years ago by luke

  • Description modified

comment:2 Changed 3 years ago by luke

  • Description modified

comment:3 Changed 3 years ago by ros

Hi Luke,

Have you tried manually triggering the install_ancil and fcm_make2_um tasks? Sometimes the tasks can get stuck; I'm guessing it's a communication issue with ARCHER. If that doesn't work, try shutting down the suite, then run rose suite-run --restart and see whether that kicks it into action.

Cheers,
Ros.

comment:4 Changed 3 years ago by luke

Hi Ros,

I can't trigger them to run from within gcylc; those options are greyed out. I am trying rose suite-run --restart now.

Many thanks,
Luke

comment:5 Changed 3 years ago by luke

Hi Ros,

Can I check how long I'm supposed to wait? The suite seems to be behaving the same way as before.

Thanks,
Luke

comment:6 Changed 3 years ago by ros

Once the task has changed to ready it should then get submitted a few seconds later. I'll take a look.

Cheers,
Ros.

comment:7 Changed 3 years ago by ros

Hmmm. OK, so the suite works for me...

I've just had a search around your ~/cylc-run/u-ag308/logs directory and found the following in the suite/err file:

'CATEGORY not found: jobs-submit'
'COMMAND not found: jobs-submit'
ERROR: remote command terminated by signal 1

I've never seen that before, so I will have to investigate further, unless these messages mean anything to you?
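The log check described above can be sketched as a small shell function. This is a minimal sketch, not part of Rose or Cylc: the function name check_suite_err is hypothetical, and it assumes you pass it the path of the suite's err file (the exact log location can differ between Cylc versions).

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan a Cylc suite error log for the job-submission
# errors reported in this ticket. The caller supplies the log path.
check_suite_err() {
  local err_file="$1"
  if [ ! -r "$err_file" ]; then
    echo "cannot read $err_file" >&2
    return 2
  fi
  # -F treats the patterns as fixed strings; each -e is one error signature.
  if grep -F -e "not found: jobs-submit" -e "terminated by signal" "$err_file"; then
    return 1   # problems found; the matching lines were printed above
  fi
  echo "no job-submission errors found"
  return 0
}
```

For example, check_suite_err ~/cylc-run/u-ag308/log/suite/err would print the three error lines quoted above and return 1 if that is where your Cylc version writes the suite err file.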

Cheers,
Ros.

comment:8 Changed 3 years ago by ros

OK. I have an idea...
Can you comment out the following lines from your .profile on ARCHER:

module use $UMDIR/modules
module load cylc
module load fcm
module load rose

We stopped using modules to load Rose/Cylc/FCM because this doesn't work well with the way these packages are set up to allow swapping between different versions, and those module lines will be picking up old versions. Add the following instead:

export PATH=$PATH:$UMDIR/software/bin

Then resubmit the suite. Probably best to do a rose suite-run --new to get a totally clean slate.
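If you would rather script the .profile change above than edit the file by hand, it could look roughly like this. This is a hedged sketch assuming the four module lines appear uncommented at the start of their lines; the function name fix_profile is hypothetical, and it keeps a .bak copy so the edit can be reverted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the old module-based Rose/Cylc/FCM set-up
# and append $UMDIR/software/bin to PATH instead. Keeps <file>.bak as a backup.
fix_profile() {
  local profile="${1:-$HOME/.profile}"
  # Single quotes keep $PATH and $UMDIR literal in the line written to the file.
  local line='export PATH=$PATH:$UMDIR/software/bin'
  cp "$profile" "$profile.bak"
  # Prefix each old module line with "# " (portable sed, one -e per line;
  # lines that are already commented out start with "#" and do not match).
  sed -e 's|^\([[:space:]]*module use \$UMDIR/modules\)|# \1|' \
      -e 's|^\([[:space:]]*module load cylc\)|# \1|' \
      -e 's|^\([[:space:]]*module load fcm\)|# \1|' \
      -e 's|^\([[:space:]]*module load rose\)|# \1|' \
      "$profile.bak" > "$profile"
  # Append the PATH line only if an identical line is not already present.
  if ! grep -qxF "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}
```

Running it twice leaves the file unchanged the second time. Start a new login shell on ARCHER afterwards so the new PATH is picked up before resubmitting.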

Cheers,
Ros.

Last edited 3 years ago by ros

comment:9 Changed 3 years ago by luke

Hi Ros,

I was sure I'd removed those lines before I went on leave! Sorry about this.

I've done as you suggested and sent the suite off as new.

Thanks,
Luke

comment:10 Changed 3 years ago by luke

Hi Ros,

That seems to have done it. The fcm_make2_um task is now running:

Queue: 
------ 

sdb: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
3951707.sdb     luke     serial   fcm_make2_  31419   1   1    --  03:00 R 00:00
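To check from a script that a job has reached the R (running) state, a listing like the one above can be filtered with awk. A minimal sketch: job_state is a hypothetical helper, and it assumes the 11-column qstat -u layout shown here (job name in field 4, state in field 10); other PBS versions may format the output differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read qstat -u style output on stdin and print
# "<job id> <state>" for jobs whose displayed name matches the given name.
job_state() {
  local name="$1"
  # Data lines start with a numeric job ID, so header and dash lines are skipped.
  # qstat truncates job names, so match when the requested name starts with
  # the displayed (possibly truncated) name in field 4.
  awk -v name="$name" '$1 ~ /^[0-9]/ && index(name, $4) == 1 { print $1, $10 }'
}
```

With the listing above, qstat -u "$USER" | job_state fcm_make2_um would print 3951707.sdb R.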

Thanks,
Luke

comment:11 Changed 3 years ago by ros

That's great.

Cheers,
Ros.

comment:12 Changed 3 years ago by luke

  • Resolution set to fixed
  • Status changed from new to closed

Many thanks! I'll close this ticket.

Best wishes,
Luke
