Opened 7 months ago

Closed 6 months ago

#3350 closed help (answered)

Ancil suite not running on new cylc1.jasmin machine

Reported by: ajd Owned by: um_support
Component: Rose/Cylc Keywords:
Cc: Platform: JASMIN
UM Version:

Description

Hello CMS,

I am trying to run a suite, obtained from Alistair Sellar, that generates time slice ancillaries from SSP scenarios on cylc1.jasmin.ac.uk.

I have previously run the suite successfully on jasmin-cylc. However, with the migration to slurm, tasks submitted to the lsf high-mem queue stay in the queue for a very long time (>24-48 hours), whereas the suite previously ran in approximately 30 minutes. I have made a copy of the suite, u-bw996, which is identical apart from minor changes to suite.rc so that it runs on slurm. It works fine until the 'add_orography' task, where it fails with the following error:

/apps/slurm/spool/slurmd/job10679632/slurm_script: line 65: CYLC_TASK_PARAM_anthropogenic_conf: unbound variable

I am at a loss as to how to fix this, as the suite previously worked on jasmin-cylc and I have only made minor changes. From what I could find out about this error, different versions of cylc appear to be more or less strict about it, but a newer version of cylc is installed on cylc1, so I would have expected the bug (if it is a bug) to be fixed there.
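For context, "unbound variable" is the message bash prints when a script running with `set -u` (nounset) expands a variable that was never set. The following is a minimal sketch that reproduces the message outside cylc. The variable name is taken from the error above; how the real job script enables this option is an assumption and is not shown in this ticket.

```shell
# Make sure the variable is not set, mimicking a task that has no such parameter.
unset CYLC_TASK_PARAM_anthropogenic_conf

# Run a one-line script in a child bash with nounset enabled,
# capturing its error output and exit status.
err=$(LC_ALL=C bash -c 'set -u; echo "conf=$CYLC_TASK_PARAM_anthropogenic_conf"' 2>&1)
status=$?

echo "exit status: $status"
echo "message: $err"
```

Without `set -u`, the same expansion would silently yield an empty string, which is why a script can appear to work in one environment and fail in another.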

Any advice/ ideas would be appreciated!

Thanks,
Andrea


Change History (13)

comment:1 Changed 6 months ago by grenville

Hi Andrea

Sorry for the delay - what is your jasmin user name?

Grenville

comment:2 Changed 6 months ago by grenville

Please allow us read access to your files.

comment:3 Changed 6 months ago by ajd

Hi Grenville,

Thanks for looking into this. My jasmin username is adittus and I have now done a chmod 755 on the directories, please let me know if you encounter any further issues.

Thanks!
Andrea

comment:4 Changed 6 months ago by grenville

Hi Andrea

I still can't see in:

(base) glister@cylc1$ ls -lrt adittus
ls: cannot open directory adittus: Permission denied
(base) glister@cylc1$

Grenville

comment:5 Changed 6 months ago by ajd

Hi Grenville,

Hm, odd… I've just checked the permissions again, does it work now?

Thanks,
Andrea

comment:6 Changed 6 months ago by grenville

Sadly no:

(base) glister@cylc1$ cd ~adittus
(base) glister@cylc1$ ls
ls: cannot open directory .: Permission denied
(base) glister@cylc1$

Try

chmod -R g+rX /home/users/adittus
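As a rough illustration of what this command does: `-R` recurses through the tree, `r` grants group read, and capital `X` grants execute (search) permission only on directories and on files that are already executable, so ordinary files do not become executable. The sketch below works on a throwaway directory rather than a real home directory; the names `demo`, `sub` and `notes.txt` are made up for the example.

```shell
# Build a scratch tree with owner-only permissions.
demo=$(mktemp -d)
mkdir "$demo/sub"
echo "hello" > "$demo/sub/notes.txt"
chmod 700 "$demo" "$demo/sub"
chmod 600 "$demo/sub/notes.txt"

# Grant group read everywhere, plus search permission on directories only.
chmod -R g+rX "$demo"

# Capture the first ten characters of the ls mode string for each entry.
dir_mode=$(ls -ld "$demo/sub" | cut -c1-10)
file_mode=$(ls -l "$demo/sub/notes.txt" | cut -c1-10)
echo "directory: $dir_mode"
echo "file:      $file_mode"

# Remove the scratch tree.
rm -rf "$demo"
```

Every directory on the path, including the home directory itself, needs search permission for the reader, which is why the recursive form starting at the home directory is used here.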

comment:7 Changed 6 months ago by ajd

Done now, does it work?

Thanks

comment:8 Changed 6 months ago by grenville

yep -

comment:9 Changed 6 months ago by grenville

Hi Andrea

I can see what the problem is:

in /home/users/adittus/cylc-run/u-bw996/log/job/1/add_orography/02 there is no reference to CYLC_TASK_PARAM_anthropogenic_conf

contrast that with
/home/users/adittus/cylc-run/ubw996/log/job/1/cleanup_ANTHROPOGENIC_BC_biofuel/01,

where you see

export CYLC_TASK_PARAM_anthropogenic_conf="ANTHROPOGENIC_BC_biofuel"

but I don't have the domain knowledge to know why this parameter should or should not be exported for the add_orography task - are you in contact with the suite developer?
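One way to make this comparison across all tasks at once is to search the job scripts for the variable. The sketch below is only an illustration: it builds two stand-in job files in a scratch directory that mirrors the log/job/1/<task>/<submit number> layout seen above, rather than reading the real cylc-run directory, and the file contents are placeholders.

```shell
# Create a scratch copy of the directory layout with two stand-in job files.
run=$(mktemp -d)
mkdir -p "$run/log/job/1/add_orography/02" \
         "$run/log/job/1/cleanup_ANTHROPOGENIC_BC_biofuel/01"
printf '%s\n' '# placeholder job script with no parameter export' \
    > "$run/log/job/1/add_orography/02/job"
printf '%s\n' 'export CYLC_TASK_PARAM_anthropogenic_conf="ANTHROPOGENIC_BC_biofuel"' \
    > "$run/log/job/1/cleanup_ANTHROPOGENIC_BC_biofuel/01/job"

# grep -l lists files that contain the export; grep -L lists files that do not.
with_param=$(cd "$run" && grep -rl 'export CYLC_TASK_PARAM_anthropogenic_conf=' log/job)
without_param=$(cd "$run" && grep -rL 'export CYLC_TASK_PARAM_anthropogenic_conf=' log/job)

echo "exports the variable: $with_param"
echo "does not export it:   $without_param"

# Remove the scratch tree.
rm -rf "$run"
```

Against a real suite, the same two grep commands could be pointed at the suite's own log/job directory.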

(Wait times are dependent on the state of the queue - I don't have any advice for how to improve throughput except to request just the amount of resource required.)

Grenville

comment:10 Changed 6 months ago by ajd

Hi Grenville,

Thanks for looking into this. Do you know where the command

export CYLC_TASK_PARAM_anthropogenic_conf="ANTHROPOGENIC_BC_biofuel"

is set in the suite config? Since I haven't changed anything except for scheduling commands in the suite, I wonder if different cylc versions handle this differently, i.e. whether that environment variable is inherited in one case but not in the other (that's what my google searches suggested at one point). Is it possible to load the old cylc version on cylc1 instead of the default to test this?

Other than that, I'm not sure what else to do. The throughput issues are only on the old lsf system (presumably because there are now no resources available on lsf), whereas on slurm the suite gets through the queue quickly but doesn't work.

I'm in touch with Alistair so will see if he has any suggestions.

Thanks!
Andrea

comment:11 Changed 6 months ago by grenville

Andrea

Any luck with this - if not, we can delve deeper?

Grenville

comment:12 Changed 6 months ago by ajd

Hi Grenville,

Thanks for following up on this - I just heard from Alistair that he managed to fix this issue. It would appear that the newer cylc version is stricter about unbound variables than previous versions.
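The ticket does not record what Alistair changed. Purely as an illustration of one common way to make a script tolerate an unset variable under `set -u`, the sketch below uses the `${VAR:-}` expansion, which yields an empty string instead of an error. The helper name `describe_conf` and the "none" fallback text are hypothetical and not taken from the suite.

```shell
# The function body runs in a subshell, so set -u does not leak to the caller.
describe_conf() (
    set -u
    # ${VAR:-} expands to an empty string when VAR is unset, so nounset is not triggered.
    conf="${CYLC_TASK_PARAM_anthropogenic_conf:-}"
    if [ -n "$conf" ]; then
        echo "parameter: $conf"
    else
        echo "parameter: none"
    fi
)

# Case 1: a task without the parameter (variable unset).
unset CYLC_TASK_PARAM_anthropogenic_conf
unparameterised=$(describe_conf)

# Case 2: a task with the parameter set.
CYLC_TASK_PARAM_anthropogenic_conf="ANTHROPOGENIC_BC_biofuel"
parameterised=$(describe_conf)

echo "$unparameterised"
echo "$parameterised"
```

Whether a guard like this or a change to which tasks reference the variable is the right fix depends on the suite's design.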

Many thanks,
Andrea

comment:13 Changed 6 months ago by ros

  • Resolution set to answered
  • Status changed from new to closed

Thanks for letting us know.

I'll close this ticket now.

Cheers,
Ros.
