Opened 7 months ago
Closed 6 months ago
#3350 closed help (answered)
Ancil suite not running on new cylc1.jasmin machine
| Reported by: | ajd | Owned by: | um_support |
|---|---|---|---|
| Component: | Rose/Cylc | Keywords: | |
| Cc: | | Platform: | JASMIN |
| UM Version: | | | |
Description
Hello CMS,
I am trying to run a suite to generate time slice ancillaries from SSP scenarios on cylc1.jasmin.ac.uk, obtained from Alistair Sellar.
I have previously run the suite successfully on jasmin-cylc. However, with the migration to slurm, tasks running on lsf (high-mem) stay in the queue for a very long time (>24-48 hours); previously the suite had run in approx 30 mins. I have made a copy of the suite, u-bw996, which is exactly the same suite with only minor changes to suite.rc to run on slurm. It works fine until the 'add_orography' task, where it fails with the following error:
/apps/slurm/spool/slurmd/job10679632/slurm_script: line 65: CYLC_TASK_PARAM_anthropogenic_conf: unbound variable
I am at a loss as to how to fix this, as the suite previously worked on jasmin-cylc and I have only made minor changes. From what I could find out about this error, different versions of cylc are apparently more or less flexible about it, but a newer version of cylc is installed on cylc1, so I would have expected the bug (if it is a bug) to be fixed in the newer version.
Any advice/ ideas would be appreciated!
Thanks,
Andrea
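The "unbound variable" text in the error above is the message bash prints when a script running with `set -u` (nounset) expands a variable that was never set. The sketch below reproduces that behaviour in isolation. It assumes the job script runs under `set -u`, which the message suggests but the ticket does not show; `demo_unbound` is a made-up helper name, not part of the suite or of cylc.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the failing line of a job script; not the suite's code.
demo_unbound() {
    # Run in a child bash, with the variable removed from its environment,
    # so that the nounset failure does not kill the calling shell.
    env -u CYLC_TASK_PARAM_anthropogenic_conf bash -c '
        set -u
        echo "conf=${CYLC_TASK_PARAM_anthropogenic_conf}"
    ' 2>&1
}

# The child prints an "unbound variable" message and exits non-zero.
demo_unbound || echo "child shell exited with status $?"
```

Without `set -u` the same expansion would quietly produce an empty string, which is one way a script can appear to work in one environment and fail in another.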
Change History (13)
comment:1 Changed 6 months ago by grenville
comment:2 Changed 6 months ago by grenville
Please allow us read access to your files
comment:3 Changed 6 months ago by ajd
Hi Grenville,
Thanks for looking into this. My jasmin username is adittus and I have now done a chmod 755 on the directories, please let me know if you encounter any further issues.
Thanks!
Andrea
comment:4 Changed 6 months ago by grenville
Hi Andrea
I still can't see in:
(base) glister@cylc1$ ls -lrt adittus
ls: cannot open directory adittus: Permission denied
(base) glister@cylc1$
Grenville
comment:5 Changed 6 months ago by ajd
Hi Grenville,
Hm, odd… I've just checked the permissions again, does it work now?
Thanks,
Andrea
comment:6 Changed 6 months ago by grenville
Sadly no:
(base) glister@cylc1$ cd ~adittus
(base) glister@cylc1$ ls
ls: cannot open directory .: Permission denied
(base) glister@cylc1$
Try
chmod -R g+rX /home/users/adittus
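A `chmod 755` on individual directories does not help if a directory above them, such as the home directory itself, still denies group access. That may be why the earlier attempt had no visible effect, although the ticket does not say which directories were changed. The sketch below shows what `g+rX` does on a scratch tree; the names `home`, `cylc-run` and `job` are invented for the demo.

```shell
#!/usr/bin/env bash
# Scratch-directory demo only; "home", "cylc-run" and "job" are made-up names.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/home/cylc-run"
echo 'echo hello' > "$demo_root/home/cylc-run/job"

# Start from owner-only permissions, as a locked-down home directory might have.
chmod -R go-rwx "$demo_root/home"

# Capital X adds execute (search) permission only to directories and to files
# that are already executable by someone, so plain files stay non-executable.
chmod -R g+rX "$demo_root/home"

ls -ld "$demo_root/home" "$demo_root/home/cylc-run" "$demo_root/home/cylc-run/job"
```

After the second `chmod`, both directories are group-readable and group-searchable, while the plain file gains group read only.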
comment:7 Changed 6 months ago by ajd
Done now, does it work?
Thanks
comment:8 Changed 6 months ago by grenville
yep -
comment:9 Changed 6 months ago by grenville
Hi Andrea
I can see what the problem is:
in /home/users/adittus/cylc-run/u-bw996/log/job/1/add_orography/02 there is no reference to CYLC_TASK_PARAM_anthropogenic_conf
contrast that with
/home/users/adittus/cylc-run/u-bw996/log/job/1/cleanup_ANTHROPOGENIC_BC_biofuel/01,
where you see
export CYLC_TASK_PARAM_anthropogenic_conf="ANTHROPOGENIC_BC_biofuel"
but I don't have the domain knowledge to know why this parameter should or should not be exported for the add_orography task - are you in contact with the suite developer?
(Wait times are dependent on the state of the queue - I don't have any advice for how to improve throughput except to request just the amount of resource required.)
Grenville
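The comparison described above can be repeated across every task in a cycle. This is a minimal sketch, assuming each submission directory holds its job script in a file named `job`, as the paths quoted above suggest. `report_param_exports` is a hypothetical helper, not a cylc command.

```shell
#!/usr/bin/env bash
# Print "yes" or "no" for each job script under a job-log directory,
# depending on whether the script mentions the given variable name.
# Usage: report_param_exports JOB_LOG_DIR VARIABLE_NAME
report_param_exports() {
    find "$1" -type f -name job | sort | while IFS= read -r script; do
        if grep -q -- "$2" "$script"; then
            printf 'yes  %s\n' "$script"
        else
            printf 'no   %s\n' "$script"
        fi
    done
}

# Example call (path taken from the ticket; adjust for your own suite):
# report_param_exports "$HOME/cylc-run/u-bw996/log/job/1" CYLC_TASK_PARAM_anthropogenic_conf
```

Tasks listed as "no" never receive the variable, so any script line in those tasks that expands it will fail if unset variables are treated as errors.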
comment:10 Changed 6 months ago by ajd
Hi Grenville,
Thanks for looking into this. Do you know where the command
export CYLC_TASK_PARAM_anthropogenic_conf="ANTHROPOGENIC_BC_biofuel"
is set in the suite config? Since I haven't changed anything except for scheduling commands in the suite, I wonder if different cylc versions handle this differently, i.e. whether that environment variable is inherited in one case but not in the other (that's what my google searches suggested at one point). Is it possible to load the old cylc version on cylc1 instead of the default to test this?
Other than that, I'm not sure what else to do. The throughput issues are only on the old lsf system (presumably because there are now no resources available on lsf) while the suite doesn't work but gets through the queue fast on slurm.
I'm in touch with Alistair so will see if he has any suggestions.
Thanks!
Andrea
comment:11 Changed 6 months ago by grenville
Andrea
Any luck with this? If not, we can delve deeper.
Grenville
comment:12 Changed 6 months ago by ajd
Hi Grenville,
Thanks for following up on this - just heard from Alistair that he managed to fix this issue. It would appear that the newer cylc is stricter about unbound variables than previous versions.
Many thanks,
Andrea
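The ticket does not record how the suite was actually fixed. One common way to make a script tolerate a variable that is only defined for some tasks is to expand it with a default, which remains legal under `set -u`. The helper name `anthropogenic_conf_or` and the fallback value `none` below are illustrative assumptions only.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: print the parameter value, or the fallback given as $1
# when the variable is not set for this task. The ":-" form is safe under set -u.
anthropogenic_conf_or() {
    printf '%s\n' "${CYLC_TASK_PARAM_anthropogenic_conf:-$1}"
}

# Task without the parameter: the fallback is used instead of an error.
unset CYLC_TASK_PARAM_anthropogenic_conf
anthropogenic_conf_or "none"

# Task with the parameter: the real value is used.
CYLC_TASK_PARAM_anthropogenic_conf="ANTHROPOGENIC_BC_biofuel"
anthropogenic_conf_or "none"
```

Whether a fallback is appropriate depends on what add_orography is meant to do with the value, which is a question for the suite developer.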
comment:13 Changed 6 months ago by ros
- Resolution set to answered
- Status changed from new to closed
Thanks for letting us know.
I'll close this ticket now.
Cheers,
Ros.
Hi Andrea
Sorry for the delay - what is your jasmin user name?
Grenville