Opened 5 months ago

Closed 4 months ago

#3429 closed help (fixed)

Dehalo sbatch error

Reported by: m.couldrey
Owned by: um_support
Component: Rose/Cylc
Keywords:
Cc:
Platform: JASMIN
UM Version:

Description

Hi CMS

I'm trying to dehalo a set of model output using suite u-bo201 and I get stuck at the submit-failed stage of the dehalo_and_move task.

In my job-activity.log I have
[STDOUT] 'sbatch: error: Script arguments not permitted with --wrap option
[STDOUT] '

I haven't tried running this suite since the Slurm upgrade. Are there compatibility issues, or is this something else?

Cheers for any help!
Matt

Change History (7)

comment:1 Changed 5 months ago by dcase

Matt, you have to change over to slurm now.

To get you going:

batch system = slurm

The LSF directive -q = par-single becomes something like --partition = par-single

You can look up the others here: https://help.jasmin.ac.uk/article/4891-lsf-to-slurm-quick-reference

Just change the LSF settings in the suite.rc etc. to their Slurm equivalents. Hope that helps.
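
As a rough illustration of the change described above, here is a minimal Python sketch that translates a few LSF directives and prints them in suite.rc style. Only the -q to --partition pair comes from this ticket. The other table entries, the function names, and the [[[job]]] / [[[directives]]] layout (assumed Cylc 7 syntax) are assumptions for illustration, so check them against the JASMIN quick reference and your own suite.rc. Time limits are deliberately left out of the table, since the two schedulers write them in different formats.

```python
# Hypothetical lookup table: only "-q" -> "--partition" is taken from this ticket.
# The remaining pairs are common equivalents and should be verified against the
# JASMIN LSF-to-Slurm quick reference before use.
LSF_TO_SLURM = {
    "-q": "--partition",
    "-n": "--ntasks",
    "-o": "--output",
    "-e": "--error",
}


def translate_directives(lsf_directives):
    """Split a dict of LSF directives into (slurm_directives, untranslated)."""
    slurm, untranslated = {}, {}
    for flag, value in lsf_directives.items():
        if flag in LSF_TO_SLURM:
            slurm[LSF_TO_SLURM[flag]] = value
        else:
            # Anything not in the table is handed back for manual translation.
            untranslated[flag] = value
    return slurm, untranslated


def render_suite_rc(slurm_directives):
    """Format the settings as suite.rc-style lines (assumed Cylc 7 layout)."""
    lines = ["[[[job]]]", "    batch system = slurm", "[[[directives]]]"]
    lines += [f"    {flag} = {value}" for flag, value in slurm_directives.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    slurm, untranslated = translate_directives({"-q": "par-single", "-W": "01:00"})
    print(render_suite_rc(slurm))
    print("Needs manual translation:", untranslated)
```

Directives that are not in the table are returned separately rather than guessed, so they can be looked up by hand.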

comment:2 Changed 5 months ago by m.couldrey

Right, thanks for clarifying that. I've updated a few things in the suite.rc and the tasks are submitting ok (but haven't yet started running).
Cheers!

comment:3 Changed 4 months ago by dcase

Did you solve your slurm porting problems? If so can we shut the ticket?
If you couldn't find the magic arguments, let me know.

Dave

comment:4 Changed 4 months ago by m.couldrey

Thanks for following up, Dave.

Yes, I did solve the porting issue. I haven't got the suite running yet, though, because I get a "Permission Denied" error during the dehalo_and_move job.
The error log, /home/users/mpc18/cylc-run/u-bo201/log/job/1/dehalo_and_move/NN/job.err, says it is unable to move the files from the data directory into the holding directory.
I've set this suite up a little differently to standard: instead of pointing it at output in my own directory, it looks in Jonathan's space (since that's where the u-bv119 data are stored) and then puts the processed output in my space. In the past I ran the suite successfully on an experiment in Jonathan's directory (his suite u-bq683), but this time it doesn't seem to be working. I do have rwx permissions in his space (i.e. I can create, move and delete files in the data directory) so it's not as simple as granting access.

Any ideas?

Thanks!

comment:5 Changed 4 months ago by dcase

Can you run the command on the command line? Or is it just in the python script?

comment:6 Changed 4 months ago by m.couldrey

Oh, I can't run it on the command line either. I successfully moved a different file before, but I didn't try the actual one. Looks like I need to get Jonathan to adjust the permissions recursively. Good catch!

comment:7 Changed 4 months ago by m.couldrey

  • Resolution set to fixed
  • Status changed from new to closed

An update on this: Jonathan (the directory owner) added g+w permissions to the directory and now the suite runs ok. I'll close this ticket. Thanks for the help!
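
For anyone hitting the same error: moving a file out of a directory needs write permission on that directory itself, so one subdirectory without g+w can block the move even when moves elsewhere in the tree work. The sketch below is a minimal way to list such directories before asking the owner to fix them. It assumes access is granted through group membership, and the function name is hypothetical.

```python
import os
import stat


def dirs_missing_group_write(root):
    """Return the directories under root (root included) without the group-write bit."""
    missing = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        # S_IWGRP is the "w" in the group triplet of the permission bits.
        if not mode & stat.S_IWGRP:
            missing.append(dirpath)
    return sorted(missing)
```

Calling dirs_missing_group_write on the data directory returns the paths the owner would need to open up, for example with a recursive g+w change as was done here.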
