Opened 7 months ago

Last modified 7 months ago

#3508 new help

JULES run on JASMIN failing

Reported by: katieblackford
Owned by: jules_support
Component: JULES
Keywords: JULES, JASMIN
Cc:
Platform: JASMIN
UM Version:

Description

Hi,

I am currently trying to run a JULES suite on JASMIN (u-cd695). The suite runs ok, but during the main run it occasionally fails with this error message:
"Please verify that both the operating system and the processor support Intel® X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2 and POPCNT instructions."
I assume this error is not related to my suite itself, since I can resubmit the job and it will run OK (though it often hits this error again); instead it seems to be a problem with how it is running on JASMIN?

Any advice on what to do to stop this from happening would be greatly appreciated. The job is currently running in the par-single queue.

I have also occasionally received an error about losing connection to the "remote host daemon", both on this suite and on others. But again, the suite will continue to run if I resubmit the job.

Any ideas on how I can prevent these errors from happening?

Thanks,
Katie

Change History (6)

comment:1 Changed 7 months ago by dcase

Katie,

As a first guess, I would look at restricting yourself to Intel nodes. There is a list here: https://help.jasmin.ac.uk/article/4932-lotus-cluster-specification, with an example for the constraint.

One issue is that I can't immediately see which nodes are available in which queue, and even if they are available, I wouldn't want you to have to queue for too long. So if you find that this isn't helping, reply below.
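If it helps, one way to check which node features are offered in a given partition is to ask SLURM directly (a sketch; the feature names reported on LOTUS may differ slightly from those listed on the help page):

# list each partition with the node features it offers and the node count
sinfo --partition=par-single -o "%P %f %D"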

A better solution may be to rebuild in a different way, but hopefully you wouldn't need to do that.

Dave

comment:2 Changed 7 months ago by dcase

FWIW, I ran a job with similar settings to yours, using the three most common Intel node types, and it seemed to work. I couldn't see how to use a 'not' constraint (presumably you can?), but you can use 'or', as is done below:

#!/bin/bash -l
#
# ++++ THIS IS A CYLC TASK JOB SCRIPT ++++
# Suite: u-cd695
# Task: S3.17600101T0000+01
# Job log directory: 17600101T0000+01/S3/01
# Job submit method: slurm
# Execution time limit: 21600.0

# DIRECTIVES:
#SBATCH --job-name=test
#SBATCH --output=/home/users/dcase/job.out
#SBATCH --error=/home/users/dcase/job.err
#SBATCH --time=360:00
#SBATCH --partition=par-single
#SBATCH --ntasks=16
#SBATCH --constraint="ivybridge128G|skylake348G|broadwell256G"


# simple test commands: print the working directory, a message, and the CPU
# details of the node the job landed on
pwd
echo 'something'
lscpu

comment:3 Changed 7 months ago by katieblackford

Hi Dave,

Thanks. I am unsure how to specify using the Intel nodes. Do I add this into the suite.rc file, or elsewhere?

Also, I have still been trying to run this suite and have just now hit a different error:
"None of the TCP networks specified to be included for out-of-band communications
could be found:

Value given: p4p2

Please revise the specification and try again."

Thanks,
Katie

comment:4 Changed 7 months ago by dcase

Yes, so I've not done this myself, but I believe you can change the suite.rc to set the SLURM directives of your choice. There is the [[[directives]]] section, where you have set --partition=par-single, and underneath this you should be able to add a line with --constraint="ivybridge128G|skylake348G|broadwell256G".

Hopefully it will make a job file which has the same #SBATCH ... stuff in the header that is in the example comment above.

Presumably, if nothing else changes, it should run as before, but without the chance of landing on a node it can't run on. I hope!
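Something like this in the suite.rc is roughly what I mean (a sketch only, assuming your main run task is S3 as in the job log above; in your suite the existing --partition and --ntasks lines may sit under a task family rather than the task itself, in which case add the constraint there):

[runtime]
    [[S3]]
        [[[directives]]]
            --partition = par-single
            --ntasks = 16
            --constraint = "ivybridge128G|skylake348G|broadwell256G"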

comment:5 Changed 7 months ago by katieblackford

OK, thanks. I will make that change and let you know if that works.

Do you know how I can view the #SBATCH lines in the job file?

Thanks,
Katie

comment:6 Changed 7 months ago by dcase

I may have misunderstood, but if you change your suite.rc, and reload your suite, then all job files from this point should be generated with the SBATCH lines. If you just view the files, they should be there.
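In case it helps, the generated job script normally lives under your cylc-run directory, so you can grep its header directly (a sketch; the cycle point, task name and submit number below are taken from the job log above and will vary for other tasks):

# show the SBATCH directives in the generated job script
grep '#SBATCH' ~/cylc-run/u-cd695/log/job/17600101T0000+01/S3/01/job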
