Opened 2 years ago

Closed 2 years ago

#2214 closed help (fixed)

JULES not resubmitting

Reported by: charlie Owned by: um_support
Component: JULES Keywords:
Cc: Platform: Other

Description

Hi,

Sorry to bother you again, but I was hoping someone could advise on why my JULES suite (submitted by Rose to JASMIN) won't resubmit using my latest start dump?

Nothing has changed in my suite since my last short run, except the start and end dates of my runtime and output and my initial conditions. These all match, with a start date of 1/7/1980 and an end date of 1/1/1981, i.e. 6 months. I have changed my initial conditions to be my most recent start dump, currently on JASMIN at:
/group_workspaces/jasmin/jules/charliewilliams/tilaka.d/jules.d/runs.d/v48.d/output.d/newexp2a_riverinund-on_con.d/newexp2a_riverinund-on_con_2yrs_pt3.dump.19800701.0.nc

You can see my namelists at /home/charlie/roses/u-am232/app/jules/rose-app.conf

As I said, nothing has changed since I successfully submitted this suite yesterday, except the start/end date and the filename of the start dump.

However, it is not even getting as far as submitting, failing before it begins running. It's not generating any output error file on either JASMIN or PUMA. All I'm getting is 2 files (job and job-activity.log), neither of which tells me why it's failing. All I have is a "stop with submit failed" message in Cylc.

What's the matter with it this time?

Charlie

Change History (6)

comment:1 Changed 2 years ago by grenville

Charlie

I can't see anything wrong with the job script. Can you try submitting the job script directly? On a JASMIN sci machine, go to your cylc-run/u-am232/log/job/1/jules/01 directory and type

bsub < job

Does that submit OK?

Grenville
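
The manual submission suggested above can be wrapped in a few sanity checks, which also covers the case where the job directory was never created. This is only a sketch: the function name submit_job_script is made up for illustration, the default path assumes the cylc-run directory quoted in the ticket sits under $HOME, and it assumes bsub (LSF) is available on the machine where it is run.

```shell
#!/bin/bash
# Sketch: submit a Cylc task job script by hand, checking the basics first.
# The default path is the one quoted in the ticket, assumed to live under $HOME.
submit_job_script() {
    local job_dir="${1:-$HOME/cylc-run/u-am232/log/job/1/jules/01}"

    if [ ! -d "$job_dir" ]; then
        echo "No job directory at $job_dir (the task may never have been prepared)" >&2
        return 1
    fi
    if [ ! -f "$job_dir/job" ]; then
        echo "No job script in $job_dir" >&2
        return 2
    fi
    if ! command -v bsub >/dev/null 2>&1; then
        echo "bsub not found; run this on a machine with LSF available" >&2
        return 3
    fi

    # Same command as suggested in the comment, run from inside the job directory.
    ( cd "$job_dir" && bsub < job )
}
```

Calling submit_job_script with no argument uses the path from the ticket. A non-zero return code indicates which check failed (1: no directory, 2: no job script, 3: no bsub).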

comment:2 Changed 2 years ago by charlie

Dear Grenville,

I haven't yet tried that, but I think it is submitting okay because when I submit my suite from PUMA, it submits and runs the first part (fcm_make) no problem. It's when it gets to JULES that it just stops.

Charlie

comment:3 Changed 2 years ago by charlie

Hi again,

Whatever problem I was having yesterday appears to have gotten worse, because now when I tried submitting my suite from PUMA, it doesn't even submit - it just hangs and doesn't do anything. As I said, yesterday it submitted okay, opening the Cylc GUI and successfully completing fcm_make before failing. Today it doesn't even do that.

I have just tried submitting it directly from JASMIN as you suggested, but I don't have a cylc-run/u-am232/log/job/1/jules/01 directory. In other words, it hasn't even got far enough to create this.

Charlie

comment:4 Changed 2 years ago by charlie

Hi again Grenville,

Right then… After approximately 5 hours, my suite has submitted. When I say submitted, I don't mean the JULES module (which often takes that long in the queue) - I mean it took 5 hours to do ANYTHING, even to load up the Cylc GUI. When it finally did, it submitted fcm_make and that ran more-or-less straight away and succeeded, as it usually does. It then began running JULES (which is further than it got yesterday), but fell over after about a minute. I have checked my error log (at cylc-run/u-am232/log/job/1/jules/01/job.err on either JASMIN or PUMA), which was at least generated this time, but it just tells me that my suite was killed by the system for unknown reasons:

Received signal ERR
cylc (scheduler - 2017-07-05T15:17:51Z): CRITICAL Task job script received signal ERR at 2017-07-05T15:17:51Z
cylc (scheduler - 2017-07-05T15:17:51Z): CRITICAL failed at 2017-07-05T15:17:51Z

From looking at my run log (at cylc-run/u-am232/log/job/1/jules/01/run.log on either JASMIN or PUMA), it appears to have got to the point of opening up my output file. Then it just stops, with no further error. Might it be a space issue? I have checked the space in my home directory on JASMIN, using pan_quota, but that's fine. Anyhow, my output isn't going to my home directory, but rather to the group workspace - which certainly has lots of space left.

I'm wondering, given that these errors are totally inconsistent and occur seemingly randomly with no changes by me, whether this is all related to problems at their end? As I said, I'm almost 100% certain that there's nothing wrong with my suite, and indeed nothing has changed except the start dump and run length/dates. I have triple-checked all of these, and can't see any error.

Any thoughts? The issue with emailing the JASMIN support people is that firstly they appear to take days and days to respond, and secondly they will undoubtedly say it's a JULES problem and not their area of expertise.

Charlie
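
One way to rule out the space question raised above is to check the filesystem that actually holds the output, rather than the home directory. A rough sketch, assuming standard POSIX df and awk are available; the function name check_free_space and the 1 GiB default threshold are invented for illustration, and it reports free space on the filesystem only, not any per-group quota.

```shell
#!/bin/bash
# Sketch: print the free space (in KB) on the filesystem holding a directory
# and return non-zero if it is below a threshold.
check_free_space() {
    local dir="$1"
    local min_kb="${2:-1048576}"   # hypothetical default threshold: 1 GiB in KB

    if [ ! -d "$dir" ]; then
        echo "Not a directory: $dir" >&2
        return 2
    fi

    # df -P prints one line per filesystem; with -k, column 4 is available KB.
    local avail_kb
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    echo "$avail_kb"

    # The function's return status is the result of this comparison.
    [ "$avail_kb" -ge "$min_kb" ]
}
```

Run on JASMIN with the suite's output directory as the first argument, this prints the available kilobytes and returns non-zero if that is under the threshold.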

comment:5 Changed 2 years ago by charlie

Hi again,

I was just wondering if you had had a chance to look at my last message, about JASMIN running incredibly slowly? I have just tried resubmitting my suite after the weekend, and it is still just hanging - the Rose box which says "Executing function" etc. has appeared, but is not doing anything.

Charlie

comment:6 Changed 2 years ago by willie

  • Resolution set to fixed
  • Status changed from new to closed