Opened 6 years ago

Closed 6 years ago

#1292 closed error (fixed)

difficulty submitting a job

Reported by: s1251469
Owned by: um_support
Component: UM Model
Keywords: ukca
Cc:
Platform: ARCHER
UM Version: 7.3

Description

Hello,

Yesterday evening I started having a problem running a job that I had run earlier in the day.

The error I get is:


Calling FCM_MAIN_SCR - local…
(This may take several minutes.)

FCM_MAIN: Calling Extract …
Base extract: OK
Model extract: OK
Reconfiguration extract: OK
FCM_MAIN: Extract OK

FCM_MAIN: Submitting umuisubmit_clr …
qsub: script file:: No such file or directory
FCM_MAIN: Submit failed


I made a small change in the code but I don't see why that should mean I can't even submit the job. I think I also increased the resource time.

What can I look at to understand why I have this error?

Many thanks,
Declan

Change History (8)

comment:1 Changed 6 years ago by ros

Hi Declan,

I have seen this problem on occasion before, and either

1) Reprocessing and submitting fixes it

or

2) It's caused by running out of disk space on /home
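
The second cause can be checked directly on ARCHER before resubmitting. The following is a minimal sketch and not part of the UMUI: the function name usage_percent and the 95% threshold are assumptions chosen for illustration, and it reports filesystem usage only, not any per-user quota that may also apply.

```shell
# Hypothetical helper (not part of the UMUI): print the percentage of
# space used on the filesystem that holds the given directory.
usage_percent() {
    # df -P prints one POSIX-format line per filesystem; field 5 is "Capacity"
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

pct=$(usage_percent "$HOME")
# 95 is an arbitrary illustrative threshold
if [ "$pct" -ge 95 ]; then
    echo "Warning: filesystem holding $HOME is ${pct}% full"
else
    echo "Filesystem holding $HOME is ${pct}% full"
fi
```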

In this case neither of the above appears to apply, and when I take a copy of your job I get the same problem. There is no difference between the UMUI files of the working and failed submissions other than the changes you detail above. I am currently at a loss as to what is going on, so I will continue to investigate.

In the meantime you can work around this by doing Save, Process & Submit in the UMUI as usual. Then, on ARCHER, go to the newly created umui_runs directory for the job, manually run the SUBMIT script, and submit to the queue:

archer$ cd ~/umui_runs/xjhxf-<submitid>
archer$ ./SUBMIT
archer$ qsub umuisubmit_clr
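
Since the qsub error complains about a missing script file, it may be worth confirming that SUBMIT actually produced the script before the final qsub. This is a minimal sketch assuming the directory layout shown above; check_submit_script is a hypothetical helper, not a UMUI or PBS command.

```shell
# Hypothetical helper (not a UMUI or PBS command): confirm that the
# submit script exists and is non-empty before handing it to qsub.
check_submit_script() {
    run_dir="$1"
    script="${2:-umuisubmit_clr}"   # default name taken from this ticket
    if [ ! -d "$run_dir" ]; then
        echo "No such run directory: $run_dir" >&2
        return 1
    fi
    if [ ! -s "$run_dir/$script" ]; then
        echo "Missing or empty submit script: $run_dir/$script" >&2
        return 2
    fi
    echo "Found $run_dir/$script"
}
```

From inside the run directory, something like `check_submit_script . && qsub umuisubmit_clr` would then call qsub only once the script is present.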

I will let you know when I've figured out what's going wrong.

Cheers,
Ros.

comment:2 Changed 6 years ago by chollow

I just had the same thing happen to me this morning. I submitted a UM job to ARCHER yesterday and it was able to run (although it ran out of time).

When I submitted the same job (xjgie) today, I got:

Calling NDS_MAIN_SCR - local…
(This may take several minutes.)
qsub: script file:: No such file or directory
NDS_MAIN: Submit failed

comment:3 Changed 6 years ago by grenville

Please note that ARCHER is in an "at risk" maintenance session today, so that may account for the problem.

Grenville

comment:4 Changed 6 years ago by s1251469

Thanks Ros,

The workaround did get the job submitted for compiling, but it crashed during compilation with the error:


ftn-2136 crayftn: ERROR in command line

Unable to obtain a Cray Compiling Environment License.

fcm_internal compile failed (256)


I have had this problem before and it was an issue on ARCHER. I suspect that, as Grenville says, this is due to the "at risk" status.

regards,
Declan

comment:5 Changed 6 years ago by grenville

Declan

It's probably worth letting ARCHER know this has happened, just in case it's something they can easily fix.

Grenville

comment:6 Changed 6 years ago by ros

Hi Declan,

I've tried submitting your job again this morning and it has worked each time, so I'm hopeful this was just a problem caused by the ARCHER maintenance. Please can you try submitting again?

Cheers,
Ros.

comment:7 Changed 6 years ago by s1251469

Morning Ros,

Yes. I also had a go. It has submitted and compiled with no problems. It is in a queue now but I assume all is fine so you can close this ticket if you like.

Thanks for your help,
Declan

comment:8 Changed 6 years ago by ros

  • Resolution set to fixed
  • Status changed from new to closed