Opened 11 years ago

Closed 11 years ago

#435 closed error (fixed)

Unable to submit a UM job to Hector

Reported by: cbirch
Owned by: ros
Component: UM Model
Keywords:
Cc:
Platform:
UM Version: 7.1

Description

I am trying to submit a job to hector (xezqa). It is a 12km resolution LAM run that I copied directly from xdrqa (user=grenvill), which should work. I haven't altered anything apart from the user-id, the notification email and the Tic code (to n02-weat).

There are no errors when I do 'check setup' and then 'process'. When I hit 'submit' it asks for my password and then a window appears that says:

Calling FCM_MAIN_SCR - local…

(This may take several minutes)

Checking remote run directory…

Then nothing happens. I have left it for an hour and nothing transfers to the queue for Hector. I have also tried job xezqa (a similar job but at 1.5km resolution) and I get the same message. I also tried running xezpa, which is the example global run in version 7.3 from user=umui. This halts in the same place but with a different message:

Calling FCM_MAIN_SCR - local…

(This may take several minutes)

FCM_MAIN: Calling extract…

Base extract:failed

FCM_MAIN: Extract failed

Tidying up directories …

I am not sure what to do because without an output error message I can't tell what is wrong. I am also new to running the UM so I apologise if this is something stupid (!)

Change History (5)

comment:1 Changed 11 years ago by ros

Hi,

The vn7.3 job (xezpa) is failing to complete the extract because you have not set up your $HOME/.fcm file on PUMA as detailed in the FCM UM Tutorial, which we recommend that you do to familiarise yourself with the new FCM code management system. It also tells you where to go looking for error messages. When the extract fails, error messages are written to the ext.out file, which is located in your $HOME/um_extracts/<runid>/umbase (or ummodel or umrecon) directory.
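As a rough illustration of where to look, the sketch below prints error-like lines from whichever ext.out files exist for a run. It assumes the directory layout described above; the helper name show_extract_errors and the search pattern are illustrative assumptions, not part of FCM or the UMUI.

```shell
# Sketch only: show_extract_errors is a hypothetical helper, not an FCM command.
# Assumed layout: <extract_root>/<runid>/{umbase,ummodel,umrecon}/ext.out
show_extract_errors() {
    runid="$1"
    root="${2:-$HOME/um_extracts}"   # default extract root on PUMA
    for sub in umbase ummodel umrecon; do
        log="$root/$runid/$sub/ext.out"
        if [ -f "$log" ]; then
            echo "== $log"
            # The pattern is a guess at typical failure wording; widen it as needed.
            grep -n -i -E 'error|fail|denied' "$log" || echo "(no matching lines)"
        fi
    done
}

# Example (run on PUMA): show_extract_errors xezpa
```

Directories that have no ext.out yet are skipped silently, so an empty result means the extract never got as far as writing a log.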

The important line that you need in the $HOME/.fcm file is

inc ~um/fcm/etc/um_revisions.cfg
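One way to add that line without duplicating it is sketched below. The helper name ensure_fcm_include is an assumption made for illustration; only the inc line itself comes from this ticket. The sketch also assumes that any existing .fcm file ends with a newline.

```shell
# Sketch only: ensure_fcm_include is a hypothetical helper name.
ensure_fcm_include() {
    cfg="${1:-$HOME/.fcm}"
    line='inc ~um/fcm/etc/um_revisions.cfg'
    # -x matches the whole line, -F treats the text as a fixed string
    if [ -f "$cfg" ] && grep -q -x -F -- "$line" "$cfg"; then
        echo "already present in $cfg"
    else
        printf '%s\n' "$line" >> "$cfg"
        echo "added to $cfg"
    fi
}

# Example (run on PUMA): ensure_fcm_include
```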

The vn7.1 job looks like it didn't even begin the extract, so I will investigate that further and get back to you.

Regards,
Ros.

comment:2 Changed 11 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Cathryn,

In both of the jobs you need to change UM_ROUTDIR in the UMUI window

FCM Config → FCM Extract and Build directory

to an appropriate directory on HECToR - at the moment they are pointing to Willie and Grenville's. This is the cause of your vn7.1 job hanging, I believe. It would be nice if the UMUI caught that problem nicely - sigh!!
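Since the UMUI does not catch this, a quick manual sanity check on a copied job might look like the sketch below. It assumes that a user's own HECToR work area contains the userid as a path component (as in /work/n02/n02/cbirch/...); routdir_owner_check is a hypothetical helper and the example paths are illustrative.

```shell
# Sketch only: routdir_owner_check is a hypothetical helper; the UMUI has no such check.
# Heuristic: the directory should contain the userid as one path component.
routdir_owner_check() {
    userid="$1"
    routdir="$2"
    case "$routdir/" in
        */"$userid"/*)
            echo "ok: $routdir is under a directory named $userid" ;;
        *)
            echo "warning: $routdir does not contain /$userid/"
            return 1 ;;
    esac
}

# Example with illustrative paths:
#   routdir_owner_check cbirch /work/n02/n02/cbirch/um      (passes)
#   routdir_owner_check cbirch /work/n02/n02/grenvill/um    (warns)
```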

I also forgot to put in the URL to the UM Tutorial in my previous response. It can be found at http://puma.nerc.ac.uk/trac/UM_TUTORIAL.

In addition I note that you are not yet registered to be able to write to the UM repository or complete the tutorial so I shall do so shortly and email you your Trac login details.

Ros.

comment:3 Changed 11 years ago by cbirch

Hi Ros,

Thank you for your help so far. I wasn't aware of the UM FCM tutorial but I've worked through it this afternoon. I copied the experiment to xezwe and followed the 'setting up' and 'running jobs' parts of the tutorial.

I managed to submit the job successfully but it didn't work. The error message is:

Unable to read config file "/work/n02/n02/cbirch/xezwe/umbase/cfg/bld.cfg", abort at /work/n02/n02/hum/fcm/bin/../lib/Fcm/ConfigSystem.pm line 528
Build command started on Thu May 27 16:49:04 2010.
→Parse configuration: start
Base build: failed

(from xezwe000.xezwe.d10147.t164633.comp.leave)

I guess it's not surprising it didn't work because the bld.cfg file is actually on PUMA in:
/home/cbirch/um_extracts/xezwe/umbase/cfg

I'm not sure how to correct this.

Cheers,
Cathryn

comment:4 Changed 11 years ago by ros

Hi Cathryn,

In your /home/cbirch/um_extracts/xezwe/umbase/ext.out file there are some permission denied errors when the system tries to copy the files over to HECToR. This implies that you haven't got your ssh keys set up properly to enable successful login to HECToR.

If you run the command

ssh <hector_userid>@login.hector.ac.uk

on PUMA, are you logged in to HECToR without any prompt for a password or passphrase? If not, please follow the instructions at http://puma.nerc.ac.uk/trac/UM_TUTORIAL/wiki/Ros/sshAgent to set up passwordless ssh.
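The same test can be made non-interactive, which is closer to how the file copy to HECToR behaves. The sketch below relies on the OpenSSH BatchMode option, which makes ssh fail rather than prompt; check_passwordless_ssh is a hypothetical helper name. Note that in batch mode the check may also fail if the host key has not been accepted yet.

```shell
# Sketch only: check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
    target="$1"   # e.g. <hector_userid>@login.hector.ac.uk
    # BatchMode=yes disables password and passphrase prompts, so ssh fails instead of asking.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
        echo "passwordless login to $target works"
    else
        echo "passwordless login to $target failed"
        return 1
    fi
}

# Example (run on PUMA): check_passwordless_ssh <hector_userid>@login.hector.ac.uk
```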

Any problems, do let me know.

Cheers,
Ros

comment:5 Changed 11 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed