Opened 5 years ago

Closed 5 years ago

#1555 closed help (answered)

umsetvars related error

Reported by: Leighton_Regayre
Owned by: um_support
Component: UM Model
Keywords:
Cc:
Platform: PUMA
UM Version: <select version>

Description

I'm having difficulty reconfiguring an existing job (xlkja) that I've copied (from tdwpm).

The error in the .rcf.leave file suggests that files expected in the /tmp/ directory cannot be located. I've checked the compile file and seen that the /tmp/ folder is created remotely (using umsetvars from my PUMA .profile script, I think).

My PUMA .profile is near-identical to that of an experienced ARCHER user (Graham Mann).

The names of the files in my folder:
/work/n02/n02/lre/tmp/tmp.eslogin007*
do not match what is expected by the reconfiguration job:
/work/n02/n02/lre/tmp/tmp.mom4.17874/xlkja.servout
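
One way to check whether the expected file exists under any of the per-session directories is a small shell helper. This is only a sketch: the function name find_job_file is hypothetical (it is not part of the UM scripts), and it assumes the session directories all match tmp.* as in the paths above.

```shell
#!/bin/sh
# find_job_file: print every copy of a job output file found one level
# below a tmp root, inside directories named tmp.*
# (illustrative helper, not part of the UM scripts)
find_job_file() {
    tmp_root=$1     # e.g. /work/n02/n02/lre/tmp
    file_name=$2    # e.g. xlkja.servout
    for candidate in "$tmp_root"/tmp.*/"$file_name"; do
        # An unmatched glob is left as a literal string, so test for existence
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    return 0
}
```

For the paths in this ticket it would be called as `find_job_file /work/n02/n02/lre/tmp xlkja.servout`; no output means no tmp.* directory contains the file.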

I can't see anything similar in a search of existing tickets.

Thanks,

Leighton.

Change History (5)

comment:1 Changed 5 years ago by grenville

Leighton

The reconfiguration worked OK - it created a start file /work/n02/n02/lre/um/restart/xlazia.da20071201_00 which looks OK and the leave file says

End of rcf program reached. PE 0,

which is a good sign.

I can't see why the model didn't start - what happens if you

cd /home/n02/n02/lre/umui_runs/xlkja-126154513
qsub umuisubmit_run

?

Grenville

comment:2 Changed 5 years ago by Leighton_Regayre

Hi Grenville,

The job reports a qsserver error.

In the .leave file:
/work/n02/n02/lre/um/xlkja/bin/qsserver[196]: : cannot open
/work/n02/n02/lre/um/xlkja/bin/qsserver[197]: : cannot open
/work/n02/n02/lre/um/xlkja/bin/qsserver[198]: : cannot open

Leighton.

comment:3 Changed 5 years ago by grenville

Leighton

Masaru has seen these issues (and is grappling with them) - can you contact him?

Grenville

comment:4 Changed 5 years ago by Leighton_Regayre

Grenville,

I've attempted to follow the advice Masaru gave in resetting my keys:

Set up ssh keys
To allow the UM to archive data to the /nerc disk on Archer, the following commands need to be run once on Archer:
mkdir -p ~/.ssh
cd ~/.ssh
ssh-keygen -f um_arch
cat um_arch.pub >> authorized_keys

As passphraseless access is required, press Enter twice when ssh-keygen asks for a passphrase, so that an empty passphrase is generated.
Then rerun the
ssh -v -i $HOME/.ssh/um_arch espp1
command.
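
The key setup quoted above can also be done without interactive prompts. The following is a minimal sketch, not the documented procedure: the function name setup_um_arch_key and its optional directory argument are hypothetical, and the chmod steps are an addition that the original advice does not mention. It relies on the standard ssh-keygen option -N to supply the empty passphrase.

```shell
#!/bin/sh
# setup_um_arch_key: create a passphraseless um_arch key pair and
# authorise it. Illustrative helper; defaults to ~/.ssh as in the advice.
setup_um_arch_key() {
    key_dir=${1:-"$HOME/.ssh"}
    key="$key_dir/um_arch"

    mkdir -p "$key_dir" || return 1
    chmod 700 "$key_dir" || return 1

    # -N "" gives an empty passphrase, so ssh-keygen does not prompt.
    # An existing key is left untouched.
    if [ ! -f "$key" ]; then
        ssh-keygen -q -f "$key" -N "" || return 1
    fi

    # Append the public key only if it is not already authorised.
    touch "$key_dir/authorized_keys"
    if ! grep -qxF "$(cat "$key.pub")" "$key_dir/authorized_keys"; then
        cat "$key.pub" >> "$key_dir/authorized_keys"
    fi
    chmod 600 "$key_dir/authorized_keys"
}
```

Calling `setup_um_arch_key` with no argument works on ~/.ssh; passing a scratch directory allows a dry run that leaves the existing keys alone.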

I can no longer see the queue or submit jobs in the usual way. I get the following error:
No route to host
qstat: cannot connect to server sdb (errno=113)

comment:5 Changed 5 years ago by grenville

  • Resolution set to answered
  • Status changed from new to closed

Leighton

The machine is down - please check the ARCHER web site.

Grenville
