umsetvars related error

I'm having difficulty reconfiguring an existing job (xlkja) that I've copied (from tdwpm).

The error in the .rcf.leave file suggests that files expected in the /tmp/ directory cannot be located. I've checked the compile file and seen that the /tmp/ folder is created remotely (using umsetvars from my PUMA .profile script, I think).

My PUMA .profile is near-identical to that of an experienced ARCHER user (Graham Mann).

The names of the files in my folder:
do not match what the reconfiguration job expects:

I can't see anything similar in a search of existing tickets.



Change History (5)

comment:1 Changed 6 years ago by grenville


The reconfiguration worked OK - it created a start file /work/n02/n02/lre/um/restart/xlazia.da20071201_00 which looks OK and the leave file says

End of rcf program reached. PE 0,

which is a good sign.
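
A quick way to check a leave file for that marker could look like the sketch below. The function name rcf_finished is made up for illustration, and the leave file path is whatever your job produced; only the marker text comes from the leave file quoted above.

```shell
# Sketch: report whether a reconfiguration leave file contains the end marker.
# rcf_finished is a hypothetical helper name, not part of the UM scripts.
rcf_finished() {
    leave_file="$1"
    # Return 2 if the file is missing or unreadable, so that case is
    # distinguishable from "marker not found" (return 1).
    if [ ! -r "$leave_file" ]; then
        echo "cannot read $leave_file" >&2
        return 2
    fi
    # grep -q prints nothing and returns 0 only if the marker is present.
    grep -q "End of rcf program reached" "$leave_file"
}
```

Usage would be something like: rcf_finished /path/to/job.rcf.leave && echo "reconfiguration completed". Finding the marker only shows that the reconfiguration ran to the end, not that the model run itself started.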

I can't see why the model didn't start. What happens if you run the following?

cd /home/n02/n02/lre/umui_runs/xlkja-126154513
qsub umuisubmit_run



comment:2 Changed 6 years ago by Leighton_Regayre

Hi Grenville,

The job reports a qsserver error.

In the .leave file:
/work/n02/n02/lre/um/xlkja/bin/qsserver[196]: : cannot open
/work/n02/n02/lre/um/xlkja/bin/qsserver[197]: : cannot open
/work/n02/n02/lre/um/xlkja/bin/qsserver[198]: : cannot open


comment:3 Changed 6 years ago by grenville


Masaru has seen these issues (and is grappling with them) - can you contact him?


comment:4 Changed 6 years ago by Leighton_Regayre


I've attempted to follow the advice Masaru gave on resetting my keys:

Set up ssh keys
To allow the UM to archive data to the /nerc disk on Archer, the following commands need to be run once on Archer:
mkdir -p ~/.ssh
cd ~/.ssh
ssh-keygen -f um_arch
cat um_arch.pub >> authorized_keys

As passphraseless access is required, press Enter twice when ssh-keygen asks for a passphrase, to generate an empty passphrase.
Then redo the
ssh -v -i $HOME/.ssh/um_arch espp1
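
For reference, the same key setup can be sketched non-interactively. This is only an illustration under assumptions: the function name setup_um_arch_key is hypothetical, it assumes the intent of the steps above is to append the new public key (um_arch.pub) to authorized_keys, and it uses the ssh-keygen -N option to supply the empty passphrase instead of pressing Enter twice.

```shell
# Sketch: create a passphraseless key pair and authorise its public key.
# setup_um_arch_key is a hypothetical helper; the defaults mirror the
# directory and key name used in the steps above.
setup_um_arch_key() {
    key_dir="${1:-$HOME/.ssh}"
    key_name="${2:-um_arch}"

    # ssh is strict about permissions on the key directory.
    mkdir -p "$key_dir" && chmod 700 "$key_dir" || return 1

    # Refuse to overwrite an existing key pair.
    if [ -e "$key_dir/$key_name" ]; then
        echo "key $key_dir/$key_name already exists" >&2
        return 1
    fi

    # -N "" sets an empty passphrase; -q suppresses the key art and banner.
    ssh-keygen -q -N "" -f "$key_dir/$key_name" || return 1

    # Append the public half (assumed intent of the authorized_keys step).
    cat "$key_dir/$key_name.pub" >> "$key_dir/authorized_keys" || return 1
    chmod 600 "$key_dir/authorized_keys"
}
```

Called with no arguments it works on ~/.ssh/um_arch. Passing a scratch directory as the first argument lets you try it without touching your real keys.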

I can no longer see the queue or submit jobs in the usual fashion. I get the following error:
No route to host
qstat: cannot connect to server sdb (errno=113)

comment:5 Changed 6 years ago by grenville

  • Resolution set to answered
  • Status changed from new to closed


The machine is down - please check the ARCHER web site.


