Opened 9 years ago

Closed 9 years ago

#650 closed help (fixed)

Problem communicating between puma and hector

Reported by: pclark
Owned by: ros
Component: UM Model
Keywords: ssh
Cc:
Platform:
UM Version: 7.4

Description

I don't seem to be able to compile jobs on hector: no code is mirrored to hector. When I submit xflva from puma, I get a umui window asking for my hector password. With export UMUI_SSH_DEBUG_LEVEL=1 I get:

PathScale PrgEnv loaded
xtpe-network-gemini
PrgEnv-pathscale/3.1.49A
xt-mpt/5.1.4
pathscale/3.2.99
xtpe-mc12

Your job directory on host phase2b.hector.ac.uk is:
  /home/n02/n02/paclark/umui_runs/xflva-193141122


spawn ssh -o LogLevel=ERROR -l paclark phase2b.hector.ac.uk test ! -d umui_runs && mkdir umui_runs
Password: 
spawn scp -q -o LogLevel=ERROR -r /home/pclark/umui_jobs/xflva paclark@phase2b.hector.ac.uk:umui_runs/xflva-193141657
Password: 
spawn ssh -o LogLevel=ERROR -l paclark phase2b.hector.ac.uk mv umui_runs/xflva-193141657/SUBMIT.tmp umui_runs/xflva-193141657/SUBMIT
Password: 
spawn ssh -o LogLevel=ERROR -l paclark phase2b.hector.ac.uk chmod 755 umui_runs/xflva-193141657/SUBMIT
Password: 
spawn ssh -o LogLevel=ERROR -l paclark phase2b.hector.ac.uk umui_runs/xflva-193141657/SUBMIT
Password: 
PathScale PrgEnv loaded
xtpe-network-gemini
PrgEnv-pathscale/3.1.49A
xt-mpt/5.1.4
pathscale/3.2.99
xtpe-mc12

Your job directory on host phase2b.hector.ac.uk is:
  /home/n02/n02/paclark/umui_runs/xflva-193141657
Calling FCM_MAIN_SCR - local...
(This may take several minutes.)

FCM_MAIN: Calling Extract ...
Creating directory /home/pclark/um/um_extracts/xflva/umbase
Creating directory /home/pclark/um/um_extracts/xflva/ummodel
Creating directory /home/pclark/um/um_extracts/xflva/umrecon
Base extract: OK
Model extract: OK
Reconfiguration extract: OK
FCM_MAIN: Extract OK

FCM_MAIN: Submitting stage_1_submit ...
Password: 
290257.sdb
FCM_MAIN: Submit OK

PathScale PrgEnv loaded
xtpe-network-gemini
PrgEnv-pathscale/3.1.49A
xt-mpt/5.1.4
pathscale/3.2.99
xtpe-mc12

Your job directory on host phase2b.hector.ac.uk is:
  /home/n02/n02/paclark/umui_runs/xflva-193141657

Most of the 'Password' lines go through without requiring input, but the last just hangs till I put in my password at the command line. Then all appears OK. However, the job fails on hector with:

Unable to read config file "/work/n02/n02/paclark/um/xflva/umbase/cfg/bld.cfg", abort at /work/n02/n02/hum/fcm/bin/../lib/Fcm/ConfigSystem.pm line 528
Build command started on Tue Jul 12 14:20:21 2011.
->Parse configuration: start
Base build: failed

Looking at the ext.out files on puma, I have lots of lines like:

->Mirror: start
Destination: paclark@phase2b.hector.ac.uk:/work/n02/n02/paclark/um/xflva/umbase
# Start: 2011-07-12 14:17:52=> ssh -n -oBatchMode=yes paclark@phase2b.hector.ac.uk mkdir -p /work/n02/n02/paclark/um/xflva/umbase/cfg
Permission denied (publickey,keyboard-interactive).^M
EOF received
# Start: 2011-07-12 14:17:52=> rsync -a '--exclude=.*' --delete-excluded --timeout=900 '--rsh=ssh -oBatchMode=yes' -v /home/pclark/um/um_extracts/xflva/umbase/cfg/bld.cfg paclark@phase2b.hector.ac.uk:/work/n02/n02/paclark/um/xflva/umbase/cfg
Permission denied (publickey,keyboard-interactive).^M
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(632) [sender=3.0.4]
EOF received

I therefore presume that rsync is having problems with ssh.
Please note that I have turned off the ssh-agent for use with MONSooN. It didn't make any appreciable difference, and the PUMA documentation doesn't suggest that it is needed with hector.
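
The mirror step in ext.out calls ssh with -oBatchMode=yes, which makes ssh fail rather than prompt for a password. A minimal sketch of how that behaviour could be checked by hand is given here; the function name check_batch_ssh is hypothetical and is not part of the UM, FCM or umui scripts.

```shell
# Hypothetical helper (not part of UM/FCM): test whether non-interactive
# ssh login works, using the same options the mirror step uses.
# Usage: check_batch_ssh user@host
check_batch_ssh() {
    target="$1"
    # -n: do not read stdin; BatchMode=yes: never prompt for a password,
    # so a missing or unusable key gives an immediate failure.
    if ssh -n -oBatchMode=yes "$target" true 2>/dev/null; then
        echo "OK: key-based login to $target works without a prompt"
    else
        echo "FAIL: $target refused non-interactive login"
        return 1
    fi
}
```

For example, running check_batch_ssh paclark@phase2b.hector.ac.uk on puma should report FAIL whenever the mirror step would also be refused with "Permission denied".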

Change History (6)

comment:1 Changed 9 years ago by ros

Hi Pete,

Yes, you do need to have ssh-agent set up in order to submit UM jobs (UM versions using FCM) to HECToR.
See http://puma.nerc.ac.uk/trac/UM/wiki/SettingUpYourEnvironment
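
As a rough way of confirming that an agent is running and holds a key before submitting, something like the following sketch may help. It relies only on the documented exit status of ssh-add -l (0 when keys are listed, 1 when the agent holds no keys, 2 when no agent can be contacted). The function name agent_status and the SSH_ADD override variable are hypothetical, and the sketch does not reproduce the setup steps on the wiki page.

```shell
# Hypothetical helper: report whether an ssh-agent is reachable and has a key.
# SSH_ADD is an illustrative override (defaults to the real ssh-add).
agent_status() {
    rc=0
    "${SSH_ADD:-ssh-add}" -l >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "agent running, key loaded" ;;
        1) echo "agent running, no keys loaded (run ssh-add)"; return 1 ;;
        2) echo "no agent reachable (start ssh-agent first)"; return 1 ;;
        *) echo "unexpected ssh-add status $rc"; return 1 ;;
    esac
}
```

Calling agent_status in the shell from which the job is submitted shows which of the three states applies.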

I'll look at making the documentation a bit clearer.

Regards,
Ros.

comment:2 Changed 9 years ago by pclark

Grrr. Thanks - I was looking at the FAQs on the NCAS web pages (and the intro to PUMA) - forgot the Wiki.

As I said, it wasn't working even with this on, which is why I disabled it. I guess it got corrupted somehow. However, setting up from scratch seems to have cured the problem. Thanks

Peter

comment:3 Changed 9 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to assigned

On the ssh setup page linked from the "Running the UM on HECToR" page (http://cms.ncas.ac.uk/index.php/um-documentation/running-the-um-on-hector), I've added a section that points to the ssh-agent instructions.

Were these the pages you had been looking at?

I do plan to reorganise some of the UM FAQ pages soon to try and make it easier to find information.

comment:4 Changed 9 years ago by pclark

Yes, but also http://cms.ncas.ac.uk/index.php/puma/737?task=view, which gives the impression that password-less submission is now a feature of the umui. (I had read the ssh agent instructions, once upon a time, but forgot about them.) Your added section is very helpful.

Thanks

Peter

comment:5 Changed 9 years ago by pclark

Just for the record, this can be closed. Many thanks.

comment:6 Changed 9 years ago by ros

  • Resolution set to fixed
  • Status changed from assigned to closed

Links to ssh setup for HECToR and MONSooN have been added to the PUMA page above.

This ticket is now being closed.
