Opened 10 years ago
Closed 10 years ago
#650 closed help (fixed)
Problem communicating between puma and hector
| Reported by: | pclark | Owned by: | ros |
|---|---|---|---|
| Component: | UM Model | Keywords: | ssh |
| Cc: | | Platform: | |
| UM Version: | 7.4 | | |
Description
I don't seem to be able to compile jobs on hector: no code is mirrored to hector. When I submit xflva from puma, a umui window asks for my hector password. With export UMUI_SSH_DEBUG_LEVEL=1 I get:
PathScale PrgEnv loaded
xtpe-network-gemini PrgEnv-pathscale/3.1.49A xt-mpt/5.1.4 pathscale/3.2.99 xtpe-mc12
Your job directory on host phase2b.hector.ac.uk is: /home/n02/n02/paclark/umui_runs/xflva-193141122
spawn ssh -o LogLevel=ERROR -l paclark phase2b.hector.ac.uk test ! -d umui_runs && mkdir umui_runs
Password:
spawn scp -q -o LogLevel=ERROR -r /home/pclark/umui_jobs/xflva paclark@phase2b.hector.ac.uk:umui_runs/xflva-193141657
Password:
spawn ssh -o LogLevel=ERROR -l paclark phase2b.hector.ac.uk mv umui_runs/xflva-193141657/SUBMIT.tmp umui_runs/xflva-193141657/SUBMIT
Password:
spawn ssh -o LogLevel=ERROR -l paclark phase2b.hector.ac.uk chmod 755 umui_runs/xflva-193141657/SUBMIT
Password:
spawn ssh -o LogLevel=ERROR -l paclark phase2b.hector.ac.uk umui_runs/xflva-193141657/SUBMIT
Password:
PathScale PrgEnv loaded
xtpe-network-gemini PrgEnv-pathscale/3.1.49A xt-mpt/5.1.4 pathscale/3.2.99 xtpe-mc12
Your job directory on host phase2b.hector.ac.uk is: /home/n02/n02/paclark/umui_runs/xflva-193141657
Calling FCM_MAIN_SCR - local...
(This may take several minutes.)
FCM_MAIN: Calling Extract ...
Creating directory /home/pclark/um/um_extracts/xflva/umbase
Creating directory /home/pclark/um/um_extracts/xflva/ummodel
Creating directory /home/pclark/um/um_extracts/xflva/umrecon
Base extract: OK
Model extract: OK
Reconfiguration extract: OK
FCM_MAIN: Extract OK
FCM_MAIN: Submitting stage_1_submit ...
Password:
290257.sdb
FCM_MAIN: Submit OK
PathScale PrgEnv loaded
xtpe-network-gemini PrgEnv-pathscale/3.1.49A xt-mpt/5.1.4 pathscale/3.2.99 xtpe-mc12
Your job directory on host phase2b.hector.ac.uk is: /home/n02/n02/paclark/umui_runs/xflva-193141657
Most of the 'Password' lines go through without requiring input, but the last just hangs till I put in my password at the command line. Then all appears OK. However, the job fails on hector with:
Unable to read config file "/work/n02/n02/paclark/um/xflva/umbase/cfg/bld.cfg", abort at /work/n02/n02/hum/fcm/bin/../lib/Fcm/ConfigSystem.pm line 528
Build command started on Tue Jul 12 14:20:21 2011.
->Parse configuration: start
Base build: failed
Looking at the ext.out files on puma, I have lots of lines like:
->Mirror: start
Destination: paclark@phase2b.hector.ac.uk:/work/n02/n02/paclark/um/xflva/umbase
# Start: 2011-07-12 14:17:52=> ssh -n -oBatchMode=yes paclark@phase2b.hector.ac.uk mkdir -p /work/n02/n02/paclark/um/xflva/umbase/cfg
Permission denied (publickey,keyboard-interactive).
EOF received
# Start: 2011-07-12 14:17:52=> rsync -a '--exclude=.*' --delete-excluded --timeout=900 '--rsh=ssh -oBatchMode=yes' -v /home/pclark/um/um_extracts/xflva/umbase/cfg/bld.cfg paclark@phase2b.hector.ac.uk:/work/n02/n02/paclark/um/xflva/umbase/cfg
Permission denied (publickey,keyboard-interactive).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(632) [sender=3.0.4]
EOF received
I therefore presume that rsync is having problems with ssh.
Please note I have turned off the ssh-agent that I set up for use with MONSooN. Turning it off didn't make any appreciable difference, and the PUMA documentation doesn't suggest it is needed with hector.
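The mirror failure in the ext.out output above can be reproduced outside FCM, because the log shows the mirror step running ssh with BatchMode=yes, which makes ssh fail rather than prompt for a password. The following is a minimal sketch of such a check, not part of the UM or FCM tooling: the function names are hypothetical, and the ConnectTimeout value is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: repeat the kind of non-interactive login that the mirror step
# in the log above attempts. Function names here are hypothetical.

# Returns 0 if ssh can log in without prompting, non-zero otherwise.
# BatchMode=yes disables password prompts, so a login that needs a
# typed password fails with "Permission denied" instead of hanging.
can_login_noninteractively() {
    ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Print a one-line diagnosis for the given user@host.
diagnose_batch_ssh() {
    if can_login_noninteractively "$1" 2>/dev/null; then
        echo "OK: non-interactive ssh to $1 works"
    else
        echo "FAIL: non-interactive ssh to $1 is refused; is a key loaded in ssh-agent?"
        return 1
    fi
}

# Example (not run automatically):
#   diagnose_batch_ssh paclark@phase2b.hector.ac.uk
```

If this check fails while an interactive `ssh` login with a typed password succeeds, the problem is most likely the key or agent setup rather than rsync itself.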
Change History (6)
comment:1 Changed 10 years ago by ros
Hi Pete,
Yes, you do need to have ssh-agent set up in order to submit UM jobs (UM versions using FCM) to HECToR.
See http://puma.nerc.ac.uk/trac/UM/wiki/SettingUpYourEnvironment
I'll look at making the documentation a bit clearer.
Regards,
Ros.
comment:2 Changed 10 years ago by pclark
Grrr. Thanks - I was looking at the FAQs on the NCAS web pages (and the intro to PUMA) and forgot the Wiki.
As I said, it wasn't working even with ssh-agent on, which is why I disabled it. I guess it got corrupted somehow. However, setting it up from scratch seems to have cured the problem. Thanks
Peter
comment:3 Changed 10 years ago by ros
- Owner changed from um_support to ros
- Status changed from new to assigned
On the ssh setup page linked from the "Running the UM on HECToR" page (http://cms.ncas.ac.uk/index.php/um-documentation/running-the-um-on-hector), I've added a section that points to the ssh-agent instructions.
Were these the pages you had been looking at?
I do plan to reorganise some of the UM FAQ pages soon to try and make it easier to find information.
comment:4 Changed 10 years ago by pclark
Yes, but also http://cms.ncas.ac.uk/index.php/puma/737?task=view, which gives the impression that password-less submission is now a feature of the umui. (I had read the ssh agent instructions, once upon a time, but forgot about them.) Your added section is very helpful.
Thanks
Peter
comment:5 Changed 10 years ago by pclark
Just for the record, this can be closed. Many thanks.
comment:6 Changed 10 years ago by ros
- Resolution set to fixed
- Status changed from assigned to closed
Links to ssh setup for HECToR and MONSooN have been added to the PUMA page above.
This ticket is now being closed.
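For reference, the ssh-agent requirement discussed in this ticket can be checked with generic OpenSSH commands. The sketch below is not the recipe from the PUMA wiki page linked above, which remains the authoritative source. It assumes a key at ~/.ssh/id_rsa whose public half is already authorised on the remote host; the function names and the SSH_ADD override variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: generic OpenSSH agent usage, not the PUMA wiki's recipe.
# SSH_ADD is a hypothetical override hook so the check can run offline.
SSH_ADD=${SSH_ADD:-ssh-add}

# Report agent state from the exit status of "ssh-add -l":
#   0 = at least one key loaded, 1 = agent reachable but holds no keys,
#   2 = agent not reachable (any other status is treated the same way).
agent_state() {
    rc=0
    "$SSH_ADD" -l >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "key-loaded" ;;
        1) echo "agent-empty" ;;
        *) echo "no-agent" ;;
    esac
}

# Start an agent if none is reachable, then load a key if none is loaded.
# The default key path is an assumption; pass another path as $1.
ensure_agent() {
    key=${1:-$HOME/.ssh/id_rsa}
    if [ "$(agent_state)" = "no-agent" ]; then
        eval "$(ssh-agent -s)" >/dev/null
    fi
    if [ "$(agent_state)" != "key-loaded" ]; then
        "$SSH_ADD" "$key"   # prompts once for the key's passphrase
    fi
}

# Example (not run automatically):
#   ensure_agent && ssh -o BatchMode=yes paclark@phase2b.hector.ac.uk true
```

An agent started this way only serves the shell that ran `ensure_agent` and its children, so the UMUI would need to be launched from that same session for submissions to pick it up.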