Opened 8 years ago
Closed 8 years ago
#990 closed help (fixed)
connection to monsoon through umui fails
Reported by: | momm | Owned by: | ros |
---|---|---|---|
Component: | MONSooN | Keywords: | lander time out |
Cc: | | Platform: | MONSooN |
UM Version: | | | |
Description
For the past couple of hours, my job submissions to MONSooN through the UMUI on PUMA have been failing with a connection timeout, while standard logins from PUMA or other machines work fine.
Error message:
Initialising SUBMIT…
Writing remote commands file…
Calling NDS_MAIN_SCR - local…
(This may take several minutes.)
NDS_MAIN: Calling Extract …
Extracting BASE …
BASE extract is OK
Extracting NEMO_CICE …
NEMO_CICE extract is OK
NDS_MAIN: Extract OK
NDS_MAIN: Submit OK
Logging in to remote machines lander.monsoon-metoffice.co.uk and ibm02…
REMCOMMS 100% 4134 4.0KB/s 00:00
Creating directory…
Copying job files…
Renaming SUBMIT…
Changing SUBMIT permissions…
Running SUBMIT script…
Directory /scratch/localtemp/mbuten.3867106 created
Your job directory on host ibm02 is: /home/mbuten/umui_runs/xhooc-341192853
Total PEs : 128
NOTE: You are requesting the use of 4 node(s) on the IBM
Copying files to directory /projects/imarnet/mbuten/UM/xhooc/umbase using rsync…
See /projects/imarnet/mbuten/UM/xhooc/umbase/ext.out for output
Timed out, lander.monsoon-metoffice.co.uk not responding
Tidying local directories…
Job submission failed
ERROR: Timed out, lander.monsoon-metoffice.co.uk not responding while attempting to access account mbuten on host ibm02. Note that repeated failures may result in expiry of password due to security procedures on some machines. Check user id, hostname and password for your account on the host machine.
Attachments (1)
Change History (4)
Changed 8 years ago by momm
comment:1 Changed 8 years ago by momm
This actually started working again this morning, but now it is failing again. I have also realised that the failures coincide with very slow rsync connections. Does that mean we can't submit jobs outside standard office hours?
Puzzled…
comment:2 Changed 8 years ago by ros
- Keywords lander time out added
- Owner changed from um_support to ros
- Platform changed from <select platform> to MONSooN
- Status changed from new to accepted
- UM Version <select version> deleted
Hi
We and the MONSooN team at the Met Office are aware of this problem and are working to track down the cause. Unfortunately, it is not proving easy to identify because it does not occur predictably.
For now, I can only suggest that you keep trying to submit. Sorry for any inconvenience caused; we will get this fixed as soon as possible.
Regards,
Ros.
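The suggestion to keep trying could be scripted along the following lines. This is only a minimal sketch, not part of the UMUI or of any MONSooN tooling: `submit_with_retry` and its `submit` argument are hypothetical names, and `submit` stands in for whatever the user runs to submit a job, returning True on success. The number of attempts is capped deliberately, since the error message above warns that repeated failures may lead to password expiry on some machines.

```python
import time
from typing import Callable


def submit_with_retry(
    submit: Callable[[], bool],
    max_attempts: int = 3,
    initial_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call submit() until it succeeds or max_attempts is reached.

    Returns the number of the attempt that succeeded, or 0 if every
    attempt failed. `submit` is a hypothetical callable supplied by
    the user; it is not a UMUI function.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if submit():
            return attempt
        # Wait before the next try, doubling the pause each time so a
        # struggling host is not hit repeatedly in quick succession.
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2
    return 0
```

Keeping `max_attempts` small and the delays long seems the safer choice here, given that the time outs were intermittent and the cause was unknown at this point.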
comment:3 Changed 8 years ago by ros
- Resolution set to fixed
- Status changed from accepted to closed
Hi Momme,
The Met Office put a fix in place a couple of weeks ago, which we believe has resolved the time out problems with the MONSooN HPC/lander. If you continue to experience this problem, please do let us know. I will close this ticket now, but you can re-open it should the problem persist.
Regards,
Ros.
ext.out