Opened 7 years ago

Closed 7 years ago

#990 closed help (fixed)

connection to monsoon through umui fails

Reported by: momm
Owned by: ros
Component: MONSooN
Keywords: lander time out
Cc:
Platform: MONSooN
UM Version:

Description

For the past couple of hours, my job submissions to MONSooN through the UMUI on PUMA have been failing with a connection timeout, while standard logins from PUMA or other machines work fine.
Error message:
Initialising SUBMIT…
Writing remote commands file…
Calling NDS_MAIN_SCR - local…
(This may take several minutes.)

NDS_MAIN: Calling Extract …
Extracting BASE …
BASE extract is OK
Extracting NEMO_CICE …
NEMO_CICE extract is OK
NDS_MAIN: Extract OK
NDS_MAIN: Submit OK
Logging in to remote machines lander.monsoon-metoffice.co.uk and ibm02…

REMCOMMS 100% 4134 4.0KB/s 00:00
Creating directory…
Copying job files…
Renaming SUBMIT…
Changing SUBMIT permissions…
Running SUBMIT script…
Directory /scratch/localtemp/mbuten.3867106 created

Your job directory on host ibm02 is: /home/mbuten/umui_runs/xhooc-341192853

Total PEs : 128
NOTE: You are requesting the use of 4 node(s) on the IBM

Copying files to directory /projects/imarnet/mbuten/UM/xhooc/umbase using rsync…
See /projects/imarnet/mbuten/UM/xhooc/umbase/ext.out for output
Timed out, lander.monsoon-metoffice.co.uk not responding

Tidying local directories…
Job submission failed


ERROR: Timed out, lander.monsoon-metoffice.co.uk not responding while attempting to access account mbuten on host ibm02. Note that repeated failures may result in expiry of password due to security procedures on some machines. Check user id, hostname and password for your account on the host machine.

Attachments (1)

ext.out (941 bytes) - added by momm 7 years ago.

Change History (4)

Changed 7 years ago by momm

ext.out

comment:1 Changed 7 years ago by momm

This actually started working again this morning, but now it is failing again. I have also realised that the failures coincide with very slow rsync connections. Does that mean we can't submit jobs outside standard office hours?

Puzzled…

comment:2 Changed 7 years ago by ros

  • Keywords lander time out added
  • Owner changed from um_support to ros
  • Platform changed from <select platform> to MONSooN
  • Status changed from new to accepted
  • UM Version <select version> deleted

Hi

This is a problem that both we and the MONSooN team at the Met Office are aware of, and we are working to track down the cause. Unfortunately, it is not proving easy to identify because it occurs unpredictably.

For now I can only suggest that you keep trying to submit. Sorry for any inconvenience caused; we will get this fixed as soon as possible.

Regards,
Ros.
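
For anyone scripting around intermittent timeouts like this, the "keep trying to submit" advice can be sketched as a retry loop with increasing waits. This is only an illustration, not part of the UMUI or any MONSooN tooling: the function name `retry_with_backoff`, its parameters, and the `submit_job` callable mentioned afterwards are all hypothetical, and the delays are arbitrary.

```python
import time


def retry_with_backoff(action, attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call `action` until it returns True or `attempts` are used up.

    `action` is any zero-argument callable returning True on success
    (hypothetical here: whatever triggers your submission).
    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between tries.
    Returns the 1-based number of the successful attempt, or None.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        # Only wait if another attempt is still to come.
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return None
```

It would be called as `retry_with_backoff(submit_job)`, where `submit_job` is a placeholder you would have to supply yourself. Keeping the number of attempts small seems wise, given the warning in the error message that repeated failures may lead to password expiry on some machines.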

comment:3 Changed 7 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed

Hi Momme,

The Met Office put a fix in a couple of weeks ago, which we believe has resolved the timeout problems with the MONSooN HPC/lander. If you continue to experience this problem, please do let us know. I will close this ticket now; however, you can re-open it should the problem persist.

Regards,
Ros.
