Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#999 closed error (fixed)

MONSooN lander timeout during job submission

Reported by: mk1812 Owned by: ros
Component: MONSooN Keywords:
Cc: Platform: MONSooN
UM Version: 8.2

Description

Hi,

Whenever I try to submit a job from PUMA, I get a timeout error. The job submission gets as far as the stage:

"Copying files to directory /projects/ukca/mkasoa/xhwta/baserepos/UMATMOS using rsync…
See /projects/ukca/mkasoa/xhwta/baserepos/UMATMOS/ext.out for output"

and then fails with:

"ERROR: Timed out, lander.monsoon-metoffice.co.uk not responding while attempting to access account mkasoa on host ibm02."

This happens every time I try to submit a job. Some files have been successfully copied to the relevant folders in my home and projects directories on ibm02, but the submission never gets further than this stage. The final line of the ext.out file created in /projects/ukca/mkasoa/xhwta/baserepos/UMATMOS/ reads:

"rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(549) [Receiver=3.0.9]"

Any advice as to what the problem might be and how I could resolve it would be greatly appreciated.

Best regards,
Matthew Kasoar
Imperial College London

Change History (5)

comment:1 Changed 7 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Matthew,

We are aware of an intermittent issue with outbound connections from the MONSooN HPC. It has been passed to the MONSooN team at the Met Office, who are working to diagnose and solve the problem. For now, I can only suggest that you keep trying to submit the job.
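
For a copy step that is run by hand rather than through the submission interface, the "keep trying" advice could be automated along the following lines. This is only a minimal sketch under stated assumptions: the function name `run_with_retries`, the retry count, and the delay are illustrative choices and not part of PUMA, the UM, or MONSooN. The value 20 is the rsync exit code quoted in the ext.out line in the description.

```python
import subprocess
import sys
import time

# Exit code shown in the quoted ext.out line ("rsync error: ... (code 20)").
RSYNC_INTERRUPTED = 20


def run_with_retries(command, attempts=5, delay_seconds=60.0, sleep=time.sleep):
    """Run `command` until it exits with 0 or the attempts run out.

    Returns the exit code of the last attempt. The name, defaults, and
    `sleep` hook are assumptions made for this sketch.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    returncode = None
    for attempt in range(1, attempts + 1):
        # Run the command and keep only its exit status.
        returncode = subprocess.run(command).returncode
        if returncode == 0:
            return 0
        note = " (rsync interrupted)" if returncode == RSYNC_INTERRUPTED else ""
        print(
            f"attempt {attempt}/{attempts} failed with exit code {returncode}{note}",
            file=sys.stderr,
        )
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts:
            sleep(delay_seconds)
    return returncode
```

A call might look like `run_with_retries(["rsync", "-a", "SOURCE_DIR/", "USER@HOST:TARGET_DIR/"])`, where the source and target are placeholders to be replaced with real locations. Retrying only works around an intermittent failure; it does not address the underlying connection problem.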

Sorry for any inconvenience.

Regards,
Ros.

comment:2 Changed 7 years ago by mk1812

Ok, thanks.
Matt

comment:3 Changed 7 years ago by ros

Hi Matt,

The Met Office put a fix in place a couple of weeks ago, which we believe has resolved the timeout problems with the MONSooN HPC/lander. If you continue to experience this problem, please do let us know. I will close this ticket now, but you can re-open it if the problem persists.

Regards,
Ros.

comment:4 Changed 7 years ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed

comment:5 Changed 7 years ago by mk1812

Ok great. My jobs do seem to have been submitting much more reliably recently. Thanks.

Matt
