Opened 5 years ago

Closed 5 years ago

#1480 closed help (answered)

Problems with extract failures

Reported by: pclark
Owned by: ros
Component: FCM
Keywords:
Cc:
Platform: PUMA
UM Version: 8.6

Description

I seem to have intermittent, unpredictable problems with the fcm extract step. For example, I am resubmitting a job because it was held on ARCHER (due to the budget running out). It extracted before with no problem. Now, without any changes to the job, branches, etc., I get:

MAIN_SCR: Calling Extract ...
Extracting UMATMOS base repository...
UMATMOS base repository extract is OK
Extracting JULES base repository...
JULES base repository extract failed
See extract output file /home/pclark/um/um_extracts/xkjba/baserepos/JULES/ext.out
MAIN_SCR: Extract failed
MAIN_SCR stopped with return code 255

The extract output pointed to says:

[FAIL] ssh -n -oBatchMode=yes paclark@login.archer.ac.uk mkdir -p /home/n02/n02/paclark/um/xkjba/baserepos/JULES failed (255) at /home/fcm/fcm-2014.12.0/bin/../lib/FCM1/Dest.pm line 755.

This directory exists, of course, due to the previous extract, but that shouldn't be a problem, surely?

Change History (3)

comment:1 Changed 5 years ago by ros

Hi Peter,

This is an intermittent problem that we are aware of and are trying to identify, but its intermittent nature makes it very hard to track down. We are having some network issues in the department, so it may be due to those. Resubmitting the job usually goes through OK.
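On the question of the existing directory: `mkdir -p` is expected to succeed when the target is already there, and 255 is the status that ssh returns when ssh itself hits an error (for example, when the connection cannot be made). So the failure above most likely comes from the connection rather than from the directory. A minimal local sketch of the first point, using a throwaway temporary directory rather than the real ARCHER path:

```shell
#!/bin/sh
# Local check only: uses a temporary directory, not the real extract path.
tmp=$(mktemp -d)
target="$tmp/baserepos/JULES"

mkdir -p "$target"
first=$?

# Run it again now that the directory already exists.
mkdir -p "$target"
second=$?

echo "first=$first second=$second"
rm -rf "$tmp"
```

Both statuses should be 0, which is why an existing directory from a previous extract should not in itself cause the failure.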

Regards,
Ros.

comment:2 Changed 5 years ago by scottyiu

Dear Ros

Just to let you know that I have something similar as well. Resubmitting a couple of times usually solves the problem though.

Thanks!

Scott

comment:3 Changed 5 years ago by annette

  • Resolution set to answered
  • Status changed from new to closed

This is an intermittent but ongoing issue which we are investigating. The only work-around we have at the moment is just to resubmit. Occasionally the extract will fail repeatedly at the same point, in which case running the ssh command on the command-line seems to fix it.
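If it helps, the manual step can be wrapped in a small retry loop. This is only a sketch and not part of FCM or the UM scripts: the `retry` function, the attempt count and the delay are all hypothetical choices, and the ssh command in the usage comment is simply the one copied from the ext.out message in the description.

```shell
#!/bin/sh
# Hypothetical helper (not part of FCM): run a command up to MAX times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry MAX DELAY command [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    status=1
    while [ "$attempt" -le "$max" ]; do
        "$@"
        status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        echo "attempt $attempt of $max failed with status $status" >&2
        attempt=$((attempt + 1))
        # Only pause if another attempt is coming.
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    # Give back the status of the last failed attempt.
    return "$status"
}

# Example with the command from ext.out (needs network access, so left commented out):
# retry 3 10 ssh -n -oBatchMode=yes paclark@login.archer.ac.uk \
#     mkdir -p /home/n02/n02/paclark/um/xkjba/baserepos/JULES
```

Once the ssh command succeeds by hand, resubmitting the job is still needed to rerun the extract itself.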

Best regards,
Annette
