#1662 closed help (fixed)

Problems with submission and compilation of jobs to/on ARCHER, Sat 19 & Sun 20 Sept

Reported by: gmann Owned by: gmann
Priority: normal Component: Other
Keywords: ukca Cc:
Platform: ARCHER UM Version: 8.4

Description

Dear NCAS-CMS helpdesk,

Yesterday and today I have been encountering problems where jobs submitted to ARCHER are either not compiling within the time limit or are not submitting at all.

At first (early hours of Saturday morning) I was encountering a problem with UM jobs failing to compile within the time limit:

/home/n02/n02/gmann/output/xlrfi000.xlrfi.d15262.t023748.comp.leave
/home/n02/n02/gmann/output/xlrfj000.xlrfj.d15262.t025043.comp.leave
/home/n02/n02/gmann/output/xlrfk000.xlrfk.d15262.t024834.comp.leave
/home/n02/n02/gmann/output/xlrfl000.xlrfl.d15262.t024540.comp.leave

I tried submitting them again at ~8am on Saturday, but the jobs then encountered an fcm.lock that I hadn't deleted.

Then, when I tried again later on Saturday, the jobs would not submit at all (in fact ARCHER was unreachable for part of yesterday).

The same problem of jobs failing to submit is happening again today.

Essentially the same job had been compiling fine on Thursday and Friday.

Has there been some problem over the weekend with ARCHER, PUMA or the connection between them?

Thanks for your help,

Cheers
Graham

Attachments (3)

ticket4.jpg (126.4 KB) - added by scottyiu 19 months ago.
Submit getting stuck
ticket6.jpg (76.0 KB) - added by scottyiu 19 months ago.
Submit timedout
ticket8.jpg (87.0 KB) - added by scottyiu 19 months ago.
Another error


Change History (8)

comment:1 Changed 19 months ago by grenville

Graham

There has been no announcement of problems at ARCHER over the weekend as far as I know. PUMA has not been problematic.

I did experience poor login node performance yesterday (as did my colleagues), which could be network related, but today things appear to be back to normal.

I have forwarded your question to ARCHER.

Grenville

comment:2 Changed 19 months ago by scottyiu

Dear Grenville and Graham

I am also having problems where submission (via the UMUI from PUMA to ARCHER) is significantly slower than normal. I'm not sure whether this is related to this issue.

I also have some submissions that are not going through at all and are getting stuck.

Thank you.

Best regards,
Scott

Last edited 19 months ago by scottyiu

Changed 19 months ago by scottyiu

Submit getting stuck

Changed 19 months ago by scottyiu

Submit timedout

Changed 19 months ago by scottyiu

Another error

comment:3 Changed 19 months ago by ros

  • Component changed from UM Model to Other
  • Status changed from new to pending

Hi Graham,

There is currently a network issue between PUMA and ARCHER. This is affecting connections from PUMA to ARCHER, including submissions of UM jobs, which may be very slow or fail.

Andy, ARCHER & Reading University IT are looking into this and we will send an email to the PUMA mailing list when we have an update or the situation is resolved.

Regards,
Ros.

comment:4 Changed 19 months ago by ros

The JANET engineers have identified a network issue and put a fix in place. Please let us know if you see any further network stalling between PUMA and ARCHER.

Regards,
Ros.

comment:5 Changed 18 months ago by ros

  • Resolution set to fixed
  • Status changed from pending to closed