Opened 2 years ago

Closed 2 years ago

#2056 closed help (answered)

Archiving

Reported by: simon.tett Owned by: um_support
Component: Archiving Keywords:
Cc: Platform: ARCHER
UM Version: 8.5

Description

Hi,

Is the ARCHER archive sick? I'm getting failures from the archive step, or a backlog of files to archive.

On Sunday some files failed to archive because they did not exist, but when I looked, the files did exist. I also got errors like:
bash: /work/n02/n02/stett2/xnbdc/convert_output/xnbdca.pm1955oct.ff_arch_3681.sh: Cannot send after transport endpoint shutdown.

Simon

Change History (8)

comment:1 Changed 2 years ago by grenville

Simon

Yes, it's sick. We await a formal notification from ARCHER, but they are having Lustre problems. Here's how they have responded to our queries:

This is due to CRAY Lustre bug 846445, which is under critical investigation.

Seagate and CRAY are actively looking for a solution to these frequent "Cannot send after transport endpoint shutdown" errors.

Grenville

comment:2 Changed 2 years ago by simon.tett

Hi Grenville,

Is it sick now? I've got archive failures with no output…

Simon

comment:3 Changed 2 years ago by grenville

Simon

As far as I know it is sick now, and has been for several days (maybe even since last Wednesday).

Have you lost data?

Grenville

comment:4 Changed 2 years ago by simon.tett

Hi Grenville,

Great… I don't think I've lost data, but it's hard to tell until I do some digging or the runs have finished (and the /nerc file system is running very slowly at the moment). Everything I expected was there on Sunday.

Simon

comment:5 Changed 2 years ago by simon.tett

Hi Grenville,

I don't think there is much point in submitting jobs at the moment, as all I am getting is archive failures. Can you let me know when /nerc is working again?

Thanks
Simon

comment:6 Changed 2 years ago by grenville

Simon

Did you get this? If not, please sign up for ARCHER messages through SAFE.


From: Archer Administration [helpdesk@…]
Sent: 17 January 2017 16:50
Subject: RDF being taken offline at 16:30 17/01/17

Dear RDF Users,

We are currently experiencing problems with some of the RDF hardware which is affecting the login/pp nodes. In order to resolve this, the RDF file systems will need to be taken offline at 16:50. This will affect the login, pps, DTNS and DAC nodes.

We apologise for the inconvenience caused and will bring the service back up as soon as possible.

Best regards

The ARCHER Helpdesk Team
support@…

comment:7 Changed 2 years ago by simon.tett

Hi Grenville,

I see it is now on the service status info. I'd rather not have any more emails in my inbox. The ARCHER team seem a bit bad at putting information on the service page! I'll look at that tomorrow.

Simon

comment:8 Changed 2 years ago by ros

  • Resolution set to answered
  • Status changed from new to closed