Opened 4 years ago
Closed 4 years ago
#2056 closed help (answered)
Archiving
Reported by: | simon.tett | Owned by: | um_support |
---|---|---|---|
Component: | Archiving | Keywords: | |
Cc: | | Platform: | ARCHER |
UM Version: | 8.5 | | |
Description
Hi,
Is the ARCHER archive sick? I'm getting failures from the archive step, or a backlog of files to archive.
On Sunday the archive step failed because a file supposedly did not exist, but when I looked the file did exist. I also got errors like:
bash: /work/n02/n02/stett2/xnbdc/convert_output/xnbdca.pm1955oct.ff_arch_3681.sh: Cannot send after transport endpoint shutdown.
Simon
Change History (8)
comment:1 Changed 4 years ago by grenville
comment:2 Changed 4 years ago by simon.tett
Hi Grenville,
Is it sick now? I've got archive failures with no output.
Simon
comment:3 Changed 4 years ago by grenville
Simon
As far as I know it is sick now, and has been for several days (maybe even as far back as last Wednesday).
Have you lost data?
Grenville
comment:4 Changed 4 years ago by simon.tett
Hi Grenville,
Great... I don't think I've lost data, but it is hard to tell unless I do some digging or the runs have finished (and the /nerc file system is running very slowly at the moment). Everything I expected was there on Sunday.
Simon
comment:5 Changed 4 years ago by simon.tett
Hi Grenville,
I don't think there is much point submitting jobs at the moment, as all I am getting is archive failures. Can you let me know when /nerc is working again?
Thanks
Simon
comment:6 Changed 4 years ago by grenville
Simon
Did you get this? If not, please sign up for ARCHER messages through SAFE.
From: Archer Administration [helpdesk@…]
Sent: 17 January 2017 16:50
Subject: RDF being taken offline at 16:30 17/01/17
Dear RDF Users,
We are currently experiencing problems with some of the RDF hardware which is affecting the login/pp nodes. In order to resolve this, the RDF file systems will need to be taken offline at 16:50. This will affect the login, pps, DTNS and DAC nodes.
We apologise for the inconvenience caused and will bring the service back up as soon as possible.
Best regards
The ARCHER Helpdesk Team
support@…
comment:7 Changed 4 years ago by simon.tett
Hi Grenville,
I see it is now on the service status info. I'd rather not have any more emails in my inbox. The ARCHER team seem a bit bad at putting info on the service page! I'll look at that tomorrow.
Simon
comment:8 Changed 4 years ago by ros
- Resolution set to answered
- Status changed from new to closed
Simon
Yes, it's sick. We await a formal notification from ARCHER, but they are having Lustre problems. Here's how they have responded to our queries:
This is due to CRAY Lustre bug 846445, which is under critical investigation.
Seagate and CRAY are actively looking for a solution to these frequent
"Cannot send after transport endpoint shutdown" errors.
Grenville