Opened 5 months ago

Closed 3 weeks ago

#2936 closed help (fixed)

Error transferring to JASMIN

Reported by: pringle
Owned by: um_support
Component: Rose/Cylc
Keywords:
Cc:
Platform:
UM Version: 11.1

Description

Hi,

I am working with Rose for the first time. I have a suite that runs for a colleague, but for me it fails at the pptransfer stage. I assume I have a path or permission set incorrectly, but I can't see which. Could you possibly take a look?

I have set up my ssh keys following the instructions and can ssh from puma to ARCHER and the DTN without needing a password, and also from the DTN to JASMIN.

The job is: u-bi943

The error is:

[WARN] file:atmospp.nl: skip missing optional source: namelist:moose_arch
[WARN] file:atmospp.nl: skip missing optional source: namelist:script_arch
[WARN] [SUBPROCESS]: Command: globus-url-copy -vb -cd -cc 4 -sync file:///nerc/n02/n02/pringle/u-bi943/20080101T0000Z/ sshftp://jasmin-xfer1.ceda.ac.uk/gws/nopw/j04/gassp/kpringle/u-bi943/20080101T0000Z/
[SUBPROCESS]: Error = 1:

error: Unable to check destination url for sync: sshftp://jasmin-xfer1.ceda.ac.uk/gws/nopw/j04/gassp/kpringle/u-bi943/20080101T0000Z/
an end-of-file was reached
globus_xio: An end of file occurred

[WARN] Transfer command failed: globus-url-copy -vb -cd -cc 4 -sync file:///nerc/n02/n02/pringle/u-bi943/20080101T0000Z/ sshftp://jasmin-xfer1.ceda.ac.uk/gws/nopw/j04/gassp/kpringle/u-bi943/20080101T0000Z/
[ERROR] transfer.py: Unknown Error - Return Code=1
[FAIL] Command Terminated
[FAIL] Terminating PostProc
[FAIL] transfer.py # return-code=1
Received signal ERR
cylc (scheduler - 2019-06-13T10:28:46Z): CRITICAL Task job script received signal ERR at 2019-06-13T10:28:46Z
cylc (scheduler - 2019-06-13T10:28:46Z): CRITICAL failed at 2019-06-13T10:28:46Z
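
One way to separate a connection problem from a path problem is to pull the host and directory out of the destination URL and test each by hand. This is only a sketch: `split_sshftp_url` is a hypothetical helper, not part of the postproc app, and it assumes the URL always contains a path after the host.

```shell
# Hypothetical helper (not part of postproc): split an sshftp:// URL
# into "host path" so each part can be checked separately.
# Assumes the URL has a path component after the host.
split_sshftp_url() {
    rest="${1#sshftp://}"    # drop the scheme
    host="${rest%%/*}"       # text before the first slash
    path="/${rest#*/}"       # everything after it, leading slash restored
    printf '%s %s\n' "$host" "$path"
}

# Example with the destination from the failing command:
#   split_sshftp_url sshftp://jasmin-xfer1.ceda.ac.uk/gws/nopw/j04/gassp/kpringle/u-bi943/20080101T0000Z/
# A manual follow-up, run from the node that performs the transfer:
#   ssh -o BatchMode=yes jasmin-xfer1.ceda.ac.uk ls -ld /gws/nopw/j04/gassp/kpringle
```

If the `ssh` line fails before `ls` runs, the problem is likely with the connection rather than the destination path.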

Many thanks,
Kirsty

Change History (14)

comment:1 Changed 5 months ago by ros

Hi Kirsty,

First off have you set up ssh-agent so that you can ssh from ARCHER DTN to JASMIN?

If not please follow the instructions here: http://cms.ncas.ac.uk/wiki/Docs/PostProcessingAppArcherSetup#sshdtntojasmin

Cheers,
Ros.

comment:2 Changed 5 months ago by pringle

Hi Ros,

I just checked: I have done all that and I can ssh from the ARCHER RDF to JASMIN without a password prompt, so I think it's all working.

I wondered if the destination path was incorrect somehow, but I thought it looked correct?

Thanks,
Kirsty

comment:3 Changed 5 months ago by grenville

Kirsty

Are you a member of another GWS? If so, can you try transferring to that one?

Grenville

comment:4 Changed 5 months ago by grenville

Kirsty

Leighton is sending data to gassp (I think) - is he having the same problem?

Grenville

comment:5 Changed 5 months ago by grenville

Kirsty

Is there any space in gassp?

comment:6 Changed 4 months ago by ros

Hi Kirsty,

Is this still a problem?

Cheers,
Ros.

comment:7 Changed 4 months ago by pringle

Hi Ros,

Yes, sorry. It's still not working. There is definitely some space in gassp so I don't think that's the issue. Leighton is also sending data to gassp and has the same issue.

Thanks,
Kirsty

comment:8 Changed 4 months ago by ros

Hi Kirsty,

Were you able to try Grenville's suggestion of transferring to another GWS so we can determine if it is specific to the gassp disk or something else?

Cheers,
Ros.

comment:9 Changed 4 months ago by ros

P.S.

Could you also please try running our connections test suite u-al624, just to confirm that your environment is set up correctly? All you need to do on puma is:

$ rosie checkout u-al624
$ cd ~/roses/u-al624    # rosie's default checkout location
$ rose suite-run

comment:10 Changed 4 months ago by pringle

Thanks Ros,

I just tried to run u-al624 and it failed, so I guess it must be my environment?

I just re-checked and I can ssh without a password from puma to:

ssh pringle@…
ssh -Y pringle@…

Can you see why the job failed?

Thanks,
Kirsty

comment:11 Changed 4 months ago by ros

Hi Kirsty,

Your connection from puma → dtn02 is working fine. It's then failed to ssh from dtn02 to jasmin-xfer1.ceda.ac.uk. Can you ssh from dtn02 to jasmin-xfer1.ceda.ac.uk without password/passphrase on the command line?
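
A quick non-interactive way to answer this is ssh's BatchMode option, which makes ssh fail instead of prompting for a password or passphrase. A minimal sketch, to be run on dtn02; `check_hop` and the `SSH_CMD` override are hypothetical names used only for illustration, and the 10-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: succeed only if ssh can log in without prompting.
# SSH_CMD is an illustrative override so the function can be exercised
# without a network; it defaults to the real ssh client.
check_hop() {
    host="$1"
    # BatchMode=yes: never prompt, so a missing key or agent shows up as a failure
    if "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "OK: $host"
    else
        echo "FAILED: $host" >&2
        return 1
    fi
}

# On dtn02:
#   check_hop jasmin-xfer1.ceda.ac.uk
```

A failure here with BatchMode set, when an interactive ssh works, usually means the login is relying on a typed passphrase rather than a key held by the agent.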

I can see that you have 2 ssh-agent processes running on dtn02, which could cause a problem. I would suggest killing both of those processes. Then log out of dtn02 and back in again to restart the agent, run ssh-add <jasmin-key> to re-add your key to the agent, and try again.
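
The check for duplicate agents can be sketched as below. `count_agents` is a hypothetical helper that counts `ssh-agent` entries in a list of process names; the commented lines restate the manual steps, and `<jasmin-key>` stays a placeholder for whichever key file you use for JASMIN.

```shell
# Hypothetical helper: count lines on stdin that are exactly "ssh-agent".
count_agents() {
    grep -c -x 'ssh-agent' || true   # grep -c prints 0 but exits 1 when nothing matches
}

# On dtn02, feed it the names of your own processes:
#   ps -u "$(id -u)" -o comm= | count_agents
# If it reports more than one, the manual steps are:
#   pkill -u "$(id -u)" -x ssh-agent    # stop every agent you own
#   (log out of dtn02 and back in so the agent restarts)
#   ssh-add <jasmin-key>                # re-add the key
#   ssh-add -l                          # confirm the agent holds it
```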

Regards,
Ros.

comment:12 Changed 4 months ago by pringle

Hi Ros,

Many thanks. I've killed the two ssh-agent processes and restarted, and followed the instructions here:

http://cms.ncas.ac.uk/wiki/Docs/PostProcessingAppArcherSetup

I can now ssh from dtn to xfer1 and xfer2 and the test job you gave me runs.

I will re-try the submission to GASSP, but I'm away for a few weeks after today so it will be a while.

Many thanks for your help,
Kirsty

comment:13 Changed 7 weeks ago by grenville

Hi Kirsty

I'll close this - open a new ticket if needed.

Grenville

comment:14 Changed 3 weeks ago by willie

  • Resolution set to fixed
  • Status changed from new to closed