Opened 4 weeks ago

Last modified 4 weeks ago

#3153 new help

postproc App setup

Reported by: pmcguire
Owned by: um_support
Component: Archiving
Keywords: archive, postproc, PPTRANSFER
Cc:
Platform: ARCHER
UM Version:

Description

Hi CMS Helpdesk
I have tried to follow the instructions on http://cms.ncas.ac.uk/wiki/Docs/PostProcessingAppArcherSetup
for configuring my Puma/Archer suite u-bq532m to run the pptransfer step. But it's not working right now. Can you help?

I had previously run suite u-bq532m for one month. But the pptransfer step was not enabled during that run. So I took the postproc commands that required the RUN option out and made a new graph that didn't require the RUN option but that did the pptransfer step. Maybe skipping the RUN step here is making it mess up?

The error that I get in the pptransfer job.err file is:

[ERROR]  Archive directory /nerc/n02/n02/pmcguire/archive/u-bq532m/19880901T0000Z doesn't exist

The directory /nerc/n02/n02/pmcguire/archive does exist on Archer, but /nerc/n02/n02/pmcguire/archive/u-bq532m does not.
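When a deep path is reported as missing, it can help to find the shallowest component that is absent. The following is a minimal sketch in POSIX shell; the function name first_missing is hypothetical and is not part of postproc or cylc.

```shell
# Print the shallowest component of an absolute path that does not exist.
# Prints nothing if the whole path exists.
first_missing() {
    rest=${1#/}          # path with the leading slash removed
    current=""
    while [ -n "$rest" ]; do
        part=${rest%%/*}             # next component
        case "$rest" in
            */*) rest=${rest#*/} ;;  # drop that component and its slash
            *)   rest="" ;;          # that was the last component
        esac
        [ -z "$part" ] && continue   # skip empty components from "//"
        current="$current/$part"
        if [ ! -e "$current" ]; then
            printf '%s\n' "$current"
            return 0
        fi
    done
    return 0
}
```

Given the state described above, calling first_missing on /nerc/n02/n02/pmcguire/archive/u-bq532m/19880901T0000Z would be expected to print the u-bq532m level, which suggests that nothing had yet been archived for the suite.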

Any suggestions for what I could be doing wrong?
Patrick

Change History (14)

comment:1 Changed 4 weeks ago by pmcguire

It looks like there are now partial copies of my /work/n02/n02/pmcguire/cylc-run/u-bq532m in my /nerc/n02/n02/pmcguire/cylc-run/u-bq532m directory. Should I just change the archive_root_path variable to be /nerc/n02/n02/pmcguire/cylc-run from /nerc/n02/n02/pmcguire/archive?
I am trying that now.
Patrick

comment:2 Changed 4 weeks ago by ros

Hi Patrick,

If you've hacked around with the graph, this will more likely than not mess things up. For future reference, cylc can insert new tasks into already running cycles.

We can't see files in your /nerc/n02/n02/pmcguire directory. Please change the permissions using chmod -R g+rX <dir>
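As an illustration of what chmod -R g+rX does, here is a small sketch that runs it on a scratch directory rather than on /nerc/n02/n02/pmcguire. The scratch layout and the file name file.pp are made up for the demonstration.

```shell
# Illustrative only: a scratch directory stands in for the real one.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/archive"
touch "$demo_dir/archive/file.pp"
chmod -R go-rwx "$demo_dir"     # start with all group/other access removed

chmod -R g+rX "$demo_dir"       # the command from the comment above

# Directories gain group read and execute (so they can be entered);
# plain files gain group read only, because capital X adds execute
# only to directories and to files that are already executable.
ls -ld "$demo_dir/archive" "$demo_dir/archive/file.pp"
```

The capital X is what makes the recursive form safe to use: data files do not become executable as a side effect.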

I also can't see what the postproc task actually did, as you have quite a few tarred-up log directories and I don't know which one to look in to get the log file. I suspect there was very little available for archiving after only 1 month.

I strongly recommend you don't set archive_root_path to be a subdirectory of the cylc-run/<suite-id> directory, as this is where all the cylc suite control files go, and if you were to do a rose suite-run --new at any time the data archived under here would be deleted.
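A quick guard against this can be sketched in shell. The function name is_under_cylc_run is hypothetical, and the check is deliberately conservative: it flags any path that has a cylc-run component, not only cylc-run/<suite-id>.

```shell
# Succeed (exit 0) if the given path is a cylc-run directory or lies below one.
is_under_cylc_run() {
    case "$1" in
        */cylc-run|*/cylc-run/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: warn before using a candidate archive_root_path.
candidate=/nerc/n02/n02/pmcguire/cylc-run
if is_under_cylc_run "$candidate"; then
    echo "WARNING: $candidate is under cylc-run; archived data there may be deleted"
fi
```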

Regards,
Ros

comment:3 Changed 4 weeks ago by ros

Just found the postproc log output on ARCHER. There was nothing to archive at the end of the first month. See /home/n02/n02/pmcguire/cylc-run/u-bq532m/log/job/19880901T0000Z/postproc/01/job.out

Last edited 4 weeks ago by ros

comment:4 Changed 4 weeks ago by pmcguire

I now have website access.
I have changed the permissions of /nerc/n02/n02/pmcguire
Yes, as the postproc log says, no files were marked for archive after one month. Should I change something in my settings in order to get it to archive with only 1 month of data?
The most recent postproc run I did with this suite was with archive_root_path set to the nerc cylc-run directory. I have since changed it back to the /nerc/n02/n02/pmcguire/archive directory.
Patrick

Last edited 4 weeks ago by pmcguire

comment:5 Changed 4 weeks ago by ros

If your run length is only 1 month then you should just need to change ARCHIVE_FINAL to True in the postproc app.

Cheers,
Ros.

comment:6 Changed 4 weeks ago by pmcguire

That's a good tip. Thanks!
Patrick

comment:7 Changed 4 weeks ago by pmcguire

pp files are now being created in the /nerc/n02/n02/pmcguire/archive directory!
Patrick

comment:8 Changed 4 weeks ago by pmcguire

The postproc task now successfully archives the pp files.
I am now trying to figure out what's wrong with my pptransfer task. Maybe you are much quicker than me in figuring this out? I will keep trying though.

Here's my error message:

[WARN]  [SUBPROCESS]: Command: rsync -av --stats --rsync-path=mkdir -p /group_workspaces/jasmin2/nexcs/pmcguire/archer_archive/u-bq532m/19880901T0000Z && rsync /nerc/n02/n02/pmcguire/archive/u-bq532m/19880901T0000Z/ jasmin-xfer2.ceda.ac.uk:/group_workspaces/jasmin2/nexcs/pmcguire/archer_archive/u-bq532m/19880901T0000Z
[SUBPROCESS]: Error = 255:
	
            Access to this system is monitored and restricted to
            authorised users.   If you do not have authorisation
            to use  this system,  you should not  proceed beyond
            this point and should disconnect immediately.

            Unauthorised use could lead to prosecution.

    (See also - http://www.stfc.ac.uk/aup)

ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]

When I try to run that long rsync command at the command line of dtn02, it seems to start working just fine.
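For reference, the logged command only works when typed by hand if the --rsync-path value is quoted, so that "mkdir -p ... && rsync" reaches the remote side as a single string; the log presumably prints it without the quotes. The sketch below prints (it does not run) the command with that quoting restored. The function name print_transfer_cmd is hypothetical.

```shell
# Print the transfer command with the remote-side quoting restored.
# $1 = source directory, $2 = remote host, $3 = destination directory
print_transfer_cmd() {
    printf 'rsync -av --stats --rsync-path="mkdir -p %s && rsync" %s/ %s:%s\n' \
        "$3" "$1" "$2" "$3"
}

# Paths and host taken from the warning message above.
print_transfer_cmd \
    /nerc/n02/n02/pmcguire/archive/u-bq532m/19880901T0000Z \
    jasmin-xfer2.ceda.ac.uk \
    /group_workspaces/jasmin2/nexcs/pmcguire/archer_archive/u-bq532m/19880901T0000Z
```

The --rsync-path option names the program to start on the remote host, which is why prefixing it with mkdir -p creates the destination directory before the remote rsync starts.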
Patrick

comment:9 Changed 4 weeks ago by ros

I presume you have ssh-agent set up on dtn02?

It may be that when you try it interactively you are using an agent that you have forwarded.

Cheers,
Ros.

P.s.
Still can't see into your /nerc/n02/n02/pmcguire directory.

comment:10 Changed 4 weeks ago by pmcguire

Thanks, Ros.
I just re-did the ssh-agent setup, but that didn't help. I still get the same error in pptransfer.

And I currently log in to Puma from my Mac with ssh -Y. It asks for both a passphrase and a password then, despite my attempts to start an ssh-agent on my Mac that can go to Puma.

Once I am on Puma, when I ssh to dtn02 without a passphrase/word, I don't use either -X or -Y, and from dtn02, I can ssh to jasmin-xfer2/xfer1 without a passphrase/word.

I have changed the permissions again on /nerc/n02/n02/pmcguire.
Patrick

comment:11 Changed 4 weeks ago by ros

Can you send the output from running ssh-add -l on dtn02 please?
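The exit status of ssh-add -l is as informative as its output. A minimal sketch of interpreting it follows; the function name describe_agent_status is hypothetical, and the mapping follows the exit statuses documented for OpenSSH's ssh-add.

```shell
# Translate the exit status of "ssh-add -l" into a short description.
describe_agent_status() {
    case "$1" in
        0) echo "agent reachable, at least one key loaded" ;;
        1) echo "agent reachable, but no keys loaded" ;;
        2) echo "no agent reachable from this shell" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Capture the status without aborting if the shell runs with "set -e".
status=0
ssh-add -l >/dev/null 2>&1 || status=$?
describe_agent_status "$status"
```

Running this both in an interactive login and from within a batch job can show whether the two environments see the same agent.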

Ros.

comment:12 Changed 4 weeks ago by ros

Ahh, I can see what part of the problem is… the permissions on your .ssh directory on dtn02 are too open, which ssh certainly won't like.

chmod 700 ~/.ssh

and try again.
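A rough way to check for this in advance is sketched below. It assumes GNU stat (for the -c %a option), the function name too_open is hypothetical, and the test is stricter than ssh itself may require: it flags any group or other permission at all.

```shell
# Succeed (exit 0) if the directory grants any permission to group or others.
# Returns 2 if the mode cannot be read.
too_open() {
    mode=$(stat -c %a "$1") || return 2
    case "$mode" in
        *00) return 1 ;;   # e.g. 700: owner-only access
        *)   return 0 ;;   # e.g. 755 or 775: group/other bits set
    esac
}
```

For example, `too_open ~/.ssh && chmod 700 ~/.ssh` would tighten the directory only when needed.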

comment:13 Changed 4 weeks ago by pmcguire

Thanks, Ros, for your sleuthing and for catching that! I really appreciate it.

I just changed the .ssh permissions, and I restarted the agent, and now ssh-add -l gives something appropriate:

2048 99:ce:08:26:33:20:3e:da:ec:17:c8:96:38:ff:fc:bd /nerc/n02/n02/pmcguire/.ssh/id_rsa_jasmin (RSA)

unlike before.

I have restarted the suite, and maybe we'll find out in a few minutes if it's working or not. I expect that it will be.

I have been trying to figure out in my spare time today whether archive_final should be set to true in a normal run.
Should it be false in a normal run? Is there documentation about archive_final somewhere?
Patrick

comment:14 Changed 4 weeks ago by pmcguire

Hi Ros
It still didn't work.

I subsequently figured out that after I started an agent and after I logged out of dtn02, when I logged back in to dtn02, the agent no longer was working properly. I think I tracked this down to my having both a .profile file and a .bash_profile file on dtn02. I am not sure if the .profile file is getting executed, but the .bash_profile file is getting executed. So I moved the ~/.ssh/ssh-setup line from the .profile file to the .bash_profile file, and renamed the .profile file to .profile_old, so that .profile is not executed.
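This matches how a bash login shell behaves: after /etc/profile it reads only the first of ~/.bash_profile, ~/.bash_login and ~/.profile that exists and is readable, so a ~/.profile is skipped whenever a ~/.bash_profile is present. The sketch below reports which file would be chosen; it assumes the login shell on dtn02 is bash, and the function name login_startup_file is hypothetical.

```shell
# Print the per-user startup file a bash login shell would read for the
# given home directory: the first readable one of these, in this order.
login_startup_file() {
    for f in .bash_profile .bash_login .profile; do
        if [ -r "$1/$f" ]; then
            printf '%s\n' "$1/$f"
            return 0
        fi
    done
    return 1   # none of the three files is present
}
```

Calling `login_startup_file "$HOME"` on dtn02 before the change would be expected to report .bash_profile, which is consistent with the ssh-setup line in .profile never being run.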

It looks like it started the mkdir & rsync command properly on jasmin-xfer2 this time. The directories have been created there, and files are getting transferred.

Thanks for your help! I still am curious about the archive_final query in my previous comment.
Patrick
