Opened 13 months ago
Closed 8 months ago
#3153 closed help (fixed)
postproc App setup
Reported by: | pmcguire | Owned by: | um_support |
---|---|---|---|
Component: | Archiving | Keywords: | archive, postproc, PPTRANSFER |
Cc: | | Platform: | ARCHER |
UM Version: | | | |
Description
Hi CMS Helpdesk
I have tried to follow the instructions on http://cms.ncas.ac.uk/wiki/Docs/PostProcessingAppArcherSetup
for configuring my Puma/Archer suite u-bq532m to run the pptransfer step. But it's not working right now. Can you help?
I had previously run suite u-bq532m for one month. But the pptransfer step was not enabled during that run. So I took the postproc commands that required the RUN option out and made a new graph that didn't require the RUN option but that did the pptransfer step. Maybe skipping the RUN step here is making it mess up?
The error that I get in the pptransfer job.err file is:
[ERROR] Archive directory /nerc/n02/n02/pmcguire/archive/u-bq532m/19880901T0000Z doesn't exist
The directory /nerc/n02/n02/pmcguire/archive does exist on Archer, but /nerc/n02/n02/pmcguire/archive/u-bq532m does not.
Any suggestions for what I could be doing wrong?
Patrick
Change History (15)
comment:1 Changed 13 months ago by pmcguire
comment:2 Changed 13 months ago by ros
Hi Patrick,
If you've hacked around with the graph, this will more likely than not mess things up. For future reference, cylc can insert new tasks into already running cycles.
We can't see files in your /nerc/n02/n02/pmcguire directory. Please change the permissions using chmod -R g+rX <dir>
I also can't see what the postproc task actually did, as you have quite a few tarred-up log directories and I don't know which to look in to get the log file. I suspect there was very little available for archiving after only 1 month.
I strongly recommend you don't set the archive_root_path to be a subdirectory of the cylc-run/<suite-id> directory, as this is where all the cylc suite control files go, and if you were to do a rose suite-run --new at any time the data archived under here would be deleted.
Regards,
Ros
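As a small illustration of what the suggested chmod -R g+rX does, the sketch below applies it to a scratch directory rather than to the real /nerc path. The names archive, sub and file.pp are made up for the demo. The capital X adds execute (search) permission only to directories and to files that are already executable, so data files gain group read but not group execute.

```shell
# Build a throwaway tree (hypothetical names, not the real archive).
demo=$(mktemp -d)
mkdir -p "$demo/archive/sub"
touch "$demo/archive/sub/file.pp"

# Start with group and other locked out, then open the tree up to the group.
chmod -R go-rwx "$demo/archive"
chmod -R g+rX "$demo/archive"   # read everywhere; execute only on directories

# Show the resulting modes for one directory and one plain file.
ls -ld "$demo/archive/sub" "$demo/archive/sub/file.pp"
```

The directory should end up group-readable and searchable, while the plain file is group-readable only.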
comment:3 Changed 13 months ago by ros
Just found the postproc log output on ARCHER. There was nothing to archive at the end of the first month. See /home/n02/n02/pmcguire/cylc-run/u-bq532m/log/job/19880901T0000Z/postproc/01/job.out
comment:4 Changed 13 months ago by pmcguire
I now have website access.
I have changed the permissions of /nerc/n02/n02/pmcguire
Yes, as the postproc log says, no files were marked for archive after one month. Should I change something in my settings in order to get it to archive with only 1 month of data?
The most recent postproc run I did with this suite was with archive_root_path set to the nerc cylc-run directory. I have since changed it back to the /nerc/n02/n02/pmcguire/archive directory.
Patrick
comment:5 Changed 13 months ago by ros
If your run length is only 1 month then you should just need to change ARCHIVE_FINAL to True in the postproc app.
Cheers,
Ros.
comment:6 Changed 13 months ago by pmcguire
That's a good tip. Thanks!
Patrick
comment:7 Changed 13 months ago by pmcguire
pp files are now being created in the /nerc/n02/n02/pmcguire/archive directory!
Patrick
comment:8 Changed 13 months ago by pmcguire
The postproc task now successfully archives the pp files.
I am now trying to figure out what's wrong with my pptransfer task. Maybe you are much quicker than me in figuring this out? I will keep trying though.
Here's my error message:
[WARN] [SUBPROCESS]: Command: rsync -av --stats --rsync-path=mkdir -p /group_workspaces/jasmin2/nexcs/pmcguire/archer_archive/u-bq532m/19880901T0000Z && rsync /nerc/n02/n02/pmcguire/archive/u-bq532m/19880901T0000Z/ jasmin-xfer2.ceda.ac.uk:/group_workspaces/jasmin2/nexcs/pmcguire/archer_archive/u-bq532m/19880901T0000Z
[SUBPROCESS]: Error = 255:
Access to this system is monitored and restricted to authorised users.
If you do not have authorisation to use this system, you should not proceed beyond this point and should disconnect immediately.
Unauthorised use could lead to prosecution.
(See also - http://www.stfc.ac.uk/aup)
ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
When I try to run that long rsync command at the command line of dtn02, it seems to start working just fine.
Patrick
comment:9 Changed 13 months ago by ros
I presume you have ssh-agent set up on dtn02?
It may be that when you try it interactively you are using an agent that you have forwarded.
Cheers,
Ros.
P.s.
Still can't see into your /nerc/n02/n02/pmcguire directory.
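One way to tell whether a session can see an agent is to look at the exit status of ssh-add -l. The sketch below is only an illustration: the function name describe_agent_rc is made up, and the meanings of the exit codes are taken from the ssh-add manual page. To be useful for this problem it would need to be run in the same kind of non-interactive session that the pptransfer task uses on dtn02.

```shell
# Translate the exit status of `ssh-add -l` into a short message.
# Per the ssh-add manual: 0 = success, 1 = command failed (for -l: no identities),
# 2 = unable to contact the authentication agent.
describe_agent_rc() {
    case "$1" in
        0) echo "agent reachable, keys loaded" ;;
        1) echo "agent reachable, no keys loaded" ;;
        2) echo "no agent reachable" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Capture the status without aborting if the command fails.
rc=0
ssh-add -l >/dev/null 2>&1 || rc=$?

# SSH_AUTH_SOCK is how ssh and ssh-add locate the agent.
echo "SSH_AUTH_SOCK=${SSH_AUTH_SOCK:-<unset>}"
describe_agent_rc "$rc"
```

If an interactive login reports loaded keys but the batch environment reports no agent, the interactive test was probably relying on a forwarded agent.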
comment:10 Changed 13 months ago by pmcguire
Thanks, Ros.
I just re-did the ssh-agent setup, but that didn't help. I still get the same error in pptransfer.
And I currently log in to Puma from my Mac with ssh -Y. It asks for both a passphrase and a password then, despite my attempts to start an ssh-agent on my Mac that can go to Puma.
Once I am on Puma, when I ssh to dtn02 without a passphrase/word, I don't use either -X or -Y, and from dtn02, I can ssh to jasmin-xfer2/xfer1 without a passphrase/word.
I have changed the permissions again on /nerc/n02/n02/pmcguire.
Patrick
comment:11 Changed 13 months ago by ros
Can you send the output from running ssh-add -l on dtn02 please?
Ros.
comment:12 Changed 13 months ago by ros
Ahh, I can see what part of the problem is… the permissions on your .ssh directory on dtn02 are too open, which ssh certainly won't like.
chmod 700 ~/.ssh
and try again.
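A quick way to check for this kind of problem is sketched below. It is an illustration only: the helper name too_open is made up, and it runs against a scratch directory standing in for ~/.ssh so that nothing real is modified.

```shell
# Succeed (return 0) if the directory grants any access to group or other.
# Uses only ls and cut, so it does not depend on GNU- or BSD-specific stat flags.
too_open() {
    perms=$(ls -ld "$1" | cut -c5-10)   # the six group/other permission characters
    [ "$perms" != "------" ]
}

demo_ssh=$(mktemp -d)/.ssh              # hypothetical stand-in for ~/.ssh
mkdir -p "$demo_ssh"

chmod 775 "$demo_ssh"
too_open "$demo_ssh" && echo "too open: $demo_ssh"

chmod 700 "$demo_ssh"                   # the fix suggested above
too_open "$demo_ssh" || echo "ok: $demo_ssh"
```

Pointing too_open at the real ~/.ssh on dtn02 would give the same kind of answer without changing anything.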
comment:13 Changed 13 months ago by pmcguire
Thanks, Ros, for your sleuthing and for catching that! I really appreciate it.
I just changed the .ssh permissions, and I restarted the agent, and now ssh-add -l gives something appropriate:
2048 99:ce:08:26:33:20:3e:da:ec:17:c8:96:38:ff:fc:bd /nerc/n02/n02/pmcguire/.ssh/id_rsa_jasmin (RSA)
unlike before.
I have restarted the suite, and maybe we'll find out in a few minutes if it's working or not. I expect that it will be.
I have been trying to figure out in my spare time today whether archive_final should be set to true in a normal run.
Should it be false in a normal run? Is there documentation about archive_final somewhere?
Patrick
comment:14 Changed 13 months ago by pmcguire
Hi Ros
It still didn't work.
I subsequently figured out that after I started an agent and after I logged out of dtn02, when I logged back in to dtn02, the agent no longer was working properly. I think I tracked this down to my having both a .profile file and a .bash_profile file on dtn02. I am not sure if the .profile file is getting executed, but the .bash_profile file is getting executed. So I moved the ~/.ssh/ssh-setup line from the .profile file to the .bash_profile file, and renamed the .profile file to .profile_old, so that .profile is not executed.
It looks like it started the mkdir & rsync command properly on jasmin-xfer2 this time. The directories have been created there, and files are getting transferred.
Thanks for your help! I still am curious about the archive_final query in my previous comment.
Patrick
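The .profile versus .bash_profile behaviour described above matches how bash treats login shells: it reads only the first of ~/.bash_profile, ~/.bash_login and ~/.profile that exists and is readable. The sketch below illustrates that lookup order with a made-up helper, login_startup_file, run against a scratch directory standing in for the home directory on dtn02. It does not reproduce the contents of ~/.ssh/ssh-setup.

```shell
# Print the startup file a bash login shell would read from the given home directory.
# bash uses only the first of these that exists and is readable.
login_startup_file() {
    for f in .bash_profile .bash_login .profile; do
        if [ -r "$1/$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "none"
}

demo_home=$(mktemp -d)            # hypothetical stand-in for $HOME on dtn02

touch "$demo_home/.profile"
login_startup_file "$demo_home"   # .profile is used while it is the only file

touch "$demo_home/.bash_profile"
login_startup_file "$demo_home"   # .bash_profile now takes precedence
```

With both files present, anything placed only in .profile, such as an agent setup line, is never run by a bash login shell unless .bash_profile sources it explicitly.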
comment:15 Changed 8 months ago by ros
- Resolution set to fixed
- Status changed from new to closed
It looks like there are now partial copies of my /work/n02/n02/pmcguire/cylc-run/u-bq532m in my /nerc/n02/n02/pmcguire/cylc-run/u-bq532m directory. Should I just change the archive_root_path variable to be /nerc/n02/n02/pmcguire/cylc-run from /nerc/n02/n02/pmcguire/archive?
I am trying that now.
Patrick