#2304 closed help (fixed)

Not able to log onto MONSooN once a job is submitted

Reported by: s.varma13
Owned by: ros
Component: UM Model
Keywords: Login
Cc:
Platform: Monsoon2
UM Version: 8.4

Description

Hi, I am trying to do a run for the first time in a number of months, following the move to xcs. The job ID is xmbxk and this run previously worked on xcm.

I set up my ssh key from PUMA to lander.monsoon again and I can now use my PUMA passphrase to log on to lander.monsoon-metoffice.co.uk. When I submit the job to Monsoon, it asks for the passphrase for my key, but once I type it in, it says that it is incorrect and the submission fails.

Could you please let me know what you think the solution is.

Many thanks

Sunil

Attachments (1)

Screen Shot 2017-10-30 at 10.21.11.jpg (141.0 KB) - added by s.varma13 22 months ago.

Change History (17)

comment:1 Changed 22 months ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Sunil,

I'm a bit confused by your statement that "I can now use my PUMA passphrase to logon to lander.monsoon-metoffice.co.uk"; you need your RSA key fob to log on to lander. Can you please double check that the UMUI is actually asking for your PUMA passphrase and not your passcode (pin + RSA key number)? The wording is similar.

Cheers,
Ros.

comment:2 Changed 22 months ago by ros

Hi Sunil,

I just looked at your job and you're still pointing to the XCM. You need to change the hostname to xcslc0 and also change the decomposition to be a multiple of 36, as the new machine has 36 cores per node.

Cheers,
Ros.

comment:3 Changed 22 months ago by s.varma13

Hi Ros,

I was just on the UMUI and it asked me to enter the passphrase for key '/home/s.varma13/ssh/id_rsa'. I entered the MONSooN pin + rsa number twice and it went back to the passphrase request, but on the third attempt it worked. However, I have just seen your last comment. Should I stop the job, make those changes and then resubmit?

Where is the decomposition section?

I have one other question. Previously, when I set up ssh between PUMA and lander.monsoon, it always asked for my MONSooN pin + rsa key when I ssh'd into lander.monsoon. Now, after typing ssh-copy-id -i ~/.ssh/id_rsa.pub suvar@…, it asks me for my PUMA passphrase to log onto lander.monsoon rather than the pin + rsa. Is that correct?

Thank you.

Sunil

comment:4 Changed 22 months ago by ros

Hi Sunil,

The decomposition is on the "Job submission" page a little bit further down from where you set the target host.

Regarding ssh keys: you should not be copying your ssh key to lander.monsoon. You should always log in to lander.monsoon with your pin + rsa. Instructions on the setup of ssh agent for Monsoon are on our website here:

http://cms.ncas.ac.uk/wiki/MonsoonSshAgent

Hope that helps.
Cheers,
Ros.

comment:5 Changed 22 months ago by s.varma13

Hi Ros,

Sorry for being a pain but I still cannot find decomposition except in relation to CICE. I attach a screen shot.

By mistake, I have set up ssh between PUMA and lander.monsoon so that it only asks for my PUMA passphrase when I ssh into lander. Could you let me know the best way to reverse this? I looked at the attached MONSooN link but when I typed ssh-add -1, it did not recognise this. If I can still log onto lander.monsoon via this alternate route, is this going to be a problem when using the UMUI or ROSE? I am a bit worried about changing anything now.

Thank you.

Sunil

comment:6 Changed 22 months ago by ros

Hi Sunil,

It's in that window you have attached the screenshot of: "Number of processes for ATMOS East-West" and "North-South".

You will need to remove the key entry that you just added to the ~/.ssh/authorized_keys file on lander.monsoon to stop it asking for your passphrase. You shouldn't be able to set up direct ssh access using keys to lander and thus it is not guaranteed to work.

The command is ssh-add -l (lowercase letter l, not the number one).

Cheers,
Ros.
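
The key removal described in this comment could be sketched roughly as follows. This is only an illustration: it assumes the unwanted entry in authorized_keys is an exact copy of the public key line (as ssh-copy-id would append it), the function name remove_key is hypothetical, and the key strings are placeholders, not real keys. The demo works on a temporary directory so nothing under ~/.ssh is touched.

```shell
#!/bin/sh
# remove_key PUBKEY_FILE AUTH_FILE   (hypothetical helper, not a standard tool)
# Drops every line of AUTH_FILE that exactly matches the public key line in
# PUBKEY_FILE, keeping the original as AUTH_FILE.bak.
remove_key() {
    pubkey_file=$1
    auth_file=$2
    cp "$auth_file" "$auth_file.bak"
    # -v invert match, -x whole line, -F fixed string, -f read pattern from file.
    # grep exits non-zero when nothing is left to print, hence "|| true".
    grep -v -x -F -f "$pubkey_file" "$auth_file.bak" > "$auth_file" || true
}

# Demo on throwaway files with placeholder key text.
demo_dir=$(mktemp -d)
printf 'ssh-rsa AAAAplaceholderPUMA user@puma\n' > "$demo_dir/id_rsa.pub"
printf 'ssh-rsa AAAAplaceholderOTHER someone@elsewhere\n' > "$demo_dir/authorized_keys"
printf 'ssh-rsa AAAAplaceholderPUMA user@puma\n' >> "$demo_dir/authorized_keys"
remove_key "$demo_dir/id_rsa.pub" "$demo_dir/authorized_keys"
cat "$demo_dir/authorized_keys"   # only the "OTHER" line remains
```

On the real system the second argument would be ~/.ssh/authorized_keys on lander.monsoon, edited after logging in with pin + rsa; running ssh-add -l on PUMA afterwards lists the keys currently held by the ssh agent.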

comment:7 Changed 22 months ago by s.varma13

Hi Ros,

Thank you very much. I will remove the ssh link. It is really strange that it works!

What would you suggest for number of processes for east-west and north-south? 36 for both?

Many thanks

Sunil

comment:8 Changed 22 months ago by ros

Hi Sunil,

Try 12 x 18

Cheers,
Ros.
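
As a quick sanity check of a proposed decomposition, the arithmetic can be sketched as below. It assumes that "a multiple of 36" refers to the product of the East-West and North-South process counts, which is an interpretation of comment:2 rather than something the ticket states outright; check_decomp is a hypothetical helper name.

```shell
#!/bin/sh
CORES_PER_NODE=36   # from comment:2: the new machine has 36 cores per node

# check_decomp EW NS   (hypothetical helper)
# Prints the total process count and whether it fills whole nodes.
# Returns 1 when the total is not a multiple of CORES_PER_NODE.
check_decomp() {
    ew=$1
    ns=$2
    total=$((ew * ns))
    if [ $((total % CORES_PER_NODE)) -eq 0 ]; then
        echo "${ew} x ${ns} = ${total} processes, $((total / CORES_PER_NODE)) full nodes"
    else
        echo "${ew} x ${ns} = ${total} processes, not a multiple of ${CORES_PER_NODE}"
        return 1
    fi
}

check_decomp 12 18          # 12 x 18 = 216 processes, 6 full nodes
check_decomp 10 10 || true  # 10 x 10 = 100 processes, not a multiple of 36
```

Under that assumption, 36 x 36 would also pass the check, but it would request 36 nodes rather than the 6 used by 12 x 18.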

comment:9 Changed 22 months ago by s.varma13

Hi Ros

I submitted the job successfully - just hope it runs. It is the job I submitted with the wrong start dump date. I am now running it with the correct date.

I will let you know how different the results are.

Thanks a lot

Sunil

comment:10 Changed 22 months ago by s.varma13

Hi Ros, the job failed because the Ancil filenames version /home/mdalvi/umui_runs/ancil_vers/filenames_UM8.2_invert_rivers was not found:

/home/d04/suvar/output/xmbxk000.xmbxk.d17303.t110327.rcf.leave

The UMRECON build was ok:
/home/d04/suvar/output/xmbxk000.xmbxk.d17303.t110327.comp.leave

Do you know if there is an updated file or file path?

Cheers

Sunil

comment:11 Changed 22 months ago by s.varma13

And if it is the file path, do I have to change all paths?

comment:12 Changed 22 months ago by ros

Hi Sunil,

User home directory paths have all changed (including some usernames). Mohit's home directory is now:

/home/d05/hadzm/umui_runs/ancil_vers

You will most likely find other paths that will need changing accordingly.

Standard path equivalents are listed here:
https://collab.metoffice.gov.uk/twiki/bin/view/Support/Monsoon2MigrationUM

Cheers,
Ros.
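
For the one path pair given in this ticket, the change amounts to swapping the old home prefix for the new one. The sketch below only illustrates that string substitution; the helper name new_path is hypothetical, it assumes the file names beneath the directory are unchanged, and other users' prefixes would need to be looked up on the migration page linked above.

```shell
#!/bin/sh
# Old and new home prefixes for this one user, taken from comment:10 and comment:12.
OLD_HOME=/home/mdalvi
NEW_HOME=/home/d05/hadzm

# new_path PATH   (hypothetical helper)
# Rewrites a path that starts with OLD_HOME/ so that it starts with NEW_HOME/;
# any other path is printed unchanged.
new_path() {
    printf '%s\n' "$1" | sed "s|^${OLD_HOME}/|${NEW_HOME}/|"
}

new_path /home/mdalvi/umui_runs/ancil_vers
# prints /home/d05/hadzm/umui_runs/ancil_vers
```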

comment:13 Changed 22 months ago by s.varma13

Thanks Ros

Amended just those files, submitted and it is now producing the months of files required under the compile and initial run.

comment:14 Changed 22 months ago by s.varma13

Hi Ros

I forgot to create a moose file for my run before I ran the model under CRUN. It had produced two daily files. I have stopped the CRUN and created the archived file 17. moo mkset -v moose:/crum/xmbxk -p=project-ukca. Should I start from the beginning again or can I just start the CRUN again?

Many thanks

Sunil

comment:15 Changed 22 months ago by ros

Hi Sunil,

The archiving failed because it couldn't find the archiving scripts. They are now under: /common/moci/archiving/bin

You will need to change the path in Post-processing → main switch

If you want the system to archive the files for you automatically then you will need to redo the run from the beginning. The archiving scripts should do the mkset for you, but no harm in doing it yourself first.

Cheers,
Ros.

comment:16 Changed 21 months ago by ros

  • Resolution set to fixed
  • Status changed from accepted to closed